In this post, we’ll talk about Apache Pig’s complex types, “Tuples & Bags”. Maps will be discussed in another post.
Pig Latin statements work with relations. A relation is a bag (a collection of tuples), a tuple is an ordered set of fields, and a field is a single piece of data.
A Pig relation is similar to a table in a relational database, with the tuples in the bag representing the rows in the table. Pig relations, unlike relational tables, do not require that every tuple include the same number of fields or that the fields in the same position (column) be of the same type.
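To make this hierarchy concrete, here is a small Python model of it. This is not Pig code and not a Pig API. The tuples are hypothetical, built from values used in this post, and only show that tuples in one bag may differ in field count and in the type held at a given position.

```python
# A plain-Python model of Pig's data model (not Pig itself):
# a field is one value, a tuple is an ordered set of fields,
# and a relation is a bag, i.e. a collection of tuples.
from typing import Any, List, Tuple

PigTuple = Tuple[Any, ...]
Bag = List[PigTuple]

# Hypothetical tuples: unlike rows of a relational table, they differ in
# the number of fields and in the type found at position 0.
relation: Bag = [
    (7839, "KING", "CHAIRMAN"),
    (5000, 300),
    ("JONES", 3400, 200),
]


def arities(bag: Bag) -> List[int]:
    """Number of fields in each tuple of the bag."""
    return [len(t) for t in bag]


def types_at(bag: Bag, position: int) -> set:
    """Python type names found at a field position (tuples too short are skipped)."""
    return {type(t[position]).__name__ for t in bag if len(t) > position}


print(arities(relation))               # [3, 2, 3]
print(sorted(types_at(relation, 0)))   # ['int', 'str']
```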
In other words, a tuple is simply a record containing an ordered set of fields (columns). The following example shows how to work with a collection of tuples.
Example:
Sample Data.
File Name: emp_tuple.csv
(7839,KING,CHAIRMAN) (5000,300)
(7566,JONES,ANALYST) (3400,200)
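Each line of the file holds two tuples separated by a single space. Although the file has a .csv extension, the delimiter is a space rather than a comma, which is why the load statement below uses PigStorage(' '). As stated above, a collection of tuples is called a bag. The individual values inside the tuples, such as 7839, KING, and 5000, are called “atoms”.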
Loading the data into a relation.
emp = load 'Desktop/Docs/emp_tuple.csv' USING PigStorage(' ') as (empdetails:(empid:int, ename:chararray, job:chararray), income:(salary:int, commission:int));
This command loads each line of the file as one record of a relation named “emp”. Each record has two fields, empdetails and income, and each of those fields is itself a tuple.
Review the screenshots below.
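The following Python sketch mimics what this load statement does with the sample file, assuming the simple, well-formed input shown above: it splits each line on a single space, as PigStorage(' ') does, then casts each piece to a tuple according to the schema. The function names parse_tuple and load are hypothetical. Pig's real loader also deals with nulls, malformed records, and escaping, which this sketch does not reproduce; it simply raises an error on bad input.

```python
# Python model (not Pig) of:
#   emp = load 'Desktop/Docs/emp_tuple.csv' USING PigStorage(' ')
#         as (empdetails:(empid:int, ename:chararray, job:chararray),
#             income:(salary:int, commission:int));
# It handles only the simple, well-formed lines used in this post.

LINES = [
    "(7839,KING,CHAIRMAN) (5000,300)",
    "(7566,JONES,ANALYST) (3400,200)",
]

# One list of casts per top-level field; each top-level field is a tuple.
SCHEMA = [
    [int, str, str],  # empdetails: (empid:int, ename:chararray, job:chararray)
    [int, int],       # income: (salary:int, commission:int)
]


def parse_tuple(text, casts):
    """Turn text such as '(5000,300)' into a Python tuple, casting each field."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"not a tuple: {text!r}")
    parts = text[1:-1].split(",")
    if len(parts) != len(casts):
        raise ValueError(f"expected {len(casts)} fields, got {len(parts)}")
    return tuple(cast(part) for cast, part in zip(casts, parts))


def load(lines, delimiter=" "):
    """Split each line on the delimiter, then apply the schema to each piece."""
    relation = []  # the relation: a bag (here, a list) of records
    for line in lines:
        pieces = line.split(delimiter)
        if len(pieces) != len(SCHEMA):
            raise ValueError(f"expected {len(SCHEMA)} fields in {line!r}")
        relation.append(tuple(parse_tuple(p, c) for p, c in zip(pieces, SCHEMA)))
    return relation


emp = load(LINES)
for record in emp:
    print(record)
# ((7839, 'KING', 'CHAIRMAN'), (5000, 300))
# ((7566, 'JONES', 'ANALYST'), (3400, 200))
```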
Now, let’s retrieve the data from the relation.
grunt> EmpDetailsRec = foreach emp generate empdetails;
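This projects the whole empdetails tuple, with all of its columns. Running DUMP EmpDetailsRec; prints the following output. The double parentheses appear because each output record is a tuple whose only field is the empdetails tuple.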
((7839,KING,CHAIRMAN))
((7566,JONES,ANALYST))
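A rough Python model of this projection (an illustration, not Pig's implementation) shows where the double parentheses come from: GENERATE emits one tuple per input record, and here that tuple's only field is the empdetails tuple. The helper to_pig_text is a hypothetical stand-in for the way DUMP prints tuples.

```python
# Python model (not Pig) of: EmpDetailsRec = foreach emp generate empdetails;
emp = [
    ((7839, "KING", "CHAIRMAN"), (5000, 300)),
    ((7566, "JONES", "ANALYST"), (3400, 200)),
]

EMPDETAILS = 0  # position of the empdetails field in each emp record


def generate_empdetails(relation):
    """Emit one output tuple per record; its single field is the empdetails tuple."""
    return [(record[EMPDETAILS],) for record in relation]


def to_pig_text(value):
    """Render nested tuples DUMP-style: (a,b,c) with no spaces or quotes."""
    if isinstance(value, tuple):
        return "(" + ",".join(to_pig_text(v) for v in value) + ")"
    return str(value)


for record in generate_empdetails(emp):
    print(to_pig_text(record))
# ((7839,KING,CHAIRMAN))
# ((7566,JONES,ANALYST))
```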
grunt> EmpSalaryDetails = foreach emp generate empdetails.ename,empdetails.job,income.salary,income.commission;
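Here we fetched specific fields using dot notation, <tuplename>.<columnname>. Because each field is projected on its own, every output record is a flat tuple of four values. Running DUMP EmpSalaryDetails; returns the following result.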
(KING,CHAIRMAN,5000,300)
(JONES,ANALYST,3400,200)
Refer to the screenshots below.
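In a Python model of the same relation (again an illustration, not Pig), named tuples give a close analogue of <tuplename>.<columnname>: each projected field is read by name, and the four values are collected into one flat tuple. The names Emp, EmpDetails, and Income are assumptions made for this sketch.

```python
# Python model (not Pig) of:
#   EmpSalaryDetails = foreach emp generate empdetails.ename, empdetails.job,
#                      income.salary, income.commission;
from collections import namedtuple

# Hypothetical record types mirroring the schema used in the load statement.
EmpDetails = namedtuple("EmpDetails", ["empid", "ename", "job"])
Income = namedtuple("Income", ["salary", "commission"])
Emp = namedtuple("Emp", ["empdetails", "income"])

emp = [
    Emp(EmpDetails(7839, "KING", "CHAIRMAN"), Income(5000, 300)),
    Emp(EmpDetails(7566, "JONES", "ANALYST"), Income(3400, 200)),
]

# <tuplename>.<columnname> maps onto attribute access; the result is flat.
emp_salary_details = [
    (r.empdetails.ename, r.empdetails.job, r.income.salary, r.income.commission)
    for r in emp
]

for record in emp_salary_details:
    print("(" + ",".join(str(f) for f in record) + ")")
# (KING,CHAIRMAN,5000,300)
# (JONES,ANALYST,3400,200)
```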
Hope you find this article helpful.
Please do follow this blog for more interesting updates.