In this post we'll look at how to group the data in a relation (table) with Apache Pig. Grouping and sorting data are extremely common operations in data analysis.
If you're acquainted with SQL, you'll recognize the query below.
SELECT deptno, SUM(sal) FROM emp GROUP BY deptno;
This query returns the total salary paid to the employees of each department.
Let's do the same thing in Apache Pig.
Step 1: Loading the data into the "emp" relation.
emp = LOAD 'Desktop/Docs/emp.csv' USING PigStorage(',') AS (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:double, comm:double, deptno:int);
Step 2: Grouping the data by department number.
a = GROUP emp BY deptno;
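To see what GROUP actually produces, DESCRIBE can be run on the grouped relation. The sketch below assumes Steps 1 and 2 were run in the same Grunt session. Each tuple in "a" holds the grouping key in a field named group, plus a bag named emp containing every row that shares that key. The exact formatting of the DESCRIBE output varies between Pig versions, so the commented shape is only approximate.

```pig
-- Inspect the schema of the grouped relation from Step 2.
DESCRIBE a;
-- Approximate shape, derived from the schema declared in Step 1:
-- a: {group: int, emp: {(empno: int, ename: chararray, job: chararray,
--     mgr: int, hiredate: chararray, sal: double, comm: double, deptno: int)}}
```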
Step 3: Summing up the salaries in each group.
result = FOREACH a GENERATE group, SUM(emp.sal);
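In this step, emp.sal projects the sal column out of the bag of rows held in each group, and SUM adds those values up. The output fields can also be given names with AS, which makes later statements easier to read. A minimal sketch, assuming the relation "a" from Step 2; the names result_named, deptno, and total_sal are illustrative and not part of the original script.

```pig
-- Same aggregation as Step 3, but with named output fields.
-- result_named, deptno and total_sal are illustrative names.
result_named = FOREACH a GENERATE group AS deptno, SUM(emp.sal) AS total_sal;

-- Later statements can then refer to the fields by name, for example:
high_cost = FILTER result_named BY total_sal > 5000.0;
```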
Step 4: Printing the result on screen.
DUMP result;
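As a worked illustration, suppose emp.csv contained only the three hypothetical rows shown in the comments below (made-up sample data, not taken from the original file). DUMP would then print one tuple per department, each holding the department number and the summed salary. The order of the tuples is not guaranteed.

```pig
-- Hypothetical contents of emp.csv (illustrative sample only):
--   7369,SMITH,CLERK,7902,1980-12-17,800,,20
--   7499,ALLEN,SALESMAN,7698,1981-02-20,1600,300,30
--   7566,JONES,MANAGER,7839,1981-04-02,2975,,20

DUMP result;
-- Output for the sample rows above (tuple order may differ):
--   (20,3775.0)
--   (30,1600.0)
```

Department 20 has two rows, so its total is 800 + 2975 = 3775; department 30 has a single row with a salary of 1600.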
Hope you find this article helpful.