Addition, subtraction, multiplication, and division are the four fundamental arithmetic operations. Pig Latin's arithmetic operators are "+" (addition), "-" (subtraction), "*" (multiplication), "/" (division), and "%" (modulo, the remainder after division), and they operate on numeric values such as int and double. An operator performs an action on one or more operands. In this post, we will look at how to use these operators in Apache Pig.
The sample data was introduced in earlier posts.
It is repeated below, along with the command that loads it into Pig using PigStorage.
1) Sample Data.
File Name: emp.csv
7839,KING,PRESIDENT,0,17/Nov/1981,5000,0,10
7698,BLAKE,MANAGER,7839,01/May/1981,2850,0,30
7782,CLARK,MANAGER,7839,06/Sep/1981,2450,0,10
7566,JONES,MANAGER,7839,04/Feb/1981,2975,0,20
7788,SCOTT,ANALYST,7566,13/Jul/87,3000,0,20
7902,FORD,ANALYST,7566,03/Dec/1981,3000,0,20
7369,SMITH,CLERK,7902,17/Dec/1980,800,0,20
7499,ALLEN,SALESMAN,7698,20/Feb/1981,1600,300,30
7521,WARD,SALESMAN,7698,22/Feb/1981,1250,500,30
7654,MARTIN,SALESMAN,7698,28/Sep/1981,1250,1400,30
7844,TURNER,SALESMAN,7698,09/Aug/1981,1500,0,30
7876,ADAMS,CLERK,7788,13/Jul/87,1100,0,20
7900,JAMES,CLERK,7698,03/Dec/1981,950,0,30
7934,MILLER,CLERK,7782,23/Jan/1982,1300,0,10
2) Loading the data into a relation.
The statement below is split across lines for readability; you can also paste it as a single line. Pig treats everything up to the semicolon as one statement.
emp = LOAD 'Desktop/Docs/emp.csv' USING PigStorage(',') AS
    (empno:int,
     ename:chararray,
     job:chararray,
     mgr:int,
     hiredate:chararray,
     sal:double,
     comm:double,
     deptno:int);
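With the relation loaded, the arithmetic operators from the introduction can be tried out on each row. The following is only a minimal sketch: the relation name "emp_calc" and the output aliases (total_pay, annual_sal, and so on) are made up for illustration and are not part of the original example.

```pig
-- Apply each arithmetic operator to numeric fields of emp.
-- All aliases on the right-hand side are illustrative names.
emp_calc = FOREACH emp GENERATE
    ename,
    sal + comm AS total_pay,      -- addition: salary plus commission
    sal - comm AS sal_less_comm,  -- subtraction
    sal * 12   AS annual_sal,     -- multiplication: int 12 is cast to double
    sal / 30   AS daily_sal,      -- division: double result because sal is double
    empno % 2  AS empno_parity;   -- modulo: remainder of an int division

DUMP emp_calc;
```

Note that if either operand is null, the result of the expression is null as well. In this sample every "comm" value is present, so each row produces a number.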
Let's use the GROUP operator to group the relation "emp" by "deptno" and store the result in the relation "empgroup", as shown below.
empgroup = GROUP emp BY deptno;
The below command will show how the data has been grouped.
dump empgroup;
Result:
(10,{(7839,KING,PRESIDENT,0,17/Nov/1981,5000.0,0.0,10),(7934,MILLER,CLERK,7782,23/Jan/1982,1300.0,0.0,10),(7782,CLARK,MANAGER,7839,06/Sep/1981,2450.0,0.0,10)})
(20,{(7902,FORD,ANALYST,7566,03/Dec/1981,3000.0,0.0,20),(7788,SCOTT,ANALYST,7566,13/Jul/87,3000.0,0.0,20),(7566,JONES,MANAGER,7839,04/Feb/1981,2975.0,0.0,20),(7369,SMITH,CLERK,7902,17/Dec/1980,800.0,0.0,20),(7876,ADAMS,CLERK,7788,13/Jul/87,1100.0,0.0,20)})
(30,{(7844,TURNER,SALESMAN,7698,09/Aug/1981,1500.0,0.0,30),(7654,MARTIN,SALESMAN,7698,28/Sep/1981,1250.0,1400.0,30),(7521,WARD,SALESMAN,7698,22/Feb/1981,1250.0,500.0,30),(7499,ALLEN,SALESMAN,7698,20/Feb/1981,1600.0,300.0,30),(7698,BLAKE,MANAGER,7839,01/May/1981,2850.0,0.0,30),(7900,JAMES,CLERK,7698,03/Dec/1981,950.0,0.0,30)})
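To see the structure behind this output without reading through the data, the schema of the grouped relation can be printed. This is a small sketch using the DESCRIBE operator; the comments summarize what the schema contains rather than quoting the exact output.

```pig
-- Print the schema of the grouped relation instead of its data.
DESCRIBE empgroup;

-- The schema has two fields:
--   group : the grouping key (here the deptno value, an int)
--   emp   : a bag holding every original emp tuple that shares that key
```

Knowing these two field names matters in the next step: the key is referenced as "group", and the columns inside the bag are referenced as "emp.sal", "emp.comm", and so on.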
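If you look at the data above, you'll see that it's organized by "deptno." SUM is not an arithmetic operator but a built-in aggregate function, so let's use it to total the salaries in each department. The grouping key is exposed as "group", so we project that instead of "emp.deptno", which would return a bag of repeated department numbers.
result = FOREACH empgroup GENERATE group AS deptno, SUM(emp.sal) AS total_sal;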
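Dump the relation and check the totals against the sample data: the salaries sum to 8750.0 for department 10, 10875.0 for department 20, and 9400.0 for department 30.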
dump result;
In the same way, averages and other aggregates can be computed with built-in functions such as AVG, MIN, MAX, and COUNT.
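As a rough sketch of that idea, the statement below computes several aggregates per department and also combines aggregates with the arithmetic operators. The relation name "dept_stats" and all output aliases are illustrative assumptions, not names from the original example.

```pig
-- Per-department statistics built from the grouped relation.
-- dept_stats and the aliases below are illustrative names.
dept_stats = FOREACH empgroup GENERATE
    group                         AS deptno,     -- the grouping key
    SUM(emp.sal)                  AS total_sal,  -- total salary
    AVG(emp.sal)                  AS avg_sal,    -- average salary
    MIN(emp.sal)                  AS min_sal,    -- lowest salary
    MAX(emp.sal)                  AS max_sal,    -- highest salary
    COUNT(emp)                    AS headcount,  -- number of employees
    SUM(emp.sal) + SUM(emp.comm)  AS total_pay,  -- "+" applied to two aggregates
    MAX(emp.sal) - MIN(emp.sal)   AS sal_range;  -- "-" applied to two aggregates

DUMP dept_stats;
```

Aggregates return a single value per group, so they can be used as operands of "+", "-", "*", "/", and "%" just like ordinary numeric fields.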
Hope you find this article helpful.