Arithmetic Operations in Apache Pig

Addition, subtraction, multiplication, and division are the four fundamental arithmetic operations. Apache Pig provides the arithmetic operators “+” (addition), “-” (subtraction), “*” (multiplication), “/” (division), and “%” (modulo, the remainder of a division), which work on numeric operands. An operator performs an action on one or more operands. In this post, we will look at how to use arithmetic in Apache Pig.

The sample data was introduced in the earlier posts. It is repeated below, together with the command that loads it into a Pig relation.

1) Sample Data.
    File Name: emp.csv
7839, KING, PRESIDENT, 0, 17/Nov/1981, 5000, 0, 10
7698, BLAKE, MANAGER, 7839, 01/May/1981, 2850, 0, 30
7782, CLARK, MANAGER, 7839, 06/Sep/1981, 2450, 0, 10
7566, JONES, MANAGER, 7839, 04/Feb/1981, 2975, 0, 20
7788, SCOTT, ANALYST, 7566, 13/Jul/87, 3000, 0, 20
7902, FORD, ANALYST, 7566, 03/Dec/1981, 3000, 0, 20
7369, SMITH, CLERK, 7902, 17/Dec/1980, 800, 0, 20
7499, ALLEN, SALESMAN, 7698, 20/Feb/1981, 1600, 300, 30
7521, WARD, SALESMAN, 7698, 22/Feb/1981, 1250, 500, 30
7654, MARTIN, SALESMAN, 7698, 28/Sep/1981, 1250, 1400, 30
7844, TURNER, SALESMAN, 7698, 09/Aug/1981, 1500, 0, 30
7876, ADAMS, CLERK, 7788, 13/Jul/87, 1100, 0, 20
7900, JAMES, CLERK, 7698, 03/Dec/1981, 950, 0, 30
7934, MILLER, CLERK, 7782, 23/Jan/1982, 1300, 0, 10

2) Loading the data into a relation.
Enter the statement below as a single line so that it is not executed line by line.

emp = LOAD 'Desktop/Docs/emp.csv' USING PigStorage(',') AS
    (empno:int,
    ename:chararray,
    job:chararray,
    mgr:int,
    hiredate:chararray,
    sal:double,
    comm:double,
    deptno:int);
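
With the relation loaded, the arithmetic operators from the introduction can be applied to each row. The following is a minimal sketch rather than a definitive recipe: it uses the field names from the LOAD statement above, while the relation name emp_calc and the aliases (total_pay, sal_less_comm, annual_sal, half_sal, empno_remainder) are made up for illustration.

```pig
-- Per-row arithmetic on the emp relation.
-- In order: addition, subtraction, multiplication, division, modulo.
-- emp_calc and all AS aliases are hypothetical names chosen for this example.
emp_calc = FOREACH emp GENERATE
    ename,
    sal + comm AS total_pay,
    sal - comm AS sal_less_comm,
    sal * 12   AS annual_sal,
    sal / 2    AS half_sal,
    empno % 2  AS empno_remainder;

DUMP emp_calc;
```

Keep in mind that an arithmetic expression in Pig evaluates to null when one of its operands is null, so a missing commission would make total_pay null as well. In this sample data the commission is always present (0 where there is none).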

Let’s use the Group operator to group the relation “emp” and store the result in the relation “empgroup,” as shown below.

empgroup = GROUP emp BY deptno;

The below command will show how the data has been grouped.
dump empgroup;

Result:
(10,{(7839,KING,PRESIDENT,0,17/Nov/1981,5000.0,0.0,10),(7934,MILLER,CLERK,7782,23/Jan/1982,1300.0,0.0,10),(7782,CLARK,MANAGER,7839,06/Sep/1981,2450.0,0.0,10)})

(20,{(7902,FORD,ANALYST,7566,03/Dec/1981,3000.0,0.0,20),(7788,SCOTT,ANALYST,7566,13/Jul/87,3000.0,0.0,20),(7566,JONES,MANAGER,7839,04/Feb/1981,2975.0,0.0,20),(7369,SMITH,CLERK,7902,17/Dec/1980,800.0,0.0,20),(7876,ADAMS,CLERK,7788,13/Jul/87,1100.0,0.0,20)})
(30,{(7844,TURNER,SALESMAN,7698,09/Aug/1981,1500.0,0.0,30),(7654,MARTIN,SALESMAN,7698,28/Sep/1981,1250.0,1400.0,30),(7521,WARD,SALESMAN,7698,22/Feb/1981,1250.0,500.0,30),(7499,ALLEN,SALESMAN,7698,20/Feb/1981,1600.0,300.0,30),(7698,BLAKE,MANAGER,7839,01/May/1981,2850.0,0.0,30),(7900,JAMES,CLERK,7698,03/Dec/1981,950.0,0.0,30)})
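
To see why the grouped relation looks like this, it can help to inspect its schema. The sketch below assumes the empgroup relation created above and uses Pig's DESCRIBE operator.

```pig
-- Print the schema of the grouped relation.
-- It has two fields: "group" (the deptno key) and "emp" (a bag of the original tuples).
DESCRIBE empgroup;
```

The first field is always named group, whatever the grouping column was called, and the second field takes the name of the relation that was grouped.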

If you look at the data above, you’ll see that it’s grouped by “deptno.” Let’s use the built-in aggregate function “SUM” to total the salaries in each department.

result = FOREACH empgroup GENERATE group, SUM(emp.sal);

Fetch the data and verify if the results are as expected.

dump result;

[Screenshot: SUM_Group_DeptNo]

In the same way, averages and other aggregate calculations (for example “AVG”, “MIN”, “MAX”, and “COUNT”) can be performed.
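
As a rough sketch of what that could look like, the statement below computes several aggregates per department and also combines an aggregate with an arithmetic operator. It assumes the emp and empgroup relations defined earlier. The relation name stats and the aliases are hypothetical.

```pig
-- Several built-in aggregate functions applied to each department's bag of employees.
-- The last expression multiplies an aggregate result by a constant.
-- stats and all AS aliases are illustrative names, not part of the original post.
stats = FOREACH empgroup GENERATE
    group             AS deptno,
    AVG(emp.sal)      AS avg_sal,
    MIN(emp.sal)      AS min_sal,
    MAX(emp.sal)      AS max_sal,
    COUNT(emp)        AS headcount,
    SUM(emp.sal) * 12 AS annual_total;

DUMP stats;
```

Each function takes a bag as its argument: emp.sal projects the salary column out of the grouped bag, while COUNT(emp) counts the tuples in the bag itself.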

Hope you find this article helpful.

Please subscribe for more interesting updates.
