Limiting Rows in Result – Apache Pig

Looking at a small portion of a large dataset is a common way to get a feel for its patterns and trends before processing all of it. In RDBMS systems this is done with the LIMIT operator (or TOP, depending on the database), and Apache Pig provides a LIMIT operator as well. Let's look at how to get N tuples (rows in SQL) from a relation (a table in SQL).

Prerequisites:
1) Sample Data.
    File Name: emp.csv
7839, KING, PRESIDENT, 0,17/Nov/1981, 5000, 0, 10
7698, BLAKE, MANAGER, 7839, 01/May/1981, 2850, 0, 30
7782, CLARK, MANAGER, 7839, 06/Sep/1981, 2450, 0, 10
7566, JONES, MANAGER, 7839, 04/Feb/1981, 2975, 0, 20
7788, SCOTT, ANALYST, 7566, 13/Jul/87, 3000, 0, 20
7902, FORD, ANALYST, 7566, 03/Dec/1981, 3000, 0, 20
7369, SMITH, CLERK, 7902, 17/Dec/1980, 800, 0, 20
7499, ALLEN, SALESMAN, 7698, 20/Feb/1981, 1600, 300, 30
7521, WARD, SALESMAN, 7698, 22/Feb/1981, 1250, 500, 30
7654, MARTIN, SALESMAN, 7698, 28/Sep/1981, 1250, 1400, 30
7844, TURNER, SALESMAN, 7698, 09/Aug/1981, 1500, 0, 30
7876, ADAMS, CLERK, 7788, 13/Jul/87, 1100, 0, 20
7900, JAMES, CLERK, 7698, 03/Dec/1981, 950, 0, 30
7934, MILLER, CLERK, 7782, 23/Jan/1982, 1300, 0, 10

2) Loading the data into a relation.
The statement below is split across several lines for readability. Enter it as a single line so the shell executes it as one statement instead of line by line.
data = LOAD 'Desktop/Docs/emp.csv' USING PigStorage(',') AS
(empno:int,
ename:chararray,
job:chararray,
mgr:int,
hiredate:chararray,
sal:double,
comm:double,
deptno:int);
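To make the schema concrete, here is a rough Python model of what the LOAD statement does with each line: split it on the delimiter passed to PigStorage(',') and cast each field to the type declared in the AS clause (int, chararray, double). This is an illustration only, not Pig's actual loader. The helper name `parse_emp_line` is hypothetical, and trimming the space after each comma is an assumption made because the dumped result later shows no leading spaces.

```python
# Hypothetical Python model of:
#   LOAD 'Desktop/Docs/emp.csv' USING PigStorage(',') AS (empno:int, ..., deptno:int);
# This is NOT Pig's real loader; it only mirrors the declared schema.

# Declared schema: (field name, Python type standing in for the Pig type)
EMP_SCHEMA = [
    ("empno", int),       # int
    ("ename", str),       # chararray
    ("job", str),         # chararray
    ("mgr", int),         # int
    ("hiredate", str),    # chararray
    ("sal", float),       # double
    ("comm", float),      # double
    ("deptno", int),      # int
]


def parse_emp_line(line: str) -> tuple:
    """Split one CSV line on ',' and cast each field to its schema type.

    Assumption: whitespace around each field is trimmed before casting,
    since the sample file has a space after most commas. Malformed lines
    are simply rejected by this sketch.
    """
    fields = line.rstrip("\n").split(",")
    if len(fields) != len(EMP_SCHEMA):
        raise ValueError(f"expected {len(EMP_SCHEMA)} fields, got {len(fields)}")
    return tuple(cast(value.strip()) for (_, cast), value in zip(EMP_SCHEMA, fields))


if __name__ == "__main__":
    print(parse_emp_line("7839, KING, PRESIDENT, 0,17/Nov/1981, 5000, 0, 10"))
    # (7839, 'KING', 'PRESIDENT', 0, '17/Nov/1981', 5000.0, 0.0, 10)
```

Note how sal and comm become floats (5000.0, 0.0) because they were declared as double.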

Example: keep at most 4 tuples of data and print them.
data_limit = LIMIT data 4;
DUMP data_limit;
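Conceptually, LIMIT data 4 keeps at most 4 tuples of the relation and discards the rest; if the relation has fewer tuples than the limit, all of them are returned. A minimal Python sketch of that behaviour, using a hypothetical `limit` helper over any iterable of tuples (Pig runs LIMIT across the cluster, which this sketch does not model):

```python
from itertools import islice
from typing import Iterable, List, Tuple


def limit(relation: Iterable[Tuple], n: int) -> List[Tuple]:
    """Return at most n tuples from relation (hypothetical model of Pig's LIMIT).

    Here the result depends only on iteration order; Pig itself makes no
    such promise unless LIMIT directly follows ORDER BY.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return list(islice(relation, n))


if __name__ == "__main__":
    data = [
        (7839, "KING"), (7698, "BLAKE"), (7782, "CLARK"),
        (7566, "JONES"), (7788, "SCOTT"),
    ]
    print(len(limit(data, 4)))   # 4
    print(len(limit(data, 10)))  # 5: fewer tuples than the limit, so all are kept
```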

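Result (LIMIT without a preceding ORDER BY does not guarantee which 4 tuples are returned, so your output may differ):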
(7566,JONES,MANAGER,7839,04/Feb/1981,2975.0,0.0,20)
(7698,BLAKE,MANAGER,7839,01/May/1981,2850.0,0.0,30)
(7782,CLARK,MANAGER,7839,06/Sep/1981,2450.0,0.0,10)
(7839,KING,PRESIDENT,0,17/Nov/1981,5000.0,0.0,10)
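The decimals in this output (2975.0, 0.0) come from the schema: sal and comm were declared as double, so DUMP prints them with a fractional part even though the file holds whole numbers. The hypothetical `format_like_dump` helper below imitates that printed layout for flat tuples. It is only an approximation that happens to match the values in this dataset, not Pig's real formatter; nulls, bags, and maps are not handled.

```python
def format_like_dump(tup: tuple) -> str:
    """Render a flat tuple the way the DUMP output above looks: (f1,f2,...).

    Hypothetical approximation: each field is converted with str(), so a
    Python float such as 2975.0 prints as "2975.0", matching the double
    fields in the result. Nulls, bags and maps are not handled.
    """
    return "(" + ",".join(str(field) for field in tup) + ")"


if __name__ == "__main__":
    row = (7566, "JONES", "MANAGER", 7839, "04/Feb/1981", 2975.0, 0.0, 20)
    print(format_like_dump(row))
    # (7566,JONES,MANAGER,7839,04/Feb/1981,2975.0,0.0,20)
```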

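The above statements are equivalent to the following SQL, depending on the database dialect:

SELECT * FROM emp LIMIT 4;   -- MySQL, PostgreSQL, SQLite
SELECT TOP 4 * FROM emp;     -- SQL Server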
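Because LIMIT alone does not determine which rows you get, a common pattern for a repeatable top-N is to sort first: in Pig, an ORDER statement such as sorted = ORDER data BY sal DESC; followed by LIMIT sorted 4; (in SQL, adding ORDER BY sal DESC). The Python sketch below models that pattern on the empno, ename and sal columns of the sample data; the function name `top_n_by_salary` is hypothetical. Python's sort is stable, so the two 3000 salaries (SCOTT and FORD) keep their file order here, whereas Pig does not promise any particular order among ties.

```python
from typing import List, Tuple

# (empno, ename, sal) taken from the sample emp.csv above
EMP = [
    (7839, "KING", 5000.0), (7698, "BLAKE", 2850.0), (7782, "CLARK", 2450.0),
    (7566, "JONES", 2975.0), (7788, "SCOTT", 3000.0), (7902, "FORD", 3000.0),
    (7369, "SMITH", 800.0), (7499, "ALLEN", 1600.0), (7521, "WARD", 1250.0),
    (7654, "MARTIN", 1250.0), (7844, "TURNER", 1500.0), (7876, "ADAMS", 1100.0),
    (7900, "JAMES", 950.0), (7934, "MILLER", 1300.0),
]


def top_n_by_salary(rows: List[Tuple[int, str, float]], n: int) -> List[Tuple[int, str, float]]:
    """Sort by salary, highest first, then keep the first n rows.

    Models ORDER ... BY sal DESC followed by LIMIT n. sorted() is stable
    even with reverse=True, so equal salaries keep their input order in
    this sketch only.
    """
    ordered = sorted(rows, key=lambda row: row[2], reverse=True)
    return ordered[:n]


if __name__ == "__main__":
    for row in top_n_by_salary(EMP, 4):
        print(row)
    # (7839, 'KING', 5000.0)
    # (7788, 'SCOTT', 3000.0)
    # (7902, 'FORD', 3000.0)
    # (7566, 'JONES', 2975.0)
```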

Hope you find this article helpful.

