Loading comma-separated data using Apache Pig

The example below shows how to load comma-separated values (CSV) into Apache Pig.

Let’s consider the popular Emp data.

OracleEmpData.csv

7839,KING,PRESIDENT,0,1981-11-17,5000,0,10
7698,BLAKE,MANAGER,7839,1981-05-01,2850,0,30
7782,CLARK,MANAGER,7839,1981-09-06,2450,0,10
7566,JONES,MANAGER,7839,1981-02-04,2975,0,20
7788,SCOTT,ANALYST,7566,1987-07-13,3000,0,20
7902,FORD,ANALYST,7566,1981-12-03,3000,0,20
7369,SMITH,CLERK,7902,1980-12-17,800,0,20
7499,ALLEN,SALESMAN,7698,1981-02-20,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22,1250,500,30
7654,MARTIN,SALESMAN,7698,1981-09-28,1250,1400,30
7844,TURNER,SALESMAN,7698,1981-08-09,1500,0,30
7876,ADAMS,CLERK,7788,1987-07-13,1100,0,20
7900,JAMES,CLERK,7698,1981-12-03,950,0,30
7934,MILLER,CLERK,7782,1982-01-23,1300,0,10

You can use the “pig” or “pig -x mapreduce” command to start Pig’s Grunt shell; mapreduce is the default execution mode, so the two are equivalent.

Once you are connected to Grunt, use the command below.

Emp = LOAD '/user/cloudera/OracleEmpData.csv'
USING PigStorage(',')
AS (
        empid:int,
        ename:chararray,
        vcjob:chararray,
        mgrid:int,
        joindate:datetime,
        salary:float,
        comm:float,
        deptid:int
);


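The above command defines the relation “Emp” over the specified CSV file, splitting each line on commas and casting the fields to the declared types. Pig evaluates lazily, so the file is actually read only when a statement such as DUMP or STORE runs.

To view the data in “Emp”, use the command below.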

grunt> DUMP Emp;
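
As a quick sanity check on the load, a sketch along the following lines can be run in the same Grunt session. It assumes the relation Emp has been defined as above; Emp_sample is a hypothetical alias introduced here only for illustration.

```pig
-- Print the schema that was declared in the AS clause of the LOAD statement
DESCRIBE Emp;

-- Keep only a handful of rows so the preview stays small
-- (Emp_sample is a hypothetical alias, not part of the original example)
Emp_sample = LIMIT Emp 5;

-- Trigger execution and print the sampled rows to the console
DUMP Emp_sample;
```

DESCRIBE only prints the schema and does not launch a job, so it is a cheap way to confirm the field names and types before running DUMP. A field that cannot be cast to its declared type will generally show up as an empty (null) value in the DUMP output.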


I hope you find this article helpful.

Please click the Follow button to get the latest updates.
