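The example below shows how to load comma-separated values (CSV) into Apache Pig.
Let’s use the popular Oracle Emp sample data, saved as the following file. Each row has eight fields: employee id, name, job, manager id, joining date, salary, commission and department id.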
OracleEmpData.csv
7839,KING,PRESIDENT,0,1981-11-17,5000,0,10
7698,BLAKE,MANAGER,7839,1981-05-01,2850,0,30
7782,CLARK,MANAGER,7839,1981-09-06,2450,0,10
7566,JONES,MANAGER,7839,1981-02-04,2975,0,20
7788,SCOTT,ANALYST,7566,1987-07-13,3000,0,20
7902,FORD,ANALYST,7566,1981-12-03,3000,0,20
7369,SMITH,CLERK,7902,1980-12-17,800,0,20
7499,ALLEN,SALESMAN,7698,1981-02-20,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22,1250,500,30
7654,MARTIN,SALESMAN,7698,1981-09-28,1250,1400,30
7844,TURNER,SALESMAN,7698,1981-08-09,1500,0,30
7876,ADAMS,CLERK,7788,1987-07-13,1100,0,20
7900,JAMES,CLERK,7698,1981-12-03,950,0,30
7934,MILLER,CLERK,7782,1982-01-23,1300,0,10
You can start Pig’s grunt shell with either the "pig" or the "pig -x mapreduce" command. MapReduce is Pig’s default execution mode, so the two are equivalent.
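In mapreduce mode Pig resolves the path in a LOAD statement against HDFS, so the CSV has to be there before it can be read. As a minimal sketch, assuming OracleEmpData.csv sits in the local directory from which Pig was started and that /user/cloudera is a writable HDFS directory (both are assumptions, adjust them to your setup), the file can be uploaded and checked from grunt with the fs command:

```pig
-- Assumption: OracleEmpData.csv is in the local working directory.
-- Copy it into HDFS at the path used by the LOAD statement.
fs -put OracleEmpData.csv /user/cloudera/OracleEmpData.csv

-- List the target directory to confirm the file arrived.
fs -ls /user/cloudera
```

The fs command hands its arguments to the Hadoop file system shell, so the same options work as with "hadoop fs" on the command line.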
Once you are connected to grunt, use the statement below to load the file.
Emp = LOAD '/user/cloudera/OracleEmpData.csv'
    USING PigStorage(',')
    AS (
        empid:int,
        ename:chararray,
        vcjob:chararray,
        mgrid:int,
        joindate:datetime,
        salary:float,
        comm:float,
        deptid:int
    );
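The above statement defines the relation Emp over the specified CSV file, with the given field names and types. Pig evaluates lazily, so the file is actually read only when a DUMP or STORE is run on the relation.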
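Before reading any data, it can be useful to confirm that Pig picked up the schema as intended. A small sketch, assuming the relation is named Emp as above:

```pig
-- Print the field names and types attached to the Emp relation.
-- DESCRIBE only inspects the schema; it does not launch a job.
DESCRIBE Emp;
```

If a field name or type in the output is not what you expected, fix the AS clause and run the LOAD statement again.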
To check the data in Emp, use the command below. Note the terminating semicolon, without which grunt keeps waiting for more input.
grunt> DUMP Emp;
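DUMP prints every row of the relation, which is fine for 14 employees but slow and noisy for larger files. As a hedged sketch, here are two ways to look at only part of the data. The relation names EmpSample and Dept10 are made up for this illustration:

```pig
-- Preview a handful of rows instead of the whole relation.
EmpSample = LIMIT Emp 5;
DUMP EmpSample;

-- Filter on a typed column to check that the int cast worked.
Dept10 = FILTER Emp BY deptid == 10;
DUMP Dept10;
```

With the sample file above, the filter should return the three employees of department 10: KING, CLARK and MILLER.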
Hope you find this article helpful.