The example below shows how to load a delimited text file, such as a CSV file, in Apache Pig.
Let’s consider the popular Emp data.
OracleEmpData.csv
7839 KING PRESIDENT 0 1981-11-17 5000 0 10
7698 BLAKE MANAGER 7839 1981-05-01 2850 0 30
7782 CLARK MANAGER 7839 1981-09-06 2450 0 10
7566 JONES MANAGER 7839 1981-02-04 2975 0 20
7788 SCOTT ANALYST 7566 1987-07-13 3000 0 20
7902 FORD ANALYST 7566 1981-12-03 3000 0 20
7369 SMITH CLERK 7902 1980-12-17 800 0 20
7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
7521 WARD SALESMAN 7698 1981-02-22 1250 500 30
7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
7844 TURNER SALESMAN 7698 1981-08-09 1500 0 30
7876 ADAMS CLERK 7788 1987-07-13 1100 0 20
7900 JAMES CLERK 7698 1981-12-03 950 0 30
7934 MILLER CLERK 7782 1982-01-23 1300 0 10
You can simply use the “pig” or “pig -x mapreduce” command to start Pig’s Grunt shell (MapReduce is the default execution mode, so the two are equivalent).
Once you are connected to Grunt, run the command below.
-- Load the file, split each line on tabs, and apply a typed schema
Emp = LOAD '/user/cloudera/OracleEmpData.csv'
    USING PigStorage('\t')
    AS (
        empid:int,
        ename:chararray,
        vcjob:chararray,
        mgrid:int,
        joindate:datetime,
        salary:float,
        comm:float,
        deptid:int
    );
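The above command loads the data from the specified file into the relation “Emp”, splitting each line on the tab character passed to PigStorage and applying the schema given in the AS clause.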
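If the file really is comma-delimited, only the PigStorage argument should need to change. The following is a minimal sketch under that assumption; the relation name EmpCsv and the file path are hypothetical and are not part of the original example.

```pig
-- Assumption: a comma-delimited copy of the same data exists at this
-- hypothetical HDFS path; adjust the path to your own file.
EmpCsv = LOAD '/user/cloudera/OracleEmpData_comma.csv'
    USING PigStorage(',')   -- split each line on commas instead of tabs
    AS (
        empid:int,
        ename:chararray,
        vcjob:chararray,
        mgrid:int,
        joindate:datetime,
        salary:float,
        comm:float,
        deptid:int
    );
```

PigStorage splits on the single delimiter character it is given and does not treat quotes specially, so this sketch only suits files whose fields contain no embedded commas.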
To check the data in “Emp”, use the command below.
grunt> DUMP Emp;
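Since DUMP prints every row, it can help to check the schema first and preview only part of the data. The statements below are a small sketch of that; the relation names EmpSample and Dept10 are made up for illustration.

```pig
-- Print the schema that was declared in the AS clause of the LOAD
DESCRIBE Emp;

-- Keep at most five rows so the preview stays small
EmpSample = LIMIT Emp 5;
DUMP EmpSample;

-- Show only the employees of department 10
Dept10 = FILTER Emp BY deptid == 10;
DUMP Dept10;
```

If a column shows up empty in the output, the delimiter or the declared type likely does not match the file, because Pig replaces values it cannot cast with nulls.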
Hope you find this article helpful.