Loading Tab-Delimited Data Using Apache Pig

The example below shows how to load tab-separated values in Apache Pig.

Let’s consider the popular Emp data, saved as a tab-delimited file (despite the .csv extension).

OracleEmpData.csv
7839 KING PRESIDENT 0 1981-11-17 5000 0 10
7698 BLAKE MANAGER 7839 1981-05-01 2850 0 30
7782 CLARK MANAGER 7839 1981-09-06 2450 0 10
7566 JONES MANAGER 7839 1981-02-04 2975 0 20
7788 SCOTT ANALYST 7566 1987-07-13 3000 0 20
7902 FORD ANALYST 7566 1981-12-03 3000 0 20
7369 SMITH CLERK 7902 1980-12-17 800 0 20
7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
7521 WARD SALESMAN 7698 1981-02-22 1250 500 30
7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
7844 TURNER SALESMAN 7698 1981-08-09 1500 0 30
7876 ADAMS CLERK 7788 1987-07-13 1100 0 20
7900 JAMES CLERK 7698 1981-12-03 950 0 30
7934 MILLER CLERK 7782 1982-01-23 1300 0 10

You can simply use the “pig” or “pig -x mapreduce” command to start Pig’s Grunt shell.

Once you are connected to the Grunt shell, use the command below.

Emp = LOAD '/user/cloudera/OracleEmpData.csv'
USING PigStorage('\t')
AS (
        empid:int,
        ename:chararray,
        vcjob:chararray,
        mgrid:int,
        joindate:datetime,
        salary:float,
        comm:float,
        deptid:int
);


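The above command defines a relation named “Emp” over the specified file, mapping each tab-separated field to the declared column name and type. Pig does not actually read the data until an output statement such as DUMP or STORE runs.

To check the data in “Emp”, use the command below.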
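
Before printing any rows, it can help to confirm the schema Pig attached to the relation. A minimal sketch, assuming the LOAD statement above has already been run in the same Grunt session:

```pig
-- Print the column names and types that Pig associated with Emp.
-- DESCRIBE only inspects the schema; it does not launch a job.
DESCRIBE Emp;
```

If the listed types do not match the AS clause, the LOAD statement is worth rechecking before going further.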

grunt> DUMP Emp;
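
On a larger file, dumping the whole relation can flood the console. The following is a small sketch of two lighter checks, assuming the Emp relation defined above; the alias names EmpSample and Dept10 are illustrative and not part of the original example:

```pig
-- Keep at most five tuples (LIMIT does not guarantee which ones
-- unless the relation is ordered first).
EmpSample = LIMIT Emp 5;
DUMP EmpSample;

-- Keep only employees in department 10; this works because
-- deptid was declared as int in the AS clause.
Dept10 = FILTER Emp BY deptid == 10;
DUMP Dept10;
```

If most fields come back empty, the likely cause is that the file's delimiter does not match the one passed to PigStorage.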


Hope you find this article helpful.

