Select Specific Columns – Apache Pig

Apache Pig Latin has no SELECT statement like SQL, so there is no single expression that loads data, filters it, and returns particular columns. This tutorial shows how to extract specific columns from a relation in Apache Pig.

It is a two-step process: load the data into a relation, then use a FOREACH ... GENERATE statement to project the required columns into a new relation.

grunt> emp = LOAD 'Desktop/Docs/emp.csv' USING PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:double, comm:double, deptno:int);

grunt> empdata = foreach emp generate empno, ename, sal;

grunt> dump empdata;
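The same projection can be written in a few other ways. The following is a minimal sketch, assuming the emp relation loaded above is already defined in the same Grunt session. It shows positional field references, filtering rows before projecting (the "filtering" part of the task mentioned at the start), and renaming columns with AS. The relation names (empdata_pos, dept10, dept10_cols, renamed) and the department number 10 are hypothetical values chosen for illustration, not taken from the original data.

```pig
-- Assumes 'emp' was loaded with the schema shown above.

-- Same columns as 'empdata', selected by position instead of by name
-- ($0 = empno, $1 = ename, $5 = sal in the declared schema).
empdata_pos = FOREACH emp GENERATE $0, $1, $5;

-- Filter rows first, then keep only the wanted columns.
-- Department 10 is a hypothetical value used for illustration.
dept10 = FILTER emp BY deptno == 10;
dept10_cols = FOREACH dept10 GENERATE empno, ename, sal;

-- Output columns can be renamed with AS.
renamed = FOREACH emp GENERATE empno AS id, ename AS name, sal AS salary;

-- Pig evaluates lazily; nothing runs until DUMP or STORE.
DESCRIBE renamed;
DUMP dept10_cols;
```

Named fields are usually easier to read and survive schema reordering better, while positional references are handy when a relation was loaded without a schema.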


Hope you find this article helpful.

