Apache Pig has no SELECT statement like SQL, so you cannot load a file and pick out particular columns in a single expression. This tutorial shows how to extract specific columns from a relation using Apache Pig.
It's a two-step process: load the data into a relation, then use the FOREACH ... GENERATE operator to project the required columns into another relation.
grunt> emp = LOAD 'Desktop/Docs/emp.csv' USING PigStorage(',') AS (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:double, comm:double, deptno:int);
grunt> empdata = FOREACH emp GENERATE empno, ename, sal;
grunt> DUMP empdata;
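The same projection can also be written with positional references, and a projected column can be renamed with AS. The sketch below assumes the same emp.csv layout as above; the alias names emp_by_pos and emp_renamed and the field name salary are illustrative and not part of the original example.

```pig
-- Load the file with the same schema as in the example above.
emp = LOAD 'Desktop/Docs/emp.csv' USING PigStorage(',')
      AS (empno:int, ename:chararray, job:chararray, mgr:int,
          hiredate:chararray, sal:double, comm:double, deptno:int);

-- Positional references are zero-based: $0 = empno, $1 = ename, $5 = sal.
emp_by_pos = FOREACH emp GENERATE $0, $1, $5;

-- Project by name and rename sal to salary (hypothetical name) in the new relation.
emp_renamed = FOREACH emp GENERATE empno, ename, sal AS salary;

-- Show the schema of the renamed relation, then print the projected rows.
DESCRIBE emp_renamed;
DUMP emp_by_pos;
```

Positional references are handy when a file is loaded without a schema, while named fields are usually easier to read and keep working if the column order changes.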
Hope you find this article helpful.