Substring Function in Apache Pig

String functions perform an operation on an input string and return the result. Almost all major RDBMSs and big data technologies, such as Apache Hive, Apache Impala and Apache Pig, have a variety of built-in string functions. The names may differ slightly, but the purpose is the same. For example, the SUBSTR function in Oracle SQL, MySQL, and Apache Hive serves the same purpose as the SUBSTRING function in Microsoft SQL Server and Apache Pig, although the argument conventions differ between systems.

This post covers the SUBSTRING function, which extracts part of a string.

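SUBSTRING(string, startIndex, stopIndex)
Returns the part of the given string that starts at startIndex and ends just before stopIndex. Indexes are zero-based: startIndex is the position of the first character to keep, and stopIndex is the position of the character following the last one kept.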

grunt> emp = LOAD 'Desktop/Docs/emp.csv' USING PigStorage(',') AS (empno:int,ename:chararray,job:chararray,mgr:int,hiredate:chararray,sal:double,comm:double,deptno:int);

grunt> emppart = FOREACH emp GENERATE empno, SUBSTRING (ename, 0, 3);

grunt> dump emppart;
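
To make the index arithmetic concrete, here is a minimal sketch written as a standalone Pig script. It reuses the file path and schema from the statements above; the aliases first3 and middle3 and the sample row mentioned in the comments are illustrative assumptions, not data taken from the actual file.

```pig
-- Load the employee file with the same schema as above.
emp = LOAD 'Desktop/Docs/emp.csv' USING PigStorage(',')
      AS (empno:int, ename:chararray, job:chararray, mgr:int,
          hiredate:chararray, sal:double, comm:double, deptno:int);

-- startIndex is zero-based; stopIndex is the index just past the last character kept.
emppart = FOREACH emp GENERATE
    empno,
    ename,
    SUBSTRING(ename, 0, 3) AS first3,   -- characters at indexes 0, 1, 2
    SUBSTRING(ename, 1, 4) AS middle3;  -- characters at indexes 1, 2, 3

-- For a hypothetical row with empno 7369 and ename SMITH, the tuple would be
-- (7369,SMITH,SMI,MIT).
DUMP emppart;
```

When the string is at least stopIndex characters long, the number of characters returned is stopIndex minus startIndex.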


Hope you find this article helpful.
