Substring Function in Apache Pig

String functions perform an operation on an input string and return the result. Almost all major relational database systems and big data technologies, such as Apache Hive, Apache Impala, and Apache Pig, provide a variety of built-in string functions. The names vary somewhat, but the purpose is the same. For example, the SUBSTR function in Oracle SQL, MySQL, and Apache Hive serves the same purpose as the SUBSTRING function in Microsoft SQL Server and Apache Pig, although the argument conventions differ between systems (for example, whether positions start at 0 or 1, and whether the last argument is a length or a stop index).

This post covers the SUBSTRING function, which extracts part of a string.

Syntax:
SUBSTRING(string, startIndex, stopIndex)
Returns the part of the given string that starts at startIndex (counting from 0) and ends just before stopIndex.
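
To make the index convention concrete, here is a minimal sketch. The file names.txt, the relation names, and the field aliases are hypothetical and used only for illustration; the file is assumed to hold one name per line, such as SMITH.

```pig
-- names.txt is a hypothetical one-column text file, e.g. one line containing: SMITH
names = LOAD 'names.txt' AS (name:chararray);

-- startIndex counts from 0; stopIndex is the position just past the last character kept
parts = FOREACH names GENERATE
    name,
    SUBSTRING(name, 0, 3) AS first_three,  -- 'SMITH' -> 'SMI' (characters 0, 1, 2)
    SUBSTRING(name, 1, 4) AS middle;       -- 'SMITH' -> 'MIT' (characters 1, 2, 3)

DUMP parts;
```

The number of characters returned is stopIndex minus startIndex, which is the main difference from SUBSTR-style functions that take a start position and a length.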

Example:
grunt> emp = LOAD 'Desktop/Docs/emp.csv' USING PigStorage(',') as (empno:int,ename:chararray,job:chararray,mgr:int,hiredate:chararray,sal:double,comm:double,deptno:int);

grunt> emppart = FOREACH emp GENERATE empno, SUBSTRING (ename, 0, 3);

grunt> dump emppart;
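
As a small variation on the example above, the extracted value can be given a field name with AS so that later statements can refer to it by name. This is only a sketch: the alias ename_prefix, the relation smi_names, and the filter value 'SMI' are illustrative choices, not part of the original example.

```pig
emp = LOAD 'Desktop/Docs/emp.csv' USING PigStorage(',')
      AS (empno:int, ename:chararray, job:chararray, mgr:int,
          hiredate:chararray, sal:double, comm:double, deptno:int);

-- AS names the generated field (ename_prefix is an illustrative alias)
emppart = FOREACH emp GENERATE empno, SUBSTRING(ename, 0, 3) AS ename_prefix;

-- Show the schema of the new relation, including the named field
DESCRIBE emppart;

-- The named field can be used in later statements, for example a filter
smi_names = FILTER emppart BY ename_prefix == 'SMI';

DUMP smi_names;
```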


Hope you find this article helpful.

