Trimming Functions in Apache Pig

Trimming functions remove leading and/or trailing whitespace from a string. They are often used for data cleansing during the ETL cycle. This article walks through the trimming functions available in Apache Pig.

TRIM(expression)
Returns a copy of the string with both leading and trailing whitespace removed.

LTRIM(expression)
Returns a copy of the string with leading whitespace removed.

RTRIM(expression)
Returns a copy of the string with trailing whitespace removed.
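To make the difference between the three functions concrete, here is a small Python model of their behavior. It uses Python's str.strip, str.lstrip and str.rstrip as stand-ins and is only an illustration, not Pig's implementation. The helper names pig_trim, pig_ltrim and pig_rtrim are hypothetical, the null handling (None in, None out) is an assumption, and the exact set of characters Pig treats as whitespace may differ from Python's.

```python
from typing import Optional


def pig_trim(s: Optional[str]) -> Optional[str]:
    """Hypothetical model of TRIM: drop leading and trailing whitespace.

    None is passed through unchanged (assumed null behavior).
    """
    return None if s is None else s.strip()


def pig_ltrim(s: Optional[str]) -> Optional[str]:
    """Hypothetical model of LTRIM: drop leading whitespace only."""
    return None if s is None else s.lstrip()


def pig_rtrim(s: Optional[str]) -> Optional[str]:
    """Hypothetical model of RTRIM: drop trailing whitespace only."""
    return None if s is None else s.rstrip()


if __name__ == "__main__":
    value = "   Apache Pig   "
    # Brackets make any remaining spaces visible.
    print(f"[{pig_ltrim(value)}]")  # [Apache Pig   ]
    print(f"[{pig_rtrim(value)}]")  # [   Apache Pig]
    print(f"[{pig_trim(value)}]")   # [Apache Pig]
```

Note that whitespace inside the string, such as the space between "Apache" and "Pig", is left untouched; only the ends of the string are trimmed.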

Now let’s try them out.

Sample Data: trimexample.csv, a comma-separated file with two columns, an integer column number (colno) and a string column name (colname).

Examples:
grunt> trimdata = LOAD 'Desktop/Docs/trimexample.csv' USING PigStorage(',') AS (colno:int, colname:chararray);

grunt> datatrimleft = FOREACH trimdata GENERATE colno, LTRIM(colname) AS colname;
grunt> datatrimright = FOREACH trimdata GENERATE colno, RTRIM(colname) AS colname;
grunt> datatrimboth = FOREACH trimdata GENERATE colno, TRIM(colname) AS colname;

grunt> dump datatrimleft;
grunt> dump datatrimright;
grunt> dump datatrimboth;

Hope you find this article helpful.
