Trimming functions remove whitespace from the start of a string, the end of a string, or both. They are commonly used in data cleansing operations during the ETL cycle. This article covers the trimming functions available in Apache Pig.
TRIM(expression)
It returns a copy of a string with leading and trailing whitespace removed.
LTRIM(expression)
It returns a copy of a string with leading whitespace removed.
RTRIM(expression)
It returns a copy of a string with trailing whitespace removed.
Now, let’s do some practice.
Sample Data:
Examples:
grunt> trimdata = LOAD 'Desktop/Docs/trimexample.csv' USING PigStorage(',') AS (colno:int, colname:chararray);
grunt> datatrimleft = FOREACH trimdata GENERATE colno, LTRIM(colname);
grunt> datatrimright = FOREACH trimdata GENERATE colno, RTRIM(colname);
grunt> datatrimboth = FOREACH trimdata GENERATE colno, TRIM(colname);
grunt> dump datatrimleft;
grunt> dump datatrimright;
grunt> dump datatrimboth;
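Trailing whitespace is hard to spot in dump output, so it can help to compare string lengths before and after trimming. The following is a minimal sketch, assuming the trimdata relation loaded above is still available in the session. The relation name trimcheck and the field names len_raw, len_ltrim, len_rtrim and len_trim are made up for illustration. SIZE is Pig's built-in function that returns the number of characters in a chararray.

```pig
-- Assumes trimdata is already loaded as (colno:int, colname:chararray).
-- trimcheck and the len_* field names are illustrative, not from the original example.
trimcheck = FOREACH trimdata GENERATE
    colno,
    SIZE(colname)        AS len_raw,    -- length of the untrimmed value
    SIZE(LTRIM(colname)) AS len_ltrim,  -- length after removing leading whitespace
    SIZE(RTRIM(colname)) AS len_rtrim,  -- length after removing trailing whitespace
    SIZE(TRIM(colname))  AS len_trim;   -- length after removing both
DUMP trimcheck;
```

For any row where len_raw is larger than len_trim, the original value carried whitespace at one or both ends.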
I hope you find this article helpful.