Index of the character occurrences – Apache Pig

INDEXOF returns the index of the first occurrence of a character (or substring) in a string, searching forward from a start index. If there is no match, it returns -1. Let’s see how to work with it.

The syntax of the function is given below:
INDEXOF(string, 'character', startIndex)

string is the input string or column to be searched, 'character' is the character to search for, and startIndex is the index from which the forward search begins. String indexes start at zero (0).
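
To see how startIndex changes the result, here is a minimal sketch. It assumes a hypothetical one-line file, sample.txt, containing the word banana; the file name and the aliases words and idx are made up for illustration.

```pig
-- sample.txt is a hypothetical file with a single line: banana
words = LOAD 'sample.txt' AS (w:chararray);

-- Search for 'a' from index 0, then from index 2, then for a character that is absent
idx = FOREACH words GENERATE INDEXOF(w, 'a', 0), INDEXOF(w, 'a', 2), INDEXOF(w, 'z', 0);

DUMP idx;
-- (1,3,-1)
```

The first 'a' is at index 1. Starting the search at index 2 skips it and finds the next one at index 3. 'z' is not in the string, so the result is -1.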

Let’s do some practice exercises for better understanding.

Sample data:

The data below contains the release year and the title of each movie, joined by an underscore. Save this data in a CSV file.
1969_DownhillRacer
1970_M*A*S*H
1970_ThePartyAtKittyAndStud’s
1970_LoversAndOtherStrangers
1970_TheSidelongGlancesOfAPigeonKicker
1970_HerculesInNewYork
1971_Bananas
1971_Klute
1972_What’sUp,Doc?
1973_NoPlaceToHide

Loading data into a relation
rawdata = LOAD 'Desktop/Docs/movies.csv' USING PigStorage() AS (data:chararray);

Finding the position of “And” using the INDEXOF function.

moviedata = FOREACH rawdata GENERATE INDEXOF(data, 'And', 1);

Retrieve data.
DUMP moviedata;

Result:
(-1)
(-1)
(20)
(11)
(-1)
(-1)
(-1)
(-1)
(-1)
(-1)

If you look at the output, you’ll see that “And” starts at zero-based index 20 in the third row and at index 11 in the fourth row. Every other row returns -1 because it does not contain “And”.
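
Since every row has the form year_title, the same function can also help separate the two parts. The following is a rough sketch rather than the only way to do it: it reuses the rawdata relation loaded earlier and combines INDEXOF with Pig's built-in SUBSTRING and SIZE functions. The alias splitdata and the field names year and title are made up for this example.

```pig
-- Assumes rawdata was loaded as shown earlier: one chararray field named data.
-- year  = everything before the first underscore
-- title = everything after it (SIZE returns a long, so it is cast to int for SUBSTRING)
splitdata = FOREACH rawdata GENERATE
    SUBSTRING(data, 0, INDEXOF(data, '_', 0)) AS year,
    SUBSTRING(data, INDEXOF(data, '_', 0) + 1, (int)SIZE(data)) AS title;

DUMP splitdata;
-- first row, given the sample data above: (1969,DownhillRacer)
```

This sketch relies on every row containing an underscore. A row without one would make INDEXOF return -1, so such rows would need to be filtered out or handled separately.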

Hope you find this article helpful.

Please follow us for more interesting updates.
