We often need to check whether a string contains, or begins with, a particular term. Most programming languages support this kind of search, usually through wildcards or pattern matching. Apache Pig provides built-in functions for it as well.
The STARTSWITH and ENDSWITH functions each take two string arguments. STARTSWITH tests whether the first argument begins with the string in the second, and ENDSWITH tests whether the first argument ends with it. Both return a boolean. This article covers STARTSWITH.
Let’s do some practice exercises for better understanding.
Sample data:
Each record below holds the release year of a movie and its title, joined by an underscore. Save this data in a file named movies.csv.
1969_DownhillRacer
1970_M*A*S*H
1970_ThePartyAtKittyAndStud’s
1970_LoversAndOtherStrangers
1970_TheSidelongGlancesOfAPigeonKicker
1970_HerculesInNewYork
1971_Bananas
1971_Klute
1972_What’sUp,Doc?
1973_NoPlaceToHide
Loading data into a relation
rawdata = LOAD 'Desktop/Docs/movies.csv' USING PigStorage() AS (data:chararray);
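Testing each title with the STARTSWITH function. Every record begins with a four-digit year and an underscore, so the title starts at index 5; SUBSTRING strips that prefix before the comparison.
moviedata = FOREACH rawdata GENERATE STARTSWITH(SUBSTRING(data, 5, (int)SIZE(data)), 'No');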
Retrieve data.
DUMP moviedata;
Result:
(false)
(false)
(false)
(false)
(false)
(false)
(false)
(false)
(false)
(true)
Comparing the data with the output, only the last of the ten rows, 1973_NoPlaceToHide, returns true, because it is the only record whose title begins with the search term "No". The other nine rows return false.
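ENDSWITH follows the same calling pattern but compares the end of the string instead of the beginning. Here is a short sketch, assuming the rawdata relation loaded earlier. The relation name suffixcheck is made up for illustration, and emitting data next to the flag simply makes it easier to see which record matched.

```pig
-- Emit each record together with a flag telling whether it ends with 'Hide'.
suffixcheck = FOREACH rawdata GENERATE data, ENDSWITH(data, 'Hide');
DUMP suffixcheck;
```

With the sample data, only 1973_NoPlaceToHide ends with "Hide", so it should be the only record flagged true.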
This function is useful in practice whenever records need to be flagged or selected by a known prefix, such as a year, a code, or a keyword.
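Because STARTSWITH returns a boolean, it can also serve directly as a FILTER condition, which keeps the matching records rather than a column of true/false values. The following is a minimal sketch, assuming the same rawdata relation as above. The relation name movies1970 is only illustrative.

```pig
-- Keep only the records whose text begins with the year 1970.
movies1970 = FILTER rawdata BY STARTSWITH(data, '1970');
DUMP movies1970;
```

With the sample data this should return the five records from 1970 and drop the rest.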
Hope you find this article helpful.