String functions are used to execute an input string operation and return the output string. There are many built-in string functions available on almost all RDBMS platforms and big data tools such as Hive and Impala. Names can differ slightly from each other, but the functionality is the same. For eg, the LEFT and RIGHT functions in SQL Server are similar to the STRLEFT and STRRIGHT functions in Cloudera’s Impala.
This post addresses the string functions that help to extract part of a string.
STRLEFT(STRING a, INT num_chars)
Purpose: Returns the leftmost characters of the string.
The STRLEFT() function extracts a number of characters from a string (starting from left). The return type is ‘STRING’.
Instead of using SUBSTR function with 2 arguments, this is a straight forward function where we can extract the sub-string from a given input.
SELECT STRLEFT(‘BigData & SQL’, 3);
SELECT STRLEFT(ename, 5) FROM empInfo;
STRRIGHT(STRING a, INT num_chars)
Purpose: Returns the rightmost characters of the string.
Similar to the STRLEFT() function, the STRRIGHT() function extracts a number of characters from a
string (starting from right). The return type is ‘STRING’.
SELECT STRRIGHT(‘BigData & SQL’, 3);
SELECT STRRIGHT(ename, 5) FROM empInfo;
The desired output can also be achieved using SUBSTR function which helps retrieving the portion of the string starting from a specified point, optionally with a specified maximum length. The characters in the string are indexed starting at 1.
Note: The below specified function is common in both Apache Hive & Apache Impala.
SUBSTR(STRING a, INT start [, INT len]), SUBSTRING(STRING a, INT start [, INT len])
SELECT SUBSTR(ename, 1, 5) FROM empInfo;
SELECT SUBSTR(‘BigData & SQL’, 11, 3);
The usage of SUBSTR is not limited to the above examples. Look at the below-
SELECT SUBSTR(‘BigData & SQL’, 4)
This will ignore the first 4 characters from the string and returns the remaining characters.