Trimming Functions in Apache Hive & Apache Impala

Trimming functions help to remove spaces from a given string. These functions are often used in data cleansing operations during the ETL cycle. While this is not a new innovation for SQL lovers, there is something that Cloudera’s Impala is offering that everybody would appreciate.

Let’s look into the trimming functions one by one.

LTRIM:
This function removes the spaces (any number of occurrences) from the left side of the given string.

Eg: SELECT LTRIM(‘ SpacesBefore’);



Let’s test how many characters were there before and after trimming.

SELECT LENGTH(‘ SpacesBefore’), LENGTH(LTRIM(‘ SpacesBefore’));
Note: LEN the Short form of LENGTH does not work in Apache Impala.



RTRIM:
It is very much same like LTRIM. Only difference is it removes the spaces from right for the given string.

Eg: SELECT RTRIM(‘SpacesAfter ‘);



There is no function to handle directly if there are spaces before and after of the string except a workaround.

SELECT LTRIM(RTRIM(‘ SpacesBeforeAfter ‘));
This is the workaround the developers often use to remove the spaces before and after the given string. (Update: TRIM function is available in latest RDBMS platforms such as SQL Server, etc.)

Now, Impala is offering a direct function TRIM that does the same job. It removes both leading and trailing spaces from the input string.

SELECT TRIM(‘ SpacesBeforeAfter ‘);



Note: The above specified functions and their behaviour is same in both Apache Hive and Apache Impala.

BTRIM
BTRIM removes all instances of one or more characters from the start and end of a STRING value. By default, removes only spaces. If a non-NULL optional second argument is specified, the function removes all occurrences of characters in that second argument from the beginning and end of the string. This functionality has been added in CDH 5.5.0 / Impala 2.3.0

Examples:
SELECT BTRIM(‘ SpacesBeforeAfter ‘);

SELECT LENGTH(‘ SpacesBeforeAfter ‘), LENGTH( BTRIM(‘ SpacesBeforeAfter ‘));

SELECT LENGTH(‘ SpacesBeforeAfter ‘), LENGTH( BTRIM(‘ SpacesBeforeAfter ‘)), LENGTH(BTRIM(‘ SpacesBeforeAfter ‘));



If you observe carefully, unlike TRIM function, the BTRIM function is removing only one space from leading and trailing from the given string. The benefit of this function is it will remove not only spaces but the characters that were specified in the input. Note that it removes only if the characters that needs to be removed are either in leading position or in trailing. It will not remove anything in between the string.


4 comments

Leave a Reply