Split equivalent in Impala

Split function splits the data based on the delimiter provided and it is mostly used function in Apache Hive. This function is not available in Impala. However, there is an alternative to it.

Let us first see the usage of the “split” function in Hive.

Below is the patient’s blood pressure variations information.

TableName: PatientsData
Systolic-Diastolic
122/80, 122/83, 130/83, 135/86, 140/95, 147/92

SELECT split(data,’\/’) as split_data from PatientsData;

Result:
split_data
122,80
122,83
130,83
130,83
135,86
140,95
147,92

SELECT split(data,’\/’)[0] AS Systolic, 
               split(data,’\/’)[1] AS Diastolic 
FROM PatientsData;

Result:
Systolic   Diastolic
122          80
122          83
130          83
130          83
135          86
140          95
147          92

Let’s do the same exercise in Impala using “split_part” function.

SELECT split_part(data,’\/’,1) AS Systolic, 
               split_part(data,’\/’,2) AS Diastolic 
FROM PatientsData;

Result:
Systolic   Diastolic
122          80
122          83
130          83
130          83
135          86
140          95
147          92

As per the documentation from Apache, below is the description of the function.

SPLIT_PART(STRING source, STRING delimiter, BIGINT index)

Purpose: Returns the requested indexth part of the input source string split by the delimiter.

If the index is a positive number, returns the indexth part from the left within the source string. If the index is a negative number, returns the indexth part from the right within the source string. If the index is 0, returns an error.

The delimiter can consist of multiple characters, not just a single character.

All matching of the delimiter is done exactly, not using any regular expression patterns.

Return type: STRING

 

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s