Best Example For Repeat Function In Apache Hive & Impala

When the new business rules are introduced, it might be necessary to perform any cleanup or transformations on existing data.

The below is an use case.

In order to have an unified employee ID, a company that took over the current company has adopted a new business rule. Current employee IDs will not be updated, however, rather an additional number will be added as a prefix to put all employee IDs into a standardized format.

The employee IDs of the current company are like 789,790,791,etc. And employee IDs of the new company are like 2345, 2346, 2367, 2369, etc. As per the business rule, 10000 needs to be added to the each employee ID to bring the employee IDs into a common format. After this implementation, the employee IDs will become 10789, 10790, 10791, 12345, 12346, 12367 and 12369.

Let’s see how this can be implemented.

Below is my test data.
data_repeat_function

Usage of “Repeat” function:

CREATE TABLE empid(id STRING)
— The “id” column is a string, and this is done on purpose.

LOAD DATA LOCAL INPATH ‘Desktop/EmpID.txt’ INTO TABLE empid;

create_stmt_repeat_function

Now, let’s see how to implement the above said business rule on the data.

SELECT CONCAT(CONCAT(‘1’, ”, REPEAT(‘0’, 4-LENGTH(CAST(id AS STRING)))),”,id)
FROM empid;

repeat_function_hive

In the above implementation, the digit “0” was asked to repeat 4 times, however the length of the string has to be deducted from it. Hope you got it.

However, if this is the requirement, we do have alternatives to implement the above said business rule. Let’s see how it can be done.

— Using the CAST function
SELECT CAST(id AS BIGINT)+10000 FROM empid;

cast_function

— Direct approach when the column data type is INT

summing_empid_prefix

Hope you liked this post.

Please click on the follow button to receive updates on latest posts.

4 comments

Leave a Reply