Best Example For Repeat Function In Apache Hive & Impala

3rd Feb 2021 SHAFI SHAIK Apache Hive, Apache Impala, Cloudera Impala 4 comments

When new business rules are introduced, it may be necessary to clean up or transform existing data.

Below is a use case.

To have a unified employee ID format, a company that took over the current company has adopted a new business rule. Existing employee IDs will not be renumbered; instead, an additional number will be added as a prefix to bring all employee IDs into a standardized format.

The employee IDs of the current company look like 789, 790, 791, etc., and the employee IDs of the new company look like 2345, 2346, 2367, 2369, etc. As per the business rule, 10000 needs to be added to each employee ID to bring all IDs into a common format. After this change, the employee IDs become 10789, 10790, 10791, 12345, 12346, 12367, and 12369.

Let’s see how this can be implemented.

Below is my test data.
[Image: data_repeat_function]

Usage of the REPEAT function:

CREATE TABLE empid(id STRING);
-- The "id" column is a string, and this is done on purpose.

LOAD DATA LOCAL INPATH 'Desktop/EmpID.txt' INTO TABLE empid;

[Image: create_stmt_repeat_function]

Now, let's see how to apply this business rule to the data.

-- Prefix '1', then pad with zeros so that every ID ends up five characters long
SELECT CONCAT('1', REPEAT('0', 4 - LENGTH(id)), id)
FROM empid;

[Image: repeat_function_hive]

In the above implementation, the digit "0" is repeated 4 minus the length of the ID times. A three-digit ID such as 789 therefore gets one zero (giving 10789), while a four-digit ID such as 2345 gets none (giving 12345).
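To make the calculation easier to follow, the intermediate pieces can be selected side by side. This is only a sketch against the empid table created above; the column aliases are illustrative names, and it assumes every ID has at most four digits.

```sql
-- Sketch only: shows each intermediate piece of the REPEAT expression.
-- Assumes the empid table from above; the aliases are illustrative.
SELECT id,
       LENGTH(id)                                   AS id_length,    -- 3 for '789', 4 for '2345'
       4 - LENGTH(id)                               AS zeros_needed, -- 1 for '789', 0 for '2345'
       REPEAT('0', 4 - LENGTH(id))                  AS zero_padding, -- '0' or an empty string
       CONCAT('1', REPEAT('0', 4 - LENGTH(id)), id) AS new_id        -- '10789' or '12345'
FROM empid;
```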

If this is the only requirement, there are simpler alternatives for implementing the same business rule. Let's see how it can be done.

-- Using the CAST function
SELECT CAST(id AS BIGINT) + 10000 FROM empid;

[Image: cast_function]
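Another option, which is not part of the original examples, is to use a padding function instead of REPEAT. The sketch below assumes the same empid table and IDs of at most four digits; LPAD would shorten anything longer to four characters, so it is only safe under that assumption.

```sql
-- Sketch only: LPAD pads id on the left with '0' up to 4 characters,
-- then CONCAT adds the leading '1'.
-- Assumes IDs have at most 4 digits; LPAD truncates longer values.
SELECT CONCAT('1', LPAD(id, 4, '0')) AS new_id
FROM empid;
```

Like the REPEAT version, this returns a string, whereas the CAST version returns a number.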

-- Direct approach when the column data type is INT

[Image: summing_empid_prefix]
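The query for this approach was only shown as a screenshot, so here is a minimal sketch of what it might look like. The table name empid_int is hypothetical and stands for a copy of the data in which the id column is declared as INT.

```sql
-- Hypothetical table: same data as empid, but id is declared INT.
CREATE TABLE empid_int(id INT);

-- With an INT column no CAST is needed; the addition works directly.
SELECT id + 10000 AS new_id
FROM empid_int;
```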

Hope you liked this post.

