Convert Delimited Data Into Columns In Hive

The conversion of delimited data into columns in Apache Hive is discussed in this article. If the data is already loaded as a single column in a table, there is a “SPLIT” function that helps us solve this problem. If the table has not been generated yet you can simply create a table with the FIELDS TERMINATED BY clause in the Apache Hive and load the data directly into the table; the delimited data will be converted to columns automatically. 

Let’s assume that the table is already created and the data is loaded as a single column. Let’s look into it with an example. Consider the below data set for practice.

Data
1456, ConvertedData1, SomeOtherData1
1466, ConvertedData2, SomeOtherData2
1476, ConvertedData3, SomeOtherData4

Let’s create the table and load the data using Apache Impala.

— Create table statement

CREATE TABLE splitStringData(dataCol1 STRING);

— Data insertion

INSERT INTO splitStringData VALUES(
(‘1456, ConvertedData1, SomeOtherData1’),
(‘1466, ConvertedData2, SomeOtherData2’),
(‘1476, ConvertedData3, SomeOtherData4’)

Connect to Hive now to convert the delimited data using SPLIT function.

Description:

The split function splits the string around the pattern pat and returns an array of strings. You can also specify regular expressions as patterns.

Syntax:
split(string A, string pattern)

Implementation:
SELECT SPLIT(dataCol1,’\\,’)[0] Col1,
SPLIT(dataCol1,’\\,’)[1] Col2,
SPLIT(dataCol1,’\\,’)[2] Col3
FROM splitStringData;

Hive Split

 

Similar implementation in other RDBMS platforms –

4 comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s