The conversion of delimited data into columns in Apache Hive is discussed in this article. If the data is already loaded as a single column in a table, there is a “SPLIT” function that helps us solve this problem. If the table has not been generated yet you can simply create a table with the FIELDS TERMINATED BY clause in the Apache Hive and load the data directly into the table; the delimited data will be converted to columns automatically.
Let’s assume that the table is already created and the data is loaded as a single column. Let’s look into it with an example. Consider the below data set for practice.
Data
1456, ConvertedData1, SomeOtherData1
1466, ConvertedData2, SomeOtherData2
1476, ConvertedData3, SomeOtherData4
Let’s create the table and load the data using Apache Impala.
— Create table statement
CREATE TABLE splitStringData(dataCol1 STRING);
— Data insertion
INSERT INTO splitStringData VALUES(
(‘1456, ConvertedData1, SomeOtherData1’),
(‘1466, ConvertedData2, SomeOtherData2’),
(‘1476, ConvertedData3, SomeOtherData4’)
Connect to Hive now to convert the delimited data using SPLIT function.
Description:
The split function splits the string around the pattern pat and returns an array of strings. You can also specify regular expressions as patterns.
Syntax:
split(string A, string pattern)
Implementation:
SELECT SPLIT(dataCol1,’\\,’)[0] Col1,
SPLIT(dataCol1,’\\,’)[1] Col2,
SPLIT(dataCol1,’\\,’)[2] Col3
FROM splitStringData;
Similar implementation in other RDBMS platforms –
4 comments