This is one of the articles that required a significant amount of time and effort to complete. Once an idea was formed, I went to wiki and found some relevant data to combine in a way that justified the topic title.
We learned how to embed a complex data type STRUCT into an ARRAY datatype in previous articles. Let’s explore what happens if we put the same complex data types in reverse order, which is what happens if we put an array inside the struct.
As discussed in the previous articles, any complex data type can be the top level type for a column or can itself be an item within another complex type. In this article, we are going to see how a STRUCT can be the top-level type for a column, and also it is an item within an ARRAY.
Take a look at the data below. I collected individual celebrity information from wiki, but I may have made a mistake in doing so, so please do not depend on the information and instead look at the data format. Click on the image to enlarge.
The table structure should be as follows, based on the data:
Field Name | Data Type |
ID | INT |
CelebName | STRING |
Marriages | ARRAY<STRUCT< |
SPOUSE:STRING, | |
Children:ARRAY<STRING> | |
>> |
Below is the format of the input file.
So we know our data and we know what could be the table structure. Let’s proceed to implement it in Apache Hive.
–Creating the table in Hive
CREATE TABLE CelebMarriages
(
id BIGINT,
name STRING,
marriages ARRAY < STRUCT <
spouse: STRING,
children: ARRAY <STRING>
> >
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘\t’
COLLECTION ITEMS TERMINATED BY ‘,’
MAP KEYS TERMINATED BY ‘:’
LINES TERMINATED BY ‘\n’;
–Importing the data into the table
LOAD DATA LOCAL INPATH ‘Desktop/Docs/test1.csv’ INTO TABLE CelebMarriages;
–Queries / Data retrieval
SELECT name, marriages.spouse[0], marriages.children[0] FROM CelebMarriages;
SELECT name, marriages.spouse[1], marriages.children[1] FROM CelebMarriages;
SELECT name, marriages.spouse[2], marriages.children[2] FROM CelebMarriages;
Hope you find this article helpful.
Please subscribe for more interesting updates.
4 comments