Complex Data Types – Part-5

Previous posts explained how to deal with Complex Data Types Array, Map, Struct individually. In this post, we will learn how to handle them when they are all in a single table. Look at the dataset below if you have any doubts that it won’t be in a single table.

Employee Name           : John Smith
Reporting To                  : James Todd
Team Members             : Frank Muller
                                            Melanie Pike
Salary                               : 30000
Allowances
         Housing Allowances (%)    : 5.0
         Travelling Allowances (%) : 2.0
         Daily Allowances (%)          : 1.0
Address
        Street                              : 2240
        Area                                 : Mesan Blvd
        State                                : CA
        ZipCode                          : 9025

Did you notice that all complex datatypes can be stored in a single table? There is no doubt that such information is obvious.

Now, let’s get back to the data.

When you look at the data, you’ll see that the “Team Members” column has two values of the same data type. This denotes an array. The “key” and “value” combinations in the “Allowances” column indicate that the column is of the MAP data type. “Address” column contains both integer and string values, i.e. Struct data type.

The below are the specifics from the CSV file I’ll be using: Please click on the thumbnail to enlarge it. The data is entirely false and has nothing to do with any living or deceased human. Defined addresses are also fictitious.

AllComplexDataTypes_Dataset

Pay close attention to the delimiters in the dataset above.

Lines are terminated by “new line,” columns are terminated by “commas,” key-value pairs are terminated by “pipe symbols,” and key and values are separated by “@.” This is important to understand when using Apache Hive since it decides how the data will be loaded.

Let’s do the exercise now.

CREATE TABLE EmployeeRecords
(   Ename String,
    ReportingTo String,
    Teammates Array<String>,
    Salary Float,
    Allowances MAP<String, Float>,
    Address STRUCT<Street:Int, Area:String, State:String, ZipCode:Int>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
COLLECTION ITEMS TERMINATED BY ‘|’
MAP KEYS TERMINATED BY ‘@’
LINES TERMINATED BY ‘\n’;

LOAD DATA LOCAL INPATH ‘Desktop/AllComplexDataTypes_Dataset.csv’
INTO TABLE EmployeeRecords;

AllComplexDataTypes_Impl

Now, verify the data.
SELECT * FROM EmployeeRecords

AllComplexDataTypes_Impl-1

Let’s query against the table.

SELECT Ename, Salary,Allowances[“HouseAlw”],Allowances[“DailyAlw”], Address.state
FROM EmployeeRecords;

SELECT Ename, Teammates[0] FROM EmployeeRecords;

SELECT Ename, Teammates[0] ,Allowances[“HouseAlw”], Address.area
FROM EmployeeRecords
WHERE Address.State = ‘CA’;

AllComplexDataTypes_Impl-2

Hope you liked this post.

Please subscribe for latest updates.

 

 

2 comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s