Previous posts explained how to deal with Complex Data Types Array, Map, Struct individually. In this post, we will learn how to handle them when they are all in a single table. Look at the dataset below if you have any doubts that it won’t be in a single table.
Employee Name : John Smith
Reporting To : James Todd
Team Members : Frank Muller
Melanie Pike
Salary : 30000
Allowances
Housing Allowances (%) : 5.0
Travelling Allowances (%) : 2.0
Daily Allowances (%) : 1.0
Address
Street : 2240
Area : Mesan Blvd
State : CA
ZipCode : 9025
Did you notice that all complex datatypes can be stored in a single table? There is no doubt that such information is obvious.
Now, let’s get back to the data.
When you look at the data, you’ll see that the “Team Members” column has two values of the same data type. This denotes an array. The “key” and “value” combinations in the “Allowances” column indicate that the column is of the MAP data type. “Address” column contains both integer and string values, i.e. Struct data type.
The below are the specifics from the CSV file I’ll be using: Please click on the thumbnail to enlarge it. The data is entirely false and has nothing to do with any living or deceased human. Defined addresses are also fictitious.
Pay close attention to the delimiters in the dataset above.
Lines are terminated by “new line,” columns are terminated by “commas,” key-value pairs are terminated by “pipe symbols,” and key and values are separated by “@.” This is important to understand when using Apache Hive since it decides how the data will be loaded.
Let’s do the exercise now.
CREATE TABLE EmployeeRecords
( Ename String,
ReportingTo String,
Teammates Array<String>,
Salary Float,
Allowances MAP<String, Float>,
Address STRUCT<Street:Int, Area:String, State:String, ZipCode:Int>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
COLLECTION ITEMS TERMINATED BY ‘|’
MAP KEYS TERMINATED BY ‘@’
LINES TERMINATED BY ‘\n’;
LOAD DATA LOCAL INPATH ‘Desktop/AllComplexDataTypes_Dataset.csv’
INTO TABLE EmployeeRecords;
Now, verify the data.
SELECT * FROM EmployeeRecords
Let’s query against the table.
SELECT Ename, Salary,Allowances[“HouseAlw”],Allowances[“DailyAlw”], Address.state
FROM EmployeeRecords;
SELECT Ename, Teammates[0] FROM EmployeeRecords;
SELECT Ename, Teammates[0] ,Allowances[“HouseAlw”], Address.area
FROM EmployeeRecords
WHERE Address.State = ‘CA’;
Hope you liked this post.
Please subscribe for latest updates.
2 comments