In this blog, we’ll look at how to manage a complex datatype ARRAY within another complex datatype MAP. The dataset under discussion in this blog already contains complex data type that must be dealt in a more nested manner. This blog has various scenarios and use cases for learning and practicing ‘Complex Datatypes’. The goal of offering multiple examples, however, is to allow people appropriate practice.
The table definition into which the supplied data should be placed is provided below.
+--------------+------------------------------------+
| name | type |
+--------------+------------------------------------+
| Dept | MAP<String, ARRAY<String> |
+--------------+------------------------------------+
Each department’s workers are listed here.
Dept GroupMembers
IT John, Smith, Colin, Steve
Software Stephen, George
Testing Zafar, Danial, Tahir
Let’s see how to Implement it in Hive.
–Creating the table to store the text data.
CREATE TABLE Departments(Dept String, GroupMembers Array<String>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘\t’
COLLECTION ITEMS TERMINATED BY ‘,’;
–Loading the data into the intermediate table we created.
LOAD DATA LOCAL INPATH ‘Desktop/Docs/Departments.txt’ INTO TABLE Departments;
We will have the data in a table using the aforementioned implementation. The data will now be converted into nested values as per the structure.
Now, let’s create a table to bring the key (Dept) and value (GroupMembers) columns into MAP datatype.
CREATE TABLE Dept_Complex(Dept MAP<String, ARRAY<String>>);
–A plain insert statement will help in this case.
INSERT INTO Dept_Complex
SELECT MAP(Dept, GroupMembers) FROM Departments;
Now, you can query to fetch the data from the table.
SELECT Dept[‘IT’] FROM dept_complex LIMIT 1;
This returns:
[“John”,”Smith”,”Colin”,”Steve”]
SELECT Dept[‘IT’][0] FROM dept_complex LIMIT 1;
This returns the “John” as output.
Hope you find this article helpful.
Please subscribe to receive notifications on latest updates.
3 comments