Complex Data Types – ARRAY within MAP

Please click here for “ARRAY in MAP” 

The data is available in a number of formats, volumes, and types. It may be well-organized, partially-organized, or completely disorganized. Numbers or text may be used. It could be in a relational format, or it could be too large for row-column or tabular databases. Finally, data aids in the understanding and improvement of business and business processes, allowing you to save resources and time.

We’ll discuss “complex data” which is a topic that comes up often in current scenarios. As previously stated, data is derived from a variety of sources and in a variety of formats, one of which is nested values. Not everything fits in rows and columns. Not every data bound to Codd’s rules. Sometimes, if we try to put everything in rows and columns it can cause redundancy too. For example, look at the below data.


The above information pertains to the most famous towns and cities in each Indian state. When you look at the data, you’ll see that the “State” detail is replicated every time. This supposed to be in the following format to avoid redundancy.

The above data is more difficult to process and interpret than simple data, and we can’t store it with traditional systems’ primitive data types. There is no other option but to store the data as a string in a single cell if we depend on RDBMS. If we save the array values in primitive data types as a single cell, it would be difficult to retrieve individual items from the array values. This is where “complex data types” are required.

So, let’s continue learning about complex data types in this post. 

I obtained the below data of famous cities and towns in Indian states, as well as their population. This information is obtained from wiki. We’ll use this data to see how to handle an Array when it’s contained within a MAP complex data type. Also, we’ll see how to summarize, process and group the data while working with nested complex data types. 


Let’s begin the exercise.

Since the data is in rows and columns, let’s store it in a normal table.

–Creating the table with primitive data types to store the source data which is in rows and columns.
CREATE TABLE StatePop(City String, State String, Pop BIGINT)

–Loading the data into the table
LOAD DATA LOCAL INPATH ‘Desktop/Docs/IndianStatesPop.csv’ INTO TABLE StatePop;


–Now, create a table with complex data types.
CREATE TABLE StatePop_Complex(
StateCity STRUCT <State: STRING, City: ARRAY <STRING> >);

–Inserting the values from the source table to newly created table.
INSERT INTO StatePop_Complex
named_struct(‘State’, State, ‘City’, collect_list(City))
FROM StatePop

–The data has been populated. Let’s verify it.
SELECT StateCity.State, StateCity.City FROM StatePop_Complex;


More queries:
SELECT StateCity.State, StateCity.City FROM StatePop_Complex
WHERE StateCity.State = ‘Andhra Pradesh’;


–To retrieve first city from the array
SELECT StateCity.State, StateCity.City[0] FROM StatePop_Complex
WHERE StateCity.State = ‘Andhra Pradesh’;


Hope you find this article helpful.

Please subscribe to receive notifications on latest updates.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s