Complex Data Types – Part-2

The information below pertains to the top-ranked universities in the United States, the United Kingdom, and Australia. The Times Higher Education collects the data using carefully crafted success indicators that evaluate an institution’s teaching, studying, knowledge sharing, and global outlook.

We are going to consider the below data sets to learn “Complex Data Types” in Apache Hive.

Rank_Data
In this post we focus on “Array”.

As discussed in the previous post, “array” is an ordered collection of elements. The elements in the array must be of the same type.

Image-1
Ranks-1

Image-2

Ranks-2

In “Image-1,” all of the elements in the “Rank” column are “Integer” values, while the University column contains only “String” values. The same goes for “Image-2.”

Let’s see how it can be implemented.

For the values of “Image-1”

CREATE TABLE rank1(Ranks Array<INT> , University Array<STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘\t’
COLLECTION ITEMS TERMINATED BY ‘,’;

LOAD DATA LOCAL INPATH ‘Desktop/rank.csv’ INTO TABLE rank1;

SELECT * FROM rank1;



The data is uploaded into the table and can be retrieved using a basic SELECT statement, as you can see. However, since the data isn’t in the regular RDBMS format, which requires information to be stored in each cell of a table in the form of rows and columns, we need to know how the data are saved inside the table.



Ranks or Rank1 is a column name in our table and [1,2,3,4] and [10,11,12,13] are the column values. Each value in each row will be represented with index values. As shown in the image, index[0] value is 1 and index[1] value is 2, and so on. Similarly in “University” column index[0] value is “University of Oxford” and “California Institute of Technology” is represented with index[1].

Now, let’s retrieve the data to see how it works.

SELECT ranks[0], university[0] from rank1;

This will return the first value from the “ranks” column. Similarly, university[0] will return the first value from “university” column.



Let’s do the second exercise (as shown in Image-2) with “Country” column.

CREATE TABLE rank2(Country STRING, Ranks Array<INT> , University Array<STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘\t’
COLLECTION ITEMS TERMINATED BY ‘,’;

LOAD DATA LOCAL INPATH ‘Desktop/rank2.csv’ INTO TABLE rank2;

SELECT * FROM rank2;



Similar to the first exercise, the values in this table also represented with indexes.

SELECT country, ranks[0], university[0] from rank2;


Hope you liked this post.

For more updates, please click on the “subscribe” button.

 

5 comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s