Usage of Collect_List function in Apache Hive

Collect_List is one of the aggregate functions that help in return a set of objects with duplicate elements. This is similar to COLLECT_SET and will help in bringing the column values into a single row. All values will be displayed in an array format.

The datasets on which we are about to practice the aggregate function “Collect_List” are shown below.

Dataset1:
Table: tbSalesData
tbSalesData_dataset_aggr

Dataset2:
Table: StudentMarks

Let’s begin the exercises:

SELECT COLLECT_LIST(city) from tbSalesData;

collect_list_example1

SELECT COLLECT_LIST(id), COLLECT_LIST(city) from tbSalesData;

collect_list_example2

It can be used along with other aggregate functions such as SUM, AVG, MAX, MIN, etc.

SELECT

           StudentName,
           COLLECT_LIST(Subject1),
           COLLECT_LIST(Subject2),
           COLLECT_LIST(Subject3),
           SUM(Subject1)+SUM(Subject2)+SUM(subject3) FROM StudentMarks
GROUP BY StudentName;

collect_list_example3

SELECT

           COLLECT_LIST(Subject1),
           SUM(Subject1),
COLLECT_LIST(Subject2),

           SUM(Subject2),
COLLECT_LIST(Subject3),
SUM(Subject3),

FROM StudentMarks

collect_list_example4

Hope you liked this post.

Please do follow for more interesting updates.

2 comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s