Usage of Collect_Set Function in Apache Hive

Collect_Set is one of the aggregate functions that help in return a set of objects with duplicate elements eliminated. This will help in bringing the column values into a single row. All values will be displayed in an array format.

The datasets on which we are about to practice the aggregate function “Collect_Set” are shown below.

Table: tbSalesData

Table: StudentMarks

Let’s begin the exercises:

SELECT COLLECT_SET(city) from tbSalesData;

SELECT COLLECT_SET(id), COLLECT_SET(city) from tbSalesData;

It can be used along with other aggregate functions such as SUM, AVG, MAX, MIN, etc.

SELECT COLLECT_SET(city), SUM(amount) FROM tbSalesData;

           SUM(Subject1)+SUM(Subject2)+SUM(subject3) from StudentMarks
GROUP BY StudentName;

Hope you liked this post.

Please do follow for more interesting updates.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s