Apache Hive Aggregate Functions – Part-1

In both traditional relational systems and big data technologies such as Apache Hive and Apache Impala, most of the functions specified in this series are common.

An aggregate function or aggregation function in database management is a function where the values of multiple rows are grouped together to form a single summary value.

The dataset on which we are about to practice the aggregate functions is shown below.

Table: tbSalesData
tbSalesData_dataset_aggr

Let’s begin the exercises.

Function: COUNT
The mostly used function in RDBMS or Big Data platforms is undoubtedly COUNT. Instead of individual rows, we often depend on row count satisfying the criteria specified in the clauses. Those stats are helpful in identifying the trends, patterns, business, etc. Such stats will be published in daily reports, dashboards, weekly or monthly sheets, summary reports, etc.

The COUNT function calculates the number of rows including NULL values.

Syntax:
COUNT(*) or COUNT(expr)
Returns the total number of retrieved rows, including rows containing NULL values.

COUNT(DISTINCT expr[, expr])
Returns the number of rows for which the supplied expression(s) are unique and non-NULL.

Examples:
SELECT COUNT(CustLastName) FROM tbSalesData;
Returns 14 as the result.

SELECT COUNT(DISTINCT City) FROM tbSalesData;
Returns 3 as the result.

Function: SUM
Returns the sum of the elements in the group or the sum of the distinct values of the column in the group. The data type of the return value is DOUBLE.

Syntax:
SUM(col)
Returns the SUM of the values in the specified column, ignoring the null values.

SUM(DISTINCT col)
Returns the SUM of the distinct values in the specified column.

Examples:
SELECT SUM(amount) FROM tbSalesData;
Returns 17755.00 as the result.

SELECT SUM(DISTINCT amount) FROM tbSalesData;
Returns 14905.00 as the result.

Function: AVG
Returns the average of the elements in the group or the average of the distinct values of the column in the group. The return data type is DOUBLE.

Syntax:
AVG(col)
Returns the average of the values in the specified column, ignoring the null values.

AVG(DISTINCT col)
Returns the average of the distinct values in the specified column.

Examples:
SELECT AVG(amount) FROM tbSalesData;
Returns 1365.76 as the result.

SELECT AVG(DISTINCT amount) FROM tbSalesData;
Returns 1355.00 as the result.

Function: MIN
Returns the minimum of the column in the group. The return type is DOUBLE.

Syntax: MIN(col)
Returns the minimum of the specified column in the group.

Example:
SELECT MIN(amount) FROM tbSalesData;
Returns 1150.00 as the result.

Function: MAX
Returns the maximum of the column in the group. The return type is DOUBLE.

Syntax: MAX(col)
Returns the maximum of the specified column in the group.

Example:
SELECT MAX(amount) FROM tbSalesData;
Returns 1575.00 as the result.

6 comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s