Dividing an ordered partition into groups – Apache Hive & Apache Impala

The NTILE function divides the ordered partition into a specified number of bucket groups and assigns a bucket number to each row in the partition. This function helps in analyzing the data and measuring the tertiles, quartiles, deciles, percentiles and other typical summary statistics. 

Note: This function is available in both Apache Hive and Apache Impala.

Syntax:
NTILE(INTEGER x)

Let’s do some exercises for better understanding.

Consider the below dataset:

Table: tbSalesSummary

Dataset_For_NTILE

SELECT NTILE(4) OVER(ORDER BY YearMonth) Qtr, * FROM tbSalesSummary

The data will be partitioned into 4 equal parts and assigns a bucket number for each partition/chunk. The result of the query is given below:

ntile_1

SELECT Qtr, YearMonth,TotalSales FROM(
SELECT NTILE(4) OVER(ORDER BY YearMonth) Qtr, * FROM tbSalesSummary)x
WHERE Qtr=4;

The above query helps in calculating the SUM of total sales for the fourth quarter of the year.

ntile_1a

If you want to know the total sales value per quarter then use the below command.

SELECT Qtr, MAX(TotalSales) FROM(
SELECT NTILE(4) OVER(ORDER BY YearMonth) Qtr, * FROM tbSalesSummary)x
GROUP BY Qtr;

ntile_2

Hope you liked this article.

Please click on the follow button to receive notification on latest posts.

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s