The NTILE function divides the ordered partition into a specified number of bucket groups and assigns a bucket number to each row in the partition. This function helps in analyzing the data and measuring the tertiles, quartiles, deciles, percentiles and other typical summary statistics.
Note: This function is available in both Apache Hive and Apache Impala.
Syntax: NTILE(INTEGER x)
Let’s do some exercises for better understanding.
Consider the below dataset:
Table: tbSalesSummary
SELECT NTILE(4) OVER(ORDER BY YearMonth) Qtr, * FROM tbSalesSummary
The data will be partitioned into 4 equal parts and assigns a bucket number for each partition/chunk. The result of the query is given below:
SELECT Qtr, YearMonth,TotalSales FROM(
SELECT NTILE(4) OVER(ORDER BY YearMonth) Qtr, * FROM tbSalesSummary)x
WHERE Qtr=4;
The above query helps in calculating the SUM of total sales for the fourth quarter of the year.
If you want to know the total sales value per quarter then use the below command.
SELECT Qtr, MAX(TotalSales) FROM(
SELECT NTILE(4) OVER(ORDER BY YearMonth) Qtr, * FROM tbSalesSummary)x
GROUP BY Qtr;
Hope you liked this article.
Please click on the follow button to receive notification on latest posts.
One comment