Get first and the last row in the window – Apache Impala

The expression values from the first and last rows of the window are returned using the FIRST VALUE and LAST VALUE functions. The same value appears in all of the group’s result rows. If the input expression is NULL, the return value is NULL.

If your table contains null values, the IGNORE NULLS clause can be used to return the window’s first non-null value. The identical value appears in all of the group’s result rows. If the input expression is NULL, the return value is NULL.

Let’s do some practice excercises to understand how it works.

Usage1:
The following will help in finding the highest and lowest salaries from the “emp” table.

SELECT DISTINCT FIRST_VALUE(sal)
OVER (ORDER BY sal DESC
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS “HIGHEST”
FROM emp;

SELECT DISTINCT LAST_VALUE(sal)
OVER (ORDER BY sal DESC
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS “LOWEST”
FROM emp;

Usage2:

The following is a sort of scoreboard where each person has their own set of points. To know where they stand, each row must have a low and high score associated with it.

Dataset:
ID, Name, Score
1011, ‘Scott’, 2100
1012, ‘Peter’, 2220
1013, ‘John’, 2010
1014, ‘George’, 2009
1015, ‘Thomos’, 2500
1016, ‘Veronica’, 2110
1017, ‘Anthony’, 2011

Required Output is-
ID, Name, Score, LowestScore, HighestScore
1011, ‘Scott’, 2100, 2009,2500
1012, ‘Peter’, 2220, 2009,2500
1013, ‘John’, 2010, 2009,2500
1014, ‘George’, 2009, 2009,2500
1015, ‘Thomos’, 2500, 2009,2500
1016, ‘Veronica’, 2110, 2009,2500
1017, ‘Anthony’, 2011, 2009,2500

The following SELECT statement will help in achieving the above specified output.
SELECT id, name, score,
LAST_VALUE(score)
OVER (ORDER BY score DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as LowestScore,
FIRST_VALUE(score)
OVER (ORDER BY score DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as HighestScore
FROM empscores;

Usage3:
The same dataset as before, but this time with NULL values. If any of the rows have a NULL value, the FIRST VALUE expression returns NULL for all of them. So, the below example will let you know how to handle in case of NULLs.

SELECT id, name, score,
LAST_VALUE(score)
OVER (ORDER BY score DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as LowestScore,
FIRST_VALUE(score IGNORE NULLS)
OVER (ORDER BY score DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as HighestScore
FROM empscores;

I hope you found this post to be informative.

Please join our mailing list to receive more interesting information.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s