Max vs Greatest in Apache Hive

In this post, we will discuss the similarities and differences between two Apache Hive functions, MAX and GREATEST. Using both functions, we get the largest value out of all the values. They do, however, work in different ways.

The MAX function accepts just one argument, but the GREATEST function requires at least two. The MAX function calls the mapreduce engine, while the GREATEST function does not. If extra columns, in addition to the column given as an input for the MAX function, must be grouped. Outside of the GREATEST function, additional columns will be permitted and do not need to be grouped. MAX function can also be used with OVER clause.

Please use the below dataset for practice.
File: Sales.csv
ProductID,2018,2019,2020,2021
P011,151,100,50,77
P012,122,101,44,59
P043,100,65,62,41
P049,121,88,22,80

The following commands were used to create the table and load the data into it.
CREATE TABLE sales(productid string,y2018 int,y2019 int,y2020 int,y2021 int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’;

LOAD DATA LOCAL INPATH ‘Desktop/Docs/sales.csv’ INTO TABLE sales;

Or you can simply use the popular “emp” data as shown below.

Examples:
SELECT MAX(sal) FROM emp;
This will return the highest salary from the “emp” table.

SELECT GREATEST(sal) FROM emp;
Since the GREATEST function requires at least two parameters, the above will result in an error.

SELECT GREATEST(sal,comm) FROM emp;
The preceding will return the highest values from the supplied two columns.

SELECT MAX(sal,comm) FROM emp;
Since the MAX function only accepts one parameter, the above will result in an error.

Refer to the screenshots for more examples:

This slideshow requires JavaScript.

Hope you find this article helpful.

Happy learning!!

Max vs Greatest in Apache Hive

2 comments

Leave a Reply Cancel reply

Share this:

Related

Leave a Reply Cancel reply