Working with variables in Apache Hive

The purpose of this tutorial is to demonstrate how to work with variables in Apache Hive. Because Apache Hive does not function in the same manner that SQL does, there are significant differences between the two. One of them is the ability to work with variables in a flexible manner. Let’s explore what the least potential is for us to meet our needs.

Below example will demonstrate how to handle variable within Apache Hive.

SELECT * FROM emp;

EmpDataset

For practice, I’m thinking of using the popular “emp” dataset. The outcome of the above command is well-known; however, I am posting it here for the benefit of newcomers.

Now, I am setting the variable value. The property hiveconf is used for variable substitution as shown in the below example.

SET hiveconf:empno = 7839;

Our variable “empno” has been declared with the above statement and a value has been assigned to it. Let’s use it in the query as we do in SQL.

SELECT * FROM emp WHERE empno = ‘${hiveconf:empno}’;

This will return the row where empno is 7839 as shown in the below screenshot.

UsingVariableHive_1

In actuality, this isn’t helping us at all, is it? Giving a value to a parameter and then utilizing it in the query. One would believe that instead of implementing in this manner, we might simply use the value directly in the SELECT query. Before you say this, remember that we just learnt how to define a parameter and assign a value to it for the time being. If you want to pass the variable value during the run-time, then you should go for execution of the SQL file either using interactive or non-interactive mode.

Let’s create a script file which we usually do for batch processing.

USE dbTest;
SELECT * FROM emp WHERE empno = ‘${hiveconf:empno}’;

ScriptFile

The script file is placed in the home/cloudera, the default location of the executable files. Now execute the below command in HDFS terminal.

hive -hiveconf empno=7839 -f scriptvar.sql

You will get result as shown in the below screenshot. You can reuse the same command by changing the variable value.

hive -hiveconf empno=7782 -f scriptvar.sql

UsingVariableHive_2

Please note that the above implementation is in a non-interactive mode where you will end up in HDFS terminal only. However, if you want it in “interactive-mode” then use the below command.

hive -hiveconf empno=7782 -i scriptvar.sql

Hope you find this article helpful.

Please subscribe for more interesting updates.

 

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s