Batch Processing in Impala

Apache Impala is easy for SQL developers to pick up because it is queried with SQL. It is a good fit for interactive queries and analytics, where it is generally faster than Hive: Impala has its own execution engine and, unlike Hive, does not depend on MapReduce.

Apache Hive is geared toward long-running batch processing, while we turn to Impala for interactive queries. However, instead of submitting queries one at a time, you can put them in a script and have Impala run the whole sequence of commands in order.

Let’s see how to execute SQL files in Impala.
1) Create a .sql file with a set of SQL commands.
2) Save it in a local directory, for example as /home/cloudera/sample.sql.
3) Open a terminal on a host where impala-shell is installed and enter the command below.
impala-shell -i quickstart.cloudera:21000 -d dbTest -f /home/cloudera/sample.sql
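
Steps 1 and 2 can also be done straight from the shell. The sketch below writes a small script to sample.sql in the current directory. The table name batch_demo and its columns are hypothetical and only there to give the file some content; replace them with your own statements.

```shell
#!/bin/sh
set -eu

# Where to write the script. Defaults to sample.sql in the current directory;
# on the quickstart VM, running this from /home/cloudera gives the path used above.
SQL_FILE="${SQL_FILE:-sample.sql}"

# Quoted heredoc delimiter so the shell does not expand anything inside the SQL.
cat > "$SQL_FILE" <<'SQL'
-- Statements run in order; each one ends with a semicolon.
-- batch_demo is a hypothetical table used only for illustration.
CREATE TABLE IF NOT EXISTS batch_demo (id INT, name STRING);
INSERT INTO batch_demo VALUES (1, 'alpha'), (2, 'beta');
SELECT COUNT(*) FROM batch_demo;
SQL

echo "Wrote $SQL_FILE"
```

The command from step 3 can then be pointed at this file with -f.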

This is non-interactive mode: impala-shell runs the statements in the file and then exits instead of opening an interactive prompt.

-i specifies the impalad instance to connect to, as host:port. Here quickstart.cloudera:21000 is the host and port on which the Impala daemon is running.
-d refers to the database to connect to.
-f refers to the file to execute (with its location).
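
To make the three options easier to reuse, they can be wrapped in a small script. This is only a sketch: the variable names IMPALAD, DATABASE and SQL_FILE and the function run_batch are made up for this example, and the defaults are simply the values from the command above. It assumes impala-shell returns a non-zero exit status when the batch fails, which is worth confirming on your version.

```shell
#!/bin/sh
# Hypothetical variable names; defaults come from the example command in this post.
IMPALAD="${IMPALAD:-quickstart.cloudera:21000}"    # -i: host:port of the impalad
DATABASE="${DATABASE:-dbTest}"                     # -d: database to use
SQL_FILE="${SQL_FILE:-/home/cloudera/sample.sql}"  # -f: script to execute

run_batch() {
  echo "Running: impala-shell -i $IMPALAD -d $DATABASE -f $SQL_FILE"
  # Bail out early if impala-shell is not installed on this machine.
  if ! command -v impala-shell >/dev/null 2>&1; then
    echo "impala-shell not found on PATH; nothing executed" >&2
    return 127
  fi
  impala-shell -i "$IMPALAD" -d "$DATABASE" -f "$SQL_FILE"
}

if run_batch; then
  echo "Batch finished"
else
  echo "Batch exited with status $?" >&2
fi
```

Any of the three values can be overridden from the environment, for example DATABASE=otherDb sh run_batch.sh, where run_batch.sh is whatever name you save the script under.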

The command below runs the same file in interactive mode, from the prompt inside an impala-shell session.
source /home/cloudera/sample.sql

Hope you find this article helpful.

Stay in touch for more interesting updates.
