Connecting to Apache Pig

As stated in the previous post, Pig commands can be run either in interactive mode, using the Grunt shell, or in batch mode, by submitting a script. Apache Pig can run in local mode, which uses the local host and file system, or in MapReduce mode, which uses a Hadoop cluster and HDFS. You can also choose a different execution engine, Tez or Spark, instead of MapReduce.

In this post, we will see how to start Apache Pig in each of these modes.

Local Mode:
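pig -x local starts Pig in local mode, which runs on the local host and uses the local file system. Note that typing just “pig” starts Pig in MapReduce mode, the default, not in local mode.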
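Running pig -x local without a script opens the interactive Grunt shell. As a minimal sketch of batch mode in local mode, the shell script below writes a small input file and a Pig Latin script, then submits that script with pig -x local. The file names students.txt and count_students.pig are hypothetical, and the script assumes pig is on your PATH; if it is not, it only writes the two files.

```shell
#!/usr/bin/env bash
# Minimal sketch: batch mode in local mode.
# students.txt and count_students.pig are hypothetical file names.
set -euo pipefail

# Create a tiny comma-separated input file on the local file system.
printf 'alice,20\nbob,22\ncarol,21\n' > students.txt

# Write a small Pig Latin script that loads the file and counts the rows.
cat > count_students.pig <<'EOF'
students = LOAD 'students.txt' USING PigStorage(',') AS (name:chararray, age:int);
grouped  = GROUP students ALL;
total    = FOREACH grouped GENERATE COUNT(students);
DUMP total;
EOF

# Batch mode: pass the script file to pig; -x local uses the local host and file system.
if command -v pig >/dev/null 2>&1; then
  pig -x local count_students.pig
else
  echo "pig not found on PATH; wrote students.txt and count_students.pig"
fi
```

The same .pig file can be submitted in any of the modes below by changing only the value passed to -x; the Pig Latin script itself does not change.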

Tez Local Mode:
pig -x tez_local runs Pig on the local host using the Tez engine instead of MapReduce. This mode is experimental: some queries fail on larger data sets in local mode.

Spark Local Mode:
pig -x spark_local runs Pig on the local host using the Spark engine. Like Tez local mode, this mode is experimental.

MapReduce Mode:
pig -x mapreduce starts Pig in MapReduce mode on the Hadoop cluster. Since MapReduce is the default engine, typing just “pig” on the cluster has the same effect.

Tez Mode:
pig -x tez runs Pig on the Hadoop cluster using the Tez engine.

Spark Mode:
pig -x spark runs Pig on the Hadoop cluster using the Spark engine.

Note:
To exit the Grunt shell, press Ctrl+D or type quit.

Hope you find this article helpful.
