View the Data, Schema and Plan in Apache Pig

Apache Pig provides several diagnostic operators for inspecting a relation: the data loaded into it, the schema declared for it, and the execution plan Pig builds for it. This article covers four of them: DUMP, DESCRIBE, EXPLAIN, and ILLUSTRATE.

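Dump: This operator runs the statements needed to compute a relation and prints the resulting tuples to the screen. The relation may hold raw loaded data or the output of later processing. It is loosely comparable to using a SQL SELECT statement to look at the contents of a table.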

grunt> Dump emp;
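The article does not show how emp was created, so here is a minimal sketch of a complete session. The file name emp.txt, the comma delimiter, and the field names are assumptions made for illustration only; substitute your own data and schema.

```pig
-- Hypothetical input file: emp.txt, one comma-separated record per line
-- (id, name, dept, salary). None of these names come from the article.
emp = LOAD 'emp.txt' USING PigStorage(',')
      AS (id:int, name:chararray, dept:chararray, salary:double);

-- Nothing has executed yet; DUMP triggers the job and prints each tuple.
DUMP emp;
```

These statements can be typed at the grunt> prompt one at a time or saved in a script file. Because DUMP prints every tuple, it is best kept for small relations.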

Describe: This operator lets the programmer or analyst view the schema of a relation, that is, its field names and data types.

grunt> describe emp;
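As a rough illustration of what DESCRIBE reports, the sketch below declares a schema at load time and then asks Pig to print it. The file and field names are hypothetical, and the exact spacing of the output may differ between Pig versions.

```pig
-- Hypothetical file and schema, used only to illustrate DESCRIBE.
emp = LOAD 'emp.txt' USING PigStorage(',')
      AS (id:int, name:chararray, dept:chararray, salary:double);

-- DESCRIBE does not read the data; it only reports the declared schema.
DESCRIBE emp;
-- Should print something like:
-- emp: {id: int,name: chararray,dept: chararray,salary: double}

-- DESCRIBE also works on derived relations, which is handy for
-- checking the nested schema that GROUP produces.
by_dept = GROUP emp BY dept;
DESCRIBE by_dept;
```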

Explain: This is another diagnostic operator. It prints the logical, physical, and MapReduce execution plans that Pig generates to compute a relation.

grunt> explain emp;
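EXPLAIN is usually more informative on a relation with some processing behind it than on a freshly loaded one. The sketch below assumes a hypothetical emp.txt and schema. The -brief and -out options are taken from the Pig reference documentation; check that your Pig version supports them before relying on them.

```pig
-- Hypothetical file and schema, used only to illustrate EXPLAIN.
emp = LOAD 'emp.txt' USING PigStorage(',')
      AS (id:int, name:chararray, dept:chararray, salary:double);
high_paid = FILTER emp BY salary > 50000.0;

-- Print the logical, physical, and MapReduce plans to the screen.
EXPLAIN high_paid;

-- Shorter output: nested plans are not expanded.
EXPLAIN -brief high_paid;

-- Write the plans to files under the (hypothetical) directory explain_out.
EXPLAIN -out explain_out high_paid;
```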

Illustrate: This operator shows a step-by-step execution of a sequence of Pig statements on example data, so we can see which rows are picked up and passed along during filtering, grouping, sorting, and the various summarization and aggregation phases. In other words, using the ILLUSTRATE operator, we can see how data is transformed by a series of Pig Latin statements.

Because ILLUSTRATE works on a small sample of the input, it is a quick way to test a program before running it on the full dataset.

grunt> illustrate emp;
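ILLUSTRATE is most useful at the end of a multi-step pipeline, where it shows sample rows at every stage. The following is a minimal sketch; the file name, fields, and salary threshold are all hypothetical.

```pig
-- Hypothetical file and schema, used only to illustrate ILLUSTRATE.
emp = LOAD 'emp.txt' USING PigStorage(',')
      AS (id:int, name:chararray, dept:chararray, salary:double);

-- Step 1: keep only the higher salaries.
high_paid = FILTER emp BY salary > 50000.0;

-- Step 2: group the remaining rows by department.
by_dept = GROUP high_paid BY dept;

-- Step 3: compute the average salary per department.
dept_avg = FOREACH by_dept GENERATE group AS dept,
                                    AVG(high_paid.salary) AS avg_salary;

-- Shows example rows for emp, high_paid, by_dept, and dept_avg in turn.
ILLUSTRATE dept_avg;
```

Declaring the schema in the LOAD statement, as done here, makes the illustrated tables easier to read because each column is labeled with its field name and type.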

Hope you find this article helpful.
