A Common Mistake In Defining File Format In Hive

Hive often parses a statement and creates a table without returning any error, even when the file format clause is written incorrectly. The statement is interpreted and executed successfully, but the goal is not accomplished: the table does not use the intended file format.

The following statement is parsed and creates the table successfully, but the table is not in the desired file format.

CREATE TABLE Employee_Parquet
AS
SELECT empno FROM Emp
AS PARQUET;

Executing this query creates the table. However, when you look at the extended properties of the table, you’ll see the difference.
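One way to view these properties is Hive's DESCRIBE EXTENDED command, sketched here against the table created by the statement above. The storage descriptor in its output carries the inputFormat and outputFormat entries.

```sql
-- Print the table's metadata, including the storage descriptor
-- with its inputFormat and outputFormat entries.
DESCRIBE EXTENDED Employee_Parquet;
```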

inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
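A likely reason the statement parses without complaint is that Hive reads the trailing AS PARQUET as a table alias for Emp, not as a storage clause, so the table falls back to the default text format. The queries below are a minimal sketch of that reading; the alias e is a hypothetical name used only for comparison.

```sql
-- Here AS PARQUET names the table Emp; it does not set a storage format.
SELECT empno FROM Emp AS PARQUET;

-- The same query with the alias used explicitly to qualify the column.
SELECT PARQUET.empno FROM Emp AS PARQUET;

-- Equivalent query with a conventional alias name (e is hypothetical).
SELECT e.empno FROM Emp AS e;
```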

The table above was created with TextInputFormat, which means it is stored as plain text rather than Parquet. The right way to create a Parquet table, or a table in any other file format, from an existing table is given below:

CREATE TABLE Employee_Parquet2 
STORED AS PARQUET
AS SELECT empno FROM Emp;

This executes successfully and creates the table in the desired file format. Let’s check the extended properties of the table.

inputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat,
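The same clause placement should carry over to the other file formats mentioned earlier. As a sketch, assuming ORC is the target format and using Employee_Orc as a hypothetical table name:

```sql
-- STORED AS must come before AS SELECT, exactly as in the Parquet case.
CREATE TABLE Employee_Orc
STORED AS ORC
AS SELECT empno FROM Emp;

-- Confirm the result: the storage descriptor should list ORC
-- input and output format classes instead of the text ones.
DESCRIBE EXTENDED Employee_Orc;
```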

Hope you like this article.
