External Tables in Apache Hive

When the data isn’t transient and you don’t want Hive to manage the table’s and data’s life cycle, you’ll consider creating external tables, as discussed in the previous article. As stated, Hive does not own the data; instead, a table would be built on top of the data that is stored elsewhere. The DROP TABLE statement would only remove the table from Hive, but the data will remain in HDFS.

Let’s do some exercises.

Consider the below data-set as an example.

Filename: book.csv
BookID, BookName
2124, Don Quixote. By Miguel de Cervantes.
2134, Lord of the Rings. By J.R.R. Tolkien.
2135, Harry Potter and the Sorcerer’s Stone. By J.K. Rowling.
2136, And Then There Were None. By Agatha Christie.
2138, Alice’s Adventures in Wonderland. By Lewis Carroll.
2139, The Lion, the Witch, and the Wardrobe. By C.S. Lewis.
2141, Pinocchio. By Carlo Collodi.
2147, Catcher in the Rye.
2148, Don Quixote. By Miguel de Cervantes.

The file is copied on the Desktop of my Cloudera’s VM. 
The following command is used in the terminal/shell to create a directory in the HDFS.

hadoop fs -mkdir /user/cloudera/booksfolder


Now, copying the file from the Desktop to newly created ‘booksfolder’ by using the below command.

hadoop fs -put Desktop/book.csv /user/cloudera/booksfolder


Let’s check if the file is copied in the desired folder.

hadoop fs -ls /user/cloudera/booksfolder
Found 1 items
-rw-r–r– 1 cloudera cloudera 430 2021-05-08 12:30 /user/cloudera/booksfolder/book.csv

HDFS_CreateDataSource

Now, let’s create an external table in Hive.

CREATE EXTERNAL TABLE Books(
BookID INT,
BookName STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
STORED AS TEXTFILE
LOCATION ‘/user/cloudera/booksfolder/’;

HiveExternalTable

Let’s examine the extended properties.
DESC EXTENDED books;

DescExtendedTable

If you are observed, we didn’t use any INSERT or LOAD command to load the data. We just created a table and instructed Hive where to get the data by specifying it’s location. We just need to query the table to view the data now.

SELECT * FROM books;

HiveSelectExternalTableData

Now, what’s the big thing? If you DROP the table “books” from Hive, it will delete the table. The source data will still remain in ‘/user/cloudera/booksfolder/’.

Try it out.

Hope you find this article helpful.

Please click on Follow button to receive notifications on latest posts.



3 comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s