When the data isn’t transient and you don’t want Hive to manage the table’s and data’s life cycle, you’ll consider creating external tables, as discussed in the previous article. As stated, Hive does not own the data; instead, a table would be built on top of the data that is stored elsewhere. The DROP TABLE statement would only remove the table from Hive, but the data will remain in HDFS.
Let’s do some exercises.
Consider the below data-set as an example.
Filename: book.csv
BookID, BookName
2124, Don Quixote. By Miguel de Cervantes.
2134, Lord of the Rings. By J.R.R. Tolkien.
2135, Harry Potter and the Sorcerer’s Stone. By J.K. Rowling.
2136, And Then There Were None. By Agatha Christie.
2138, Alice’s Adventures in Wonderland. By Lewis Carroll.
2139, The Lion, the Witch, and the Wardrobe. By C.S. Lewis.
2141, Pinocchio. By Carlo Collodi.
2147, Catcher in the Rye.
2148, Don Quixote. By Miguel de Cervantes.
The file is copied on the Desktop of my Cloudera’s VM.
The following command is used in the terminal/shell to create a directory in the HDFS.
hadoop fs -mkdir /user/cloudera/booksfolder
Now, copying the file from the Desktop to newly created ‘booksfolder’ by using the below command.
hadoop fs -put Desktop/book.csv /user/cloudera/booksfolder
Let’s check if the file is copied in the desired folder.
hadoop fs -ls /user/cloudera/booksfolder
Found 1 items
-rw-r–r– 1 cloudera cloudera 430 2021-05-08 12:30 /user/cloudera/booksfolder/book.csv
Now, let’s create an external table in Hive.
CREATE EXTERNAL TABLE Books(
BookID INT,
BookName STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
STORED AS TEXTFILE
LOCATION ‘/user/cloudera/booksfolder/’;
Let’s examine the extended properties.
DESC EXTENDED books;
If you are observed, we didn’t use any INSERT or LOAD command to load the data. We just created a table and instructed Hive where to get the data by specifying it’s location. We just need to query the table to view the data now.
SELECT * FROM books;
Now, what’s the big thing? If you DROP the table “books” from Hive, it will delete the table. The source data will still remain in ‘/user/cloudera/booksfolder/’.
Try it out.
Hope you find this article helpful.
Please click on Follow button to receive notifications on latest posts.
3 comments