Internal Tables in Hive – Part-1

As mentioned in the previous post, internal tables are the right choice when the data is temporary or when you want Hive to control the life cycle of both the table and its data. By default, Hive keeps an internal table's data inside its warehouse directory and manages the metadata itself. Be careful before dropping an internal table, because the drop erases the data along with the metadata.

Let’s do some exercises.

Consider the following data set as an example.

Filename: book.csv
BookID, BookName
2124, Don Quixote. By Miguel de Cervantes.
2134, Lord of the Rings. By J.R.R. Tolkien.
2135, Harry Potter and the Sorcerer’s Stone. By J.K. Rowling.
2136, And Then There Were None. By Agatha Christie.
2138, Alice’s Adventures in Wonderland. By Lewis Carroll.
2139, The Lion, the Witch, and the Wardrobe. By C.S. Lewis.
2141, Pinocchio. By Carlo Collodi.
2147, Catcher in the Rye.
2148, Don Quixote. By Miguel de Cervantes.

The file has been copied to the Desktop, that is, to the local file system of the Cloudera machine.

CREATE TABLE Books (
BookID INT,
BookName STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Note that the STORED AS TEXTFILE clause is optional, because Hive stores data in the TEXTFILE format by default.
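
To confirm that Hive treats the new table as internal, you can inspect its metadata. The following is a minimal sketch, assuming the table lives in the dbTest database mentioned later in this post:

```sql
-- Switch to the database that holds the table (dbTest is the name used in this post).
USE dbtest;

-- Show the table's metadata, including its type and its storage location.
DESCRIBE FORMATTED books;
```

In the output, the Table Type row should read MANAGED_TABLE for an internal table, and the Location row shows the directory where Hive keeps the data.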

Now, load the data into the table.
LOAD DATA LOCAL INPATH 'Desktop/book.csv' INTO TABLE books;
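
A quick query is an easy way to check the load. The sketch below makes two assumptions about the sample file. First, if book.csv really begins with the "BookID, BookName" header line, that line is loaded as a row whose BookID is NULL, because the text cannot be read as an INT. Second, since the comma is the field delimiter, a book name that itself contains a comma (such as "The Lion, the Witch, and the Wardrobe") is cut off at its first comma.

```sql
-- Look at the first rows that were loaded.
SELECT BookID, BookName FROM books LIMIT 10;

-- Optional, and only relevant if the file starts with a header line:
-- this table property tells Hive to skip the first line of each file when reading.
ALTER TABLE books SET TBLPROPERTIES ("skip.header.line.count"="1");
```

The skip.header.line.count property is not available in very old Hive releases, so treat the second statement as something to try on your version rather than a guaranteed fix.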

[Image: Internal Table-1]

Now, let’s check where the table’s data is placed. Use the Hue file browser to find the data location. The table was created inside the dbTest database, which lives in the Hive warehouse.

The full path will be “/user/hive/warehouse/dbtest.db/books/book.csv”, because Hive keeps the files of each internal table in a directory named after the table.
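
If Hue is not at hand, the same location can be checked from the Hive command line. This is a small sketch that assumes the default warehouse directory used in this post:

```sql
-- Recursively list everything stored under the dbTest database directory.
-- The dfs command passes its arguments to the Hadoop file system shell.
dfs -ls -R /user/hive/warehouse/dbtest.db;
```

The listing should include book.csv, the file that was loaded into the table.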

If you click on the file “book.csv”, you will see the data, as shown in the image above. This data is managed by Hive, and if you drop the table you will lose the data permanently.
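
The effect of a drop can be sketched as follows. Run it only on a table you can afford to lose:

```sql
-- Dropping an internal (managed) table removes the metadata and the data files.
DROP TABLE IF EXISTS books;

-- The table should no longer appear in the list of tables.
SHOW TABLES;
```

Depending on how the cluster is configured, HDFS may keep the deleted files in a trash directory for a while. That is a cluster setting, not something to rely on as a backup.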

When creating an internal table, you can also choose to place its data somewhere other than Hive’s default location, either in HDFS or on the local file system.
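
As a rough illustration, the LOCATION clause of CREATE TABLE is the usual way to point a table at a directory of your choice. The table name and the HDFS path below are hypothetical and only serve as an example:

```sql
-- books_custom and /user/cloudera/books_custom are made-up names for illustration.
CREATE TABLE books_custom (
  BookID INT,
  BookName STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/cloudera/books_custom';
```

The table is still internal, so dropping it is expected to delete the files in that directory as well.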

Click here to learn how it can be done.