When we create an external table in Hive, we typically specify its location rather than the actual file. Hive reads data from the specified location, so there is no need for an insert statement. That is, if we issue a SELECT statement on the table, Hive will read all of the files from the location specified during creation and return information. There will be no data returned if there is no file in the location.
Now the question is, if we copy another file to the same location, will the data from that file be displayed in the Hive table?
Yes, it reads the location whenever we issue the SELECT statement and searches for files and retrieves information from them.
Let’s put it to the test with an exercise.
I’m creating three text files with the following data.
file1.txt file2.txt file3.txt
1,Test1_1 5,Test2_1 9,Test3_1
2,Test1_2 6,Test2_2 10,Test3_2
3,Test1_3 7,Test2_3 11,Test3_3
4,Test1_4 8,Test2_4 12,Test3_4
As shown below, file1.txt and file2.txt are placed in the ‘exttable’ folder, and file3.txt is copied to the Desktop.
Now, using the HDFS terminal, I am copying the exttest folder from Desktop to /user/cloudera, which will be the file location for the external table.
[cloudera@quickstart ~]$ hadoop fs -put Desktop/exttest /user/cloudera/
The directory/folder has been copied to the user directory. Let’s verify.
[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/exttest;
Found 2 items
-rw-r–r– 1 cloudera cloudera 40 2021-12-24 01:03 /user/cloudera/exttest/file1.txt
-rw-r–r– 1 cloudera cloudera 40 2021-12-24 01:03 /user/cloudera/exttest/file2.txt
Now, in Hive, let’s create the external table.
hive> CREATE EXTERNAL TABLE exttest(someid INT, sometxt STRING)
ROW FORMAT DELIMITED
FILEDS TERMINATED BY ‘,’
LOCATION ‘/user/cloudera/exttest’;
Now, let’s verify the data.
SELECT * FROM exttest;
We can now see the data from the two text files.
Let’s now copy file3.txt to the same location.
[cloudera@quickstart ~]$ hadoop fs -put Desktop/file3.txt /user/cloudera/exttest
[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/exttest;
Found 3 items
-rw-r–r– 1 cloudera cloudera 40 2021-12-24 01:03 /user/cloudera/exttest/file1.txt
-rw-r–r– 1 cloudera cloudera 40 2021-12-24 01:03 /user/cloudera/exttest/file2.txt
-rw-r–r– 1 cloudera cloudera 43 2021-12-24 01:06 /user/cloudera/exttest/file3.txt
Now, let’s go back and see what Hive table displays.
SELECT * FROM exttest;
Data from the three files are being displayed.
I hope you found this post to be informative.
Please enter your email address to receive notifications of new postings.