Loading Multiple FIles in Hive

Loading data from many flat files into an RDBMS is a challenging operation. But in Hive, it won’t take long since Apache Hive is designed to make it easy to load, read, write, and manage massive and multiple datasets stored in distributed storage using SQL.

The example below will show you how to load several flat files into a table in Apache Hive.

Let’s consider the below datasets for the practice.

File: USPopulation1.txt
Phyllis L Pike,4961,John Calvin Drive,Chicago,IL,60606
Teri B Smith,3430,Caynor Circle,Branchburg,NJ,08876
Charmaine M Crumble,3731,Cerullo Road,Louisville,KY,40205
Michael S Aguilera,943,Sundown Lane,Austin,TX,78748

File: USPopulation2.txt
Agnes R Lavine,4841,Glory Road,Nashville,TN,37201
Teddy C Fenderson,205,Camden Place,Mount Pleasant,SC,29464
James S Her,2491,Jim Rosa Lane,SAN JOSE,CA,95120
Susan A Yamasaki,1014,Parrill Court,LIMERICK,NY,13657

USPopulationData

Let’s store these flat files into HDFS in a specific directory.

[cloudera@quickstart ~]$ hadoop fs -mkdir /user/cloudera/multifiles
The above command creates a directory called “multifiles” in /user/cloudera.

[cloudera@quickstart ~]$
hadoop fs -put Desktop/Docs/USPopulation1.txt /user/cloudera/multifiles
[cloudera@quickstart ~]$
hadoop fs -put Desktop/Docs/USPopulation2.txt /user/cloudera/multifiles

The preceding commands move the flat files from the local system to the “multifiles” directory in /user/cloudera.

[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/multifiles
The above command will list out the files from the specified directory.

Result is:
Found 2 items
-rw-r–r– 1 cloudera cloudera 217 2021-06-10 14:52 /user/cloudera/multifiles/USPopulation1.txt
-rw-r–r– 1 cloudera cloudera 212 2021-06-10 14:52 /user/cloudera/multifiles/USPopulation2.txt

CreateDir_CopyFiles_Scrshot

We have now successfully imported the flat files into HDFS using the given method. Now, in Apache Hive, let’s create an external table and connect these files to it.

CREATE EXTERNAL TABLE MultifilesEx(
         Name String,
         Street Int,
         Area String,
         City String,
         State String,
         Zip Int)

ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
LOCATION ‘/user/cloudera/multifiles’

Hive_CreateExternalTable

The preceding command not only creates a table but also loads data from the provided location where the flat files were put. It reads data from files automatically, therefore there is no need to import or load any data manually.

Hope you find this article helpful.

Please subscribe to receive notifications on latest updates.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s