Loading Data From HDFS into Hive

In an RDBMS, importing data is usually a separate feature or utility; in Hive, LOAD is one of the DML commands. Data can be loaded into a Hive table either from HDFS or from the local file system. This post covers loading data from HDFS into Hive.

Syntax:
LOAD DATA INPATH '<HDFS-Location>' [OVERWRITE] INTO TABLE <Table-Name> [PARTITION (<Partition-Column>=<Value>, ...)];

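The OVERWRITE keyword is optional. When it is present, the existing contents of the target table (or of the target partition) are deleted and replaced by the loaded files. When it is omitted, the loaded files are added to the data already in the table. The LOAD command does not validate the data against the table schema. Because the source file is already in HDFS, it is moved, not copied, into the Hive-controlled file system namespace, so it no longer exists at its original path after the load.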
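To make the append and overwrite behavior concrete, here is a minimal HiveQL sketch. The table demo_events, its columns, and the /tmp/demo/ paths are hypothetical names made up for illustration; the sketch assumes a comma-delimited text table and that both files already exist in HDFS.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE IF NOT EXISTS demo_events (
  id    INT,
  label STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Without OVERWRITE: the file is added to whatever the table already holds.
LOAD DATA INPATH '/tmp/demo/events_a.csv' INTO TABLE demo_events;

-- With OVERWRITE: the table's existing files are removed first,
-- so afterwards the table contains only the rows from events_b.csv.
LOAD DATA INPATH '/tmp/demo/events_b.csv' OVERWRITE INTO TABLE demo_events;

-- LOAD does not check the file against the schema. With a text table like
-- this one, a value that cannot be read as INT typically shows up as NULL
-- at query time rather than failing the load.
SELECT id, label FROM demo_events LIMIT 10;
```

Since nothing is validated at load time, a quick SELECT like the last statement is a cheap way to spot a wrong delimiter or a mismatched column order.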

Examples:
LOAD DATA INPATH '/user/cloudera/testfolder/test.txt'
INTO TABLE TestTable;
The command above loads the file 'test.txt' into an existing Hive table named TestTable. Because OVERWRITE is omitted, the file is added to whatever data the table already holds.
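One way to check the result of this load is sketched below. It assumes you are in the Hive CLI (or a Beeline session where the dfs command is permitted) and that the load has already run; the paths and table name are the ones from the example.

```sql
-- The source file was moved, so it should no longer be listed here.
dfs -ls /user/cloudera/testfolder/;

-- The "Location" field in this output shows the table's directory,
-- which is where the loaded file now lives.
DESCRIBE FORMATTED TestTable;

-- Confirm that rows are readable from the table.
SELECT COUNT(*) FROM TestTable;
```

If the original file must stay in place, copy it to a staging directory in HDFS first and load the copy.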

LOAD DATA INPATH '/user/cloudera/testfolder/JulySales.csv'
OVERWRITE INTO TABLE tblSales PARTITION (month='July');
The command above loads 'JulySales.csv' into the month='July' partition of a partitioned Hive table called tblSales. Because OVERWRITE is used, any existing data in that partition is replaced; other partitions of the table are not affected.
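For this load to work, tblSales has to be partitioned by month. The post does not show the table definition, so the DDL below is only a guess at what it might look like: the columns sale_id and amount are hypothetical, and the file is assumed to be comma-delimited.

```sql
-- Hypothetical definition; the real tblSales columns are not given in the post.
CREATE TABLE IF NOT EXISTS tblSales (
  sale_id INT,
  amount  DOUBLE
)
PARTITIONED BY (month STRING)   -- partition column is not stored in the data file
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- After the LOAD above, the partition should appear in this list as month=July.
SHOW PARTITIONS tblSales;

-- Filtering on the partition column reads only that partition's files.
SELECT * FROM tblSales WHERE month = 'July' LIMIT 10;
```

Note that the partition value comes from the PARTITION clause of the LOAD statement, not from the contents of the CSV file, so the file itself should contain only the non-partition columns.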

Hope you find this article helpful.


