In this tutorial, we’ll look at how to use SELECT statements to INSERT data. This implementation is identical to that of SQL. Although I couldn’t conceive of this article’s content as a distinct topic, and it is also an integral part of many of my previous articles, I am considering it now for the purpose of novices who are studying “Apache Hive Course Contents” step by step.
Inserting data into Hive tables can be done in three ways.
(1) Using the LOAD command,
(2) Query-based inserting, and
(3) Straight insert statements.
LOAD command will help loading data from Local file system and HDFS into Hive. DML operations were not supported in earlier versions of Hive, however this was fixed in later versions. In this article, we will see how the data can be inserted using queries (SELECT statements).
Standard syntax:
INSERT OVERWRITE TABLE tablename1
[PARTITION (partcol1=val1, partcol2=val2 …) [IF NOT EXISTS]]
select_statement1 FROM from_statement;
Examples:
INSERT INTO TempTable2 SELECT * FROM TempTable1
INSERT INTO TempTable2
SELECT Col1, Col2, Col3 FROM TempTable1
INSERT INTO TempTable2
SELECT Col1, Col2, Col3 FROM TempTable1 WHERE Col1 = 1000;
INSERT INTO TempTable2
SELECT DISTINCT Col1, Col2, Col3 FROM TempTable1 WHERE Col1 = 1000;
INSERT INTO TempTable2
SELECT Col1, Col2, Col3 FROM TempTable1 WHERE Col1 = 1000;
— If the table is partitioned
INSERT INTO TownsList_Dynamic PARTITION(Country)
SELECT * FROM TownsList;
Hope you find this article helpful.
Please subscribe for more interesting updates.
One comment