Since Hive is a data warehouse and not designed for OLTP operations like SQL, data must be ingested from a variety of sources. It might come from local files, HDFS files that have been preloaded, RDBMS sources, or regular data insertion.
Let’s categorize them based on the loading methods.
(1) Using the LOAD command,
(2) Query-based inserting, and
(3) Straight insert statements.
LOAD command will help loading data from Local file system and HDFS into Hive. Earlier article explained, how the data can be inserted using queries (SELECT statements). In this one, we’ll see how the values can be inserted into Hive table using usual SQL DML statements. DML operations were not supported in earlier versions of Hive, however this was fixed in later versions.
Examples:
CREATE TABLE tblSales(
ProductID INT,
ProductName STRING,
TotalSales INT,
TotalAmount DECIMAL(9,2))
CLUSTERED BY (ProductID) INTO 30 BUCKETS;
INSERT INTO TABLE tblSales VALUES
(20981,’Hisense Television’, 7, 14000),
(20982,’Sony Television’, 4, 12000);
CREATE TABLE tblSales(
ProductID INT,
ProductName STRING,
TotalSales INT,
TotalAmount DECIMAL(9,2))
PARTITIONED BY (SoldDate STRING)
CLUSTERED BY (ProductID) INTO 33 BUCKETS;
INSERT INTO TABLE tblSales PARTITION (SoldDate = ‘2021-08-07’) VALUES
(20981,’Hisense Television’, 7, 14000),
(20982,’Sony Television’, 4, 12000);
Hope you find this article helpful.
Please do subscribe for more interesting updates.
2 comments