The MERGE command synchronizes two tables by adding, removing, and modifying target table rows based on the source table’s join condition. In other words, when performing INSERT, UPDATE, and DELETE operations on a target table by matching the rows from the source table, MERGE is frequently utilized throughout the ETL process.
Hive v2.2 introduced the MERGE command that allows analysts and developers to conduct DML actions in a single statement, allowing them to perform INSERT, UPDATE, or DELETE if the record exists or not. If the operation succeeds, the modifications will be committed automatically.
Please note that MERGE can only be performed on tables that support ACID, hence transactions are configured to be ON. Please refer to the earlier article for more information. Also, note that the target table must be bucketed, and stored in ORC file format.
Syntax:
MERGE INTO <target table> AS T USING <source expression/table> AS S
ON <boolean expression1>
WHEN MATCHED [AND <boolean expression2>] THEN UPDATE SET <set clause list>
WHEN MATCHED [AND <boolean expression3>] THEN DELETE
WHEN NOT MATCHED [AND <boolean expression4>] THEN INSERT VALUES<value list>
Use Case:
CREATE TABLE SalesRepository(
ProductID STRING,
ProductName STRING,
Model STRING,
QtySold INT,
SoldDate STRING)
PARTITIONED BY (SoldDate STRING)
CLUSTERED BY (ProductID) INTO 4 BUCKETS
STORED AS ORC TBLPROPERTIES (‘transactional’=’true’);
CREATE TABLE NewSales(
ProductID STRING,
ProductName STRING,
Model STRING,
QtySold INT,
SoldDate STRING)
STORED AS ORC;
INSERT INTO SalesRepository PARTITION (SoldDate) VALUES
(P1001, ‘Hisense 65inch’, ’65U7WF’,5, ‘20210811’),
(P1002, ‘Hisense 58inch’, ’58A7100F’,3, ‘20210811’),
(P1003, ‘Hisense 55inch’, ’55A7100F’, 2,’20210811′),
(P1004, ‘Hisense 55inch’, ’55U7WF’, 4,’20210811′),
(P1005, ‘Hisense 43inch’, ’43A6000F’, 3,’20210812′),
(P1006, ‘Hisense 50inch’, ’50A6GE’, 0,’20210812′),
(P1007, ‘Hisense 50inch’, ’50A7100F’, 0,’20210813′),
(P1008, ‘Hisense 65inch’, ’65A6GE’, 2,’20210813′),
(P1009, ‘Hisense 43inch’, ’43B7100UW’,2, ‘20210813’);
INSERT INTO NewSales VALUES
(P1002, ‘Hisense 58inch’, ’58A7100F’,3, ‘20210811’),
(‘P1003’, ‘Hisense 55inch’, ’55A7100F’, NULL,’20210811′),
(P1006, ‘Hisense 50inch’, ’50A6GE’, 1,’20210812′),
(P1007, ‘Hisense 50inch’, ’50A7100F’, 2,’20210813′),
(P1010, ‘Hisense 50inch’,’50A7120FS’, 12,’20210814′);
Since “NewSales” contains accurate information, the table “SalesRepository” has to be updated depending on the values supplied in “NewSales.” If no sales occurred (QtySold in the “NewSales” table is null), the record should be deleted from the “SalesRepository.” The following must be accomplished.
- The values for QtySold in table “SalesRepository” for the ProductID # P1002, P1002, P1004, P1005, P1008 and P1009 should not be changed.
- The value for QtySold in table “SalesRepository” for the ProductID # P1003 should be deleted.
- The value for QtySold in table “SalesRepository” for the ProductID # P1006 should be updated from 0 to 1.
- The value for QtySold in table “SalesRepository” for the ProductID # P1007 should be updated from 0 to 2.
- The row with ProductID # P1010 in table “SalesRepository” should be inserted.
Merge Query:
MERGE INTO SalesRepository AS SR
USING NewSales AS NS
ON SR.ProductID = NS.ProductID and SR.SoldDate = NS.SoldDate
WHEN MATCHED AND (SR.QtySold != NS.QtySold AND SR.QtySold=0) THEN UPDATE SET QtySold = NS.QtySold
WHEN MATCHED AND NS.QtySold IS NULL THEN DELETE
WHEN NOT MATCHED THEN INSERT VALUES (NS.ProductID, NS.ProductName, Model, QtySold, SoldDate);
Hope you find this article helpful.
Please subscribe for more interesting updates.
One comment