Despite the fact that Apache Hive is not meant for OLTP, it does allow DML operations, but not in the same manner that SQL does. Transactions are disabled by default in the configuration file, however they must be enabled to operate with data modifications. This is applicable not just for CDH VM, but also for other Hadoop distributions.
Let’s have a look at what happens if you don’t enable it.
CREATE TABLE Test(CustID INT, CustName STRING);
INSERT INTO Test VALUES (100,’John’);
INSERT INTO Test VALUES (200,’Smith’);
We now have a test table to experiment with due to the aforementioned implementation. Now try to update or delete one of the table’s records.
UPDATE Test SET CustName = ‘Smithh’ WHERE CustID = 200;
If the config file is not changed earlier, you will see the following error message.
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
This is because, as stated earlier, transactions are disabled by default and DML operations currently works only for ORC file formatted tables. Now is the time to address the problem so that we can update or remove records.
Run the following command in the HDFS console to make the hive config file editable.
sudo vi /etc/hive/conf/hive-site.xml
Since this location of hive-site.xml varies by distribution and customized installation, use the proper path appropriately.
The above command will open the hive-site.xml in “vi editor” however, in order to edit the configuration or add new settings, click “i” which means “INSERT” mode and add the below parameters into it.
In order to save the changes and quit from the editor mode, press “Esc” button to enter into command mode and type “:wq!” to exit.
Restart the services and then attempt updating or deleting the same table again. It’s possible that you’ve encountered the following error.
FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table test that does not use an AcidOutputFormat or is not bucketed
As per the documentation, only ORC file format is supported in this first release. The feature has been built such that transactions can be used by any storage format that can determine how updates or deletes apply to base records (basically, that has an explicit or implicit row id), but so far the integration work has only been done for ORC.
So, let’s change our table structure as shown below.
CREATE TABLE Test(CustID STRING, CustFIrstName STRING, CustLastName STRING)
CLUSTERED BY (CustID) INTO 2 BUCKETS
STORED AS ORC
INSERT INTO Test VALUES (1100,’John’,’Smith’);
INSERT INTO Test VALUES (2100,’Alen’,’Miller’);
Now, you can UPDATE or DELETE the records from the table without any errors.
UPDATE Test SET CustID = 2001 WHERE CustID = 2100;
Hope you find this article helpful.
Please subscribe for more interesting updates.
Simply want to say your article is as astounding.
The clearness in your put up is simply cool and i can assume you are an expert on this subject.
Well with your permission allow me to clutch your RSS feed
to keep up to date with approaching post. Thank you 1,000,000 and please keep up the rewarding work.