Locking In Apache Hive

In most RDBMSs, locking is a key mechanism for meeting the isolation requirement. A transaction locks the objects it affects, and while those locks are held, the SQL engine prevents other transactions from altering the locked data.

As discussed in the previous posts, Apache Hive is not designed for OLTP workloads or transactional support. Hive workloads do not typically involve frequent inserts, updates, or deletes, so Hive does not need fine-grained locking at the row or column level. However, Hadoop and Hive are multi-user systems, so locking and coordination are still useful in some situations. For example, if one user runs INSERT OVERWRITE to replace a table's contents, the locking mechanism should prevent other users from querying that table while the overwrite is in progress. The Hive CLI, the Thrift server, and the web interface run independently of each other, so locking is coordinated through a separate system such as Apache ZooKeeper.
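The INSERT OVERWRITE scenario can be modelled with the two lock modes Hive uses, SHARED and EXCLUSIVE. The sketch below is a simplified Python model, not Hive's implementation. It assumes the compatibility rule from Hive's locking design: shared locks coexist with each other, and an exclusive lock conflicts with everything. The function names and the `sales`/`staging` tables are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    SHARED = "S"
    EXCLUSIVE = "X"


def compatible(held: Mode, requested: Mode) -> bool:
    """Only SHARED-with-SHARED is compatible; any EXCLUSIVE lock conflicts."""
    return held is Mode.SHARED and requested is Mode.SHARED


def locks_for_select(table: str) -> dict[str, Mode]:
    """A reader needs a shared lock on the table it queries."""
    return {table: Mode.SHARED}


def locks_for_insert_overwrite(target: str, source: str) -> dict[str, Mode]:
    """INSERT OVERWRITE target SELECT ... FROM source: S on source, X on target."""
    return {source: Mode.SHARED, target: Mode.EXCLUSIVE}


def can_run(requested: dict[str, Mode], held: dict[str, Mode]) -> bool:
    """A statement may start only if each lock it needs is compatible with the held lock."""
    return all(
        compatible(held[obj], mode)
        for obj, mode in requested.items()
        if obj in held
    )


if __name__ == "__main__":
    # Hypothetical tables: user A is overwriting `sales` from `staging`.
    held_by_a = locks_for_insert_overwrite("sales", "staging")
    print(can_run(locks_for_select("sales"), held_by_a))    # False: sales is X-locked
    print(can_run(locks_for_select("staging"), held_by_a))  # True: S is compatible with S
```

Readers of the overwritten table have to wait. Readers of the source table are unaffected, because a shared lock does not block other shared locks.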

To enable locking, add the following properties to Hive's configuration file, hive-site.xml.

<property>
<name>hive.zookeeper.quorum</name>
<value>zk1.site.pvt,zk2.site.pvt,zk3.site.pvt</value>
<description>The list of ZooKeeper servers to talk to. Only needed for read/write locks.</description>
</property>

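<property>
<name>hive.support.concurrency</name>
<value>true</value>
<description>Whether Hive supports concurrency. The default lock manager needs a running ZooKeeper instance to provide read/write locks.</description>
</property>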
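A missing or mistyped property can leave locking disabled or misconfigured, so it can help to sanity-check hive-site.xml before restarting Hive. The following is a minimal Python sketch that uses only the standard library. The helper names `read_properties` and `check_lock_config` are hypothetical, and the zk1/zk2/zk3.site.pvt hostnames are placeholders.

```python
import xml.etree.ElementTree as ET

# A hive-site.xml fragment with the two locking properties (placeholder hostnames).
HIVE_SITE = """
<configuration>
  <property>
    <name>hive.zookeeper.quorum</name>
    <value>zk1.site.pvt,zk2.site.pvt,zk3.site.pvt</value>
  </property>
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>
</configuration>
"""


def read_properties(xml_text: str) -> dict[str, str]:
    """Collect <name>/<value> pairs from a Hadoop-style configuration file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def check_lock_config(props: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the locking setup looks sane."""
    problems = []
    if props.get("hive.support.concurrency", "").lower() != "true":
        problems.append("hive.support.concurrency is not set to true")
    quorum = props.get("hive.zookeeper.quorum", "")
    hosts = [h.strip() for h in quorum.split(",") if h.strip()]
    if not hosts:
        problems.append("hive.zookeeper.quorum is empty")
    elif len(set(hosts)) != len(hosts):
        problems.append("hive.zookeeper.quorum lists the same host more than once")
    return problems


if __name__ == "__main__":
    print(check_lock_config(read_properties(HIVE_SITE)))  # []
```

The duplicate-host check is there because a quorum string that repeats one server gives no more fault tolerance than a single ZooKeeper node.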

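Once this configuration is in place, Hive automatically acquires locks for the queries that need them. For example, it takes a shared lock on a table being read and an exclusive lock on a table being overwritten. The current locks can be listed with the following commands.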

hive> SHOW LOCKS;
hive> SHOW LOCKS tablename EXTENDED;
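SHOW LOCKS can list several entries for a single query. According to Hive's locking design notes, objects are locked hierarchically: a statement on a partition takes shared locks on the parent table and on any partial partition along the way, plus the appropriate lock on the target partition itself. The Python sketch below models only that rule. `partition_locks` and the `table/key=value` naming are illustrative assumptions, and the actual SHOW LOCKS output depends on the Hive version and lock manager.

```python
def partition_locks(
    table: str, spec: list[tuple[str, str]], leaf_mode: str
) -> list[tuple[str, str]]:
    """Shared locks on the table and each partial partition; leaf_mode on the target.

    `spec` is an ordered partition spec such as [("ds", "2024-01-01"), ("hr", "00")].
    An empty spec means a non-partitioned table, which gets leaf_mode directly.
    """
    objects = [table]
    path = table
    for key, value in spec:
        path = f"{path}/{key}={value}"
        objects.append(path)
    # Every ancestor of the target object is locked SHARED.
    locks = [(obj, "SHARED") for obj in objects[:-1]]
    locks.append((objects[-1], leaf_mode))
    return locks


if __name__ == "__main__":
    # Reading one partition of a hypothetical table t1:
    print(partition_locks("t1", [("ds", "2024-01-01")], "SHARED"))
    # [('t1', 'SHARED'), ('t1/ds=2024-01-01', 'SHARED')]

    # INSERT OVERWRITE into a two-level partition of a hypothetical table t2:
    print(partition_locks("t2", [("ds", "2024-01-01"), ("hr", "00")], "EXCLUSIVE"))
    # [('t2', 'SHARED'), ('t2/ds=2024-01-01', 'SHARED'), ('t2/ds=2024-01-01/hr=00', 'EXCLUSIVE')]
```

The shared lock on the parent table is what stops, for example, an exclusive DROP TABLE from running while one of its partitions is being read or written.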

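Locks can also be acquired and released explicitly. LOCK TABLE accepts either SHARED or EXCLUSIVE mode. The following commands take an exclusive lock on a table and then release it.

hive> LOCK TABLE tablename EXCLUSIVE;
hive> UNLOCK TABLE tablename;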
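A toy in-memory lock table can illustrate how explicit SHARED and EXCLUSIVE locks interact and why a lock stays in place until UNLOCK TABLE releases it. This is a Python sketch, not Hive's lock manager. `LockTable` and the session names are hypothetical. In this model a conflicting request simply returns False, whereas the real lock manager typically retries for a configurable period before reporting a conflict.

```python
class LockTable:
    """Toy lock table: each table maps to its lock mode and the sessions holding it."""

    def __init__(self) -> None:
        self._held: dict[str, tuple[str, set[str]]] = {}

    def lock(self, session: str, table: str, mode: str) -> bool:
        """Try to take `mode` ("SHARED" or "EXCLUSIVE"); return False on conflict."""
        current = self._held.get(table)
        if current is None:
            self._held[table] = (mode, {session})
            return True
        held_mode, holders = current
        # Only SHARED can join an existing SHARED lock; everything else conflicts.
        if mode == "SHARED" and held_mode == "SHARED":
            holders.add(session)
            return True
        return False

    def unlock(self, session: str, table: str) -> None:
        """Release this session's hold; drop the entry once nobody holds it."""
        current = self._held.get(table)
        if current is None:
            return
        _, holders = current
        holders.discard(session)
        if not holders:
            del self._held[table]

    def show_locks(self) -> list[tuple[str, str]]:
        """Rough analogue of SHOW LOCKS: (table, mode) pairs currently held."""
        return sorted((table, mode) for table, (mode, _) in self._held.items())


if __name__ == "__main__":
    locks = LockTable()
    locks.lock("session1", "sales", "EXCLUSIVE")          # LOCK TABLE sales EXCLUSIVE
    print(locks.lock("session2", "sales", "SHARED"))       # False: blocked by the X lock
    locks.unlock("session1", "sales")                      # UNLOCK TABLE sales
    print(locks.lock("session2", "sales", "SHARED"))       # True once the X lock is gone
```

Several sessions can hold SHARED on the same table at once. An EXCLUSIVE request only succeeds when no one else holds any lock on that table.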

Hope you find this article helpful.
