Hive Database Boundaries

A list of Apache Hive database level boundaries is provided below. Many of these, though, may have workarounds. This is a list of things that you can’t accomplish with Hive, as well as the limits of Hive.

  • During the database creation, specify where the database needs to be stored. Once database is created, there’s no straight forward solution to change its location.
  • During the database creation, specify the database properties correctly. Once created they cannot be modified through ALTER statements.
  • The database’s metadata, including its name, cannot be modified.
  • By default, Hive’s query output will not return any column names. However, Hive’s configuration property hive.cli.print.print.header must be set to true in order to see the column names in the output.
  • There is no way to determine which database is presently in use. However, Hive’s configuration property hive.cli.print.current.db must be set to true in order to see the database name in the Hive prompt.
  • Real-time queries and row level changes are not available in Apache Hive. In other terms, Hive is not  designed for online transactions (OLTP).
  • In general, Apache Hive queries have a high latency.
  • NVL command will not work in Hive. Use COALESCE instead.
  • Date format is a string (prior to Hive 0.12.0). However it can be converted into the format we required.
  • Usage of aliases is not same like in SQL.
  • If we create a table without first specifying a database, the table will be stored in the system’s “default” database.
  • The data files for the external tables must be saved in a folder that cannot be used for anything else. Hive examines every readable file from the provided path. As a result, keep your location folders distinct.
  • You can load the data from HDFS and Local file system into “Text” and “Avro” file formatted table only. You’ll need a staging table in Hive to import data into other file structured tables like Parquet, RC, Sequential file format, and so on. For certain file types, load will not function; only INSERT would, which is for a reason explained in previous blogs.
  • You cannot load data directly into dynamic partitioned table. You must have staging table to use “INSERT..SELECT”

The majority of the elements listed above are not regarded limitations because they are Hive’s behavior for a purpose. And while certain boundaries may not last for a long time, recent upgrades may be able to resolve such difficulties. 

Hope you find this article interesting.

Please subscribe for latest updates.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s