Indexes in Apache Hive

The aim of indexing in Hive is the same as it is in any RDBMS: to improve query performance by enhancing the speed of query search on certain columns of a database, which are primarily utilized in the WHERE clauses. If indexes are used, just a subset of the file/table has to be loaded and processed; otherwise, the full table is loaded and all rows are processed.

Indexing in Hive is introduced from the version 0.7.0, and bitmap indexing was added in version 0.8.0. However, the same (indexing) has been removed from Hive version 3.0. The reason has been explained in the documentation as below.

There are alternate options which might work similarly to indexing:

  • Materialized views with automatic rewriting can result in very similar results. Hive 2.3.0 adds support for materialzed views.
  • Using columnar file formats (Parquet, ORC) – they can do selective scanning; they may even skip entire files/blocks.

Compact indexing and bitmap indexing are the two types of indexes offered in older versions of Hive. Compact indexing retains the value of an indexed column and its block id, whereas Bitmap indexing is a common approach for indexing columns with a small number of unique values. This article will show you how to build, edit, and drop indexes if you’re using an earlier version of Hive.

The following is the standard syntax for defining indexes on columns.

Syntax:
CREATE INDEX index_name
ON TABLE base_table_name (col_name, …)
AS ‘index.handler.class.name’
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, …)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, …)]
[
[ ROW FORMAT …] STORED AS …
| STORED BY …
]
[LOCATION hdfs_path]
[TBLPROPERTIES (…)]
[COMMENT “index comment”]

EXAMPLES:
–Creating the Compact Index

CREATE INDEX idx_EmpID_Emp ON TABLE Emp(EmpID)
AS ‘org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler’
WITH DEFERRED REBUILD;

CompactIndex_Emp_Hive

–Creating the Bitmap Index
CREATE INDEX idx_EmpID_Emp ON TABLE Emp(DeptNo)
AS ‘BITMAP’
WITH DEFERRED REBUILD;

CompactBitmapIndexes

The kind of indexing to apply is specified by index type. If we wish to utilize the compact index handler, we’ll replace index type with the clause – ‘org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler’. If we want a bitmap index, we’ll use index type “bitmap”. It defines the Java class that will be used to implement indexing.

–Altering the Index
ALTER INDEX idx_EmpID_Emp ON Emp REBUILD;

AlterIndex_Hive

–Dropping the Index
DROP INDEX [IF EXISTS] idx_EmpID_Emp ON Emp;

dropIndex_Hive

Hope you find this article helpful.

Please subscribe for more interesting updates.

3 comments

  1. Hey there! Do you know if they make any plugins to assist with SEO?
    I’m trying to get my blog to rank for some targeted keywords but I’m not seeing very good results.
    If you know of any please share. Many thanks!

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s