The aim of indexing in Hive is the same as it is in any RDBMS: to improve query performance by enhancing the speed of query search on certain columns of a database, which are primarily utilized in the WHERE clauses. If indexes are used, just a subset of the file/table has to be loaded and processed; otherwise, the full table is loaded and all rows are processed.
Indexing in Hive is introduced from the version 0.7.0, and bitmap indexing was added in version 0.8.0. However, the same (indexing) has been removed from Hive version 3.0. The reason has been explained in the documentation as below.
There are alternate options which might work similarly to indexing:
- Materialized views with automatic rewriting can result in very similar results. Hive 2.3.0 adds support for materialzed views.
- Using columnar file formats (Parquet, ORC) – they can do selective scanning; they may even skip entire files/blocks.
Compact indexing and bitmap indexing are the two types of indexes offered in older versions of Hive. Compact indexing retains the value of an indexed column and its block id, whereas Bitmap indexing is a common approach for indexing columns with a small number of unique values. This article will show you how to build, edit, and drop indexes if you’re using an earlier version of Hive.
The following is the standard syntax for defining indexes on columns.
CREATE INDEX index_name
ON TABLE base_table_name (col_name, …)
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, …)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, …)]
[ ROW FORMAT …] STORED AS …
| STORED BY …
[COMMENT “index comment”]
–Creating the Compact Index
CREATE INDEX idx_EmpID_Emp ON TABLE Emp(EmpID)
WITH DEFERRED REBUILD;
–Creating the Bitmap Index
CREATE INDEX idx_EmpID_Emp ON TABLE Emp(DeptNo)
WITH DEFERRED REBUILD;
The kind of indexing to apply is specified by index type. If we wish to utilize the compact index handler, we’ll replace index type with the clause – ‘org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler’. If we want a bitmap index, we’ll use index type “bitmap”. It defines the Java class that will be used to implement indexing.
–Altering the Index
ALTER INDEX idx_EmpID_Emp ON Emp REBUILD;
–Dropping the Index
DROP INDEX [IF EXISTS] idx_EmpID_Emp ON Emp;
Hope you find this article helpful.
Please subscribe for more interesting updates.
Hey there! Do you know if they make any plugins to assist with SEO?
I’m trying to get my blog to rank for some targeted keywords but I’m not seeing very good results.
If you know of any please share. Many thanks!