Partitioning in Apache Hive

Data in Apache Hive is organized into tables, partitions, and buckets. Hadoop is designed to handle massive datasets, so Hive tables often contain very large amounts of data. Partitioning organizes a table by dividing it into smaller parts based on the values of one or more partition columns (partition keys), such as a date or a country. Each partition is stored as its own subdirectory under the table's directory, named after the key and its value (for example country=US).
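To make that layout concrete, here is a small Python sketch that builds the directory path for one partition. The warehouse path /user/hive/warehouse and the sales table partitioned by country and dt are assumptions for illustration. Real Hive also escapes special characters in partition values, and this sketch skips that step.

```python
def partition_path(table_dir: str, spec: dict) -> str:
    """Build the directory path for one partition.

    Pass `spec` in the order the columns appear in PARTITIONED BY; Hive nests
    one key=value subdirectory per partition column in that order. Real Hive
    also escapes special characters in values, which this sketch does not.
    """
    parts = [f"{key}={value}" for key, value in spec.items()]
    return "/".join([table_dir.rstrip("/"), *parts])


# Hypothetical table `sales` partitioned by (country STRING, dt STRING).
print(partition_path("/user/hive/warehouse/sales", {"country": "US", "dt": "2024-01-01"}))
# prints /user/hive/warehouse/sales/country=US/dt=2024-01-01
```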

The main advantage of partitioning is faster data retrieval. When a query filters on a partition column, Hive reads only the matching partition directories (partition pruning) and skips the rest of the table. Splitting the data into partitions also distributes the execution load horizontally.
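Partition pruning can be sketched with an in-memory table. The PARTITIONS data and the scan function below are hypothetical. Both runs answer the same query, SELECT id FROM sales WHERE country = 'US' AND amount >= 5.0. The difference is that the pruned run opens only the US partition, while the unpruned run reads every row and checks country one row at a time.

```python
# Hypothetical table partitioned by country: partition value -> rows in that directory.
PARTITIONS = {
    "US": [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.0}],
    "IN": [{"id": 3, "amount": 7.5}],
    "DE": [{"id": 4, "amount": 3.0}, {"id": 5, "amount": 1.0}, {"id": 6, "amount": 2.0}],
}


def scan(partitions, country, min_amount, prune=True):
    """Answer SELECT id FROM sales WHERE country = ? AND amount >= ?.

    Returns (matching ids, rows read). With pruning, only the partition named
    in the filter is opened; without it, every partition is read and the
    country condition is checked row by row.
    """
    keys = [country] if prune else list(partitions)
    rows_read, ids = 0, []
    for key in keys:
        for row in partitions.get(key, []):
            rows_read += 1
            if key == country and row["amount"] >= min_amount:
                ids.append(row["id"])
    return ids, rows_read


print(scan(PARTITIONS, "US", 5.0))               # ([1, 2], 2)
print(scan(PARTITIONS, "US", 5.0, prune=False))  # ([1, 2], 6)
```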

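However, partitioning has a drawback: it is easy to create too many tiny partitions. Every partition is a separate directory, with its own files, that HDFS and the Hive metastore must track. Partitioning on a column with many distinct values can therefore produce thousands of small directories and slow the system down.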
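The sketch below shows how quickly the directory count grows. It uses 1,000 hypothetical rows spread over three countries, with one row per user. Partitioning by country alone gives three large partitions. Adding the high-cardinality user_id column gives one directory per row.

```python
def count_partitions(rows, partition_columns):
    """Number of distinct partition directories the rows would be written to."""
    return len({tuple(row[col] for col in partition_columns) for row in rows})


# Hypothetical data: 1,000 rows spread over 3 countries, one row per user.
rows = [{"country": ["US", "IN", "DE"][i % 3], "user_id": i} for i in range(1000)]

for cols in (["country"], ["country", "user_id"]):
    n = count_partitions(rows, cols)
    print(f"PARTITIONED BY {cols}: {n} directories, ~{len(rows) / n:.0f} rows each")
```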

Partitioning a table is simple in Apache Hive, and so is managing partitions: creating, dropping, and renaming them.

Hive supports two types of partitioning: static (manual) and dynamic.

==============
Static Partition:
==============
>>> What:
In static partitioning, you load each input data file into a partition that you name explicitly in the statement, for example PARTITION (country='US'). In other words, you decide which partition the data belongs to, and Hive places the file there.
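A rough Python model of a static load is shown below. It loosely mirrors a statement such as LOAD DATA INPATH '/tmp/sales_us.csv' INTO TABLE sales PARTITION (country='US'), where the path and table are hypothetical. The partition value comes from the statement and not from the data, so every row in the file lands in the named partition.

```python
from collections import defaultdict

# Hypothetical in-memory table: partition spec (tuple of (column, value)) -> rows.
sales = defaultdict(list)


def static_load(table, spec, rows):
    """Put every row into the one partition named in `spec`.

    The partition value comes from the statement rather than from the rows,
    so the rows do not carry the partition column and nothing checks them
    against it.
    """
    key = tuple(spec.items())
    table[key].extend(rows)  # the dict entry is created on first use
    return key


# One hypothetical input file per country, each loaded into a partition we name.
static_load(sales, {"country": "US"}, [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.0}])
static_load(sales, {"country": "IN"}, [{"id": 3, "amount": 7.5}])
print(dict(sales))
```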

>>> Why:
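Static partitioning is commonly used to import large files into Hive tables. Because the partition value is supplied up front, Hive does not have to compute it from the data, so loading is faster than with dynamic partitioning.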

>>> How:
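Static partitioning needs no special configuration, because you give the partition value explicitly. You can optionally set the property hive.mapred.mode to 'strict' in hive-site.xml. In strict mode, Hive rejects queries on a partitioned table that do not filter on a partition column, which guards against accidental full-table scans.
Static partitioning works with either a Hive managed (internal) table or an external table.
Static partitions can be added, altered, renamed, and dropped with ALTER TABLE.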
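The toy catalog below, which is hypothetical and not Hive's metastore API, models the management commands listed above. Each method is commented with the HiveQL statement it stands in for. The check_query method imitates the strict-mode rule that a query must filter on the partition column.

```python
class PartitionCatalog:
    """Toy stand-in for one table's partition list (not Hive's metastore API)."""

    def __init__(self, partition_column, strict=True):
        self.partition_column = partition_column
        self.strict = strict  # models hive.mapred.mode=strict
        self.partitions = {}  # partition value -> rows

    def add(self, value, rows=()):
        # ALTER TABLE t ADD PARTITION (col='value')
        self.partitions.setdefault(value, []).extend(rows)

    def rename(self, old, new):
        # ALTER TABLE t PARTITION (col='old') RENAME TO PARTITION (col='new')
        if new in self.partitions:
            raise ValueError(f"partition {new!r} already exists")
        self.partitions[new] = self.partitions.pop(old)

    def drop(self, value):
        # ALTER TABLE t DROP PARTITION (col='value')
        del self.partitions[value]

    def check_query(self, filtered_columns):
        # Strict mode: a query must filter on the partition column.
        if self.strict and self.partition_column not in filtered_columns:
            raise ValueError(
                f"strict mode: query must filter on partition column {self.partition_column!r}"
            )


catalog = PartitionCatalog("country")
catalog.add("US", [{"id": 1}])
catalog.add("IN")
catalog.rename("US", "USA")
catalog.drop("IN")
print(sorted(catalog.partitions))  # ['USA']
catalog.check_query({"country", "amount"})  # allowed: filters on the partition column
```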

=================
Dynamic Partition:
=================
>>> What:
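In dynamic partitioning, Hive works out the partition values from the data itself. A single INSERT ... SELECT statement reads the rows and writes each one to the partition that matches its partition-column values, creating new partitions as needed. This keeps manual intervention low when loading large amounts of data into a partitioned table. Dynamic partitioning usually loads data from a non-partitioned (staging) table.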
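The routing step can be sketched in Python. The function below loosely mirrors INSERT INTO TABLE sales PARTITION (country) SELECT id, amount, country FROM sales_staging, where the table names are hypothetical. Each row goes to the partition named by its own country value. The partition column is dropped from the stored row because Hive keeps that value in the directory name.

```python
from collections import defaultdict


def dynamic_insert(source_rows, partition_columns):
    """Route each source row to the partition given by its own values.

    Partition values come from the data, and each new value creates a new
    partition.
    """
    table = defaultdict(list)
    for row in source_rows:
        key = tuple((col, row[col]) for col in partition_columns)
        # Partition values live in the directory name, not in the data files.
        data = {k: v for k, v in row.items() if k not in partition_columns}
        table[key].append(data)
    return dict(table)


# Hypothetical non-partitioned staging table.
staging = [
    {"id": 1, "amount": 10.0, "country": "US"},
    {"id": 2, "amount": 7.5, "country": "IN"},
    {"id": 3, "amount": 5.0, "country": "US"},
]
for key, part_rows in dynamic_insert(staging, ["country"]).items():
    print(key, part_rows)
```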

>>> Why:
Dynamic partitions take longer to load than static partitions, because Hive has to read the data to determine each row's partition. Even so, they are the ideal solution when a large amount of data is already stored in a table. They are also the right option when you want to partition by a column but don't know in advance how many distinct values, and therefore partitions, it will produce.

>>> How:
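Dynamic partitioning must be enabled with hive.exec.dynamic.partition=true. In addition, hive.exec.dynamic.partition.mode must be set to 'nonstrict' when every partition column is dynamic; in 'strict' mode, at least one partition column needs a static value. As with static partitions, dynamic partitioning works with both Hive managed (internal) and external tables. Once created, dynamically created partitions are ordinary partitions and can be managed with the same ALTER TABLE commands.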
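The checks these settings impose can be sketched as a validator for a PARTITION (...) clause. Everything below is a hypothetical model, not Hive code. A None value marks a dynamic column. The dynamic_enabled and mode arguments stand in for hive.exec.dynamic.partition and hive.exec.dynamic.partition.mode. The last check reflects Hive's rule that static partition columns must come before dynamic ones.

```python
def validate_partition_spec(spec, dynamic_enabled, mode):
    """Check a PARTITION (...) clause against the dynamic-partition settings.

    spec maps each partition column, in PARTITIONED BY order, to a static
    value or to None when the column is dynamic (taken from the SELECT).
    dynamic_enabled models hive.exec.dynamic.partition and mode models
    hive.exec.dynamic.partition.mode ("strict" or "nonstrict").
    """
    dynamic = [col for col, value in spec.items() if value is None]
    if not dynamic:
        return  # purely static insert: these settings do not apply
    if not dynamic_enabled:
        raise ValueError("dynamic partitioning is disabled")
    if mode == "strict" and len(dynamic) == len(spec):
        raise ValueError("strict mode: at least one partition column must be static")
    # Static columns must come before dynamic ones, e.g. (country='US', state).
    seen_dynamic = False
    for col, value in spec.items():
        if value is None:
            seen_dynamic = True
        elif seen_dynamic:
            raise ValueError(f"dynamic partition cannot be the parent of static column {col!r}")


validate_partition_spec({"country": "US", "state": None}, True, "strict")       # accepted
validate_partition_spec({"country": None, "state": None}, True, "nonstrict")   # accepted
```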

Hope you find this article helpful.
