Partitioned, Bucketed and Skewed Tables in Hive

When working with large amounts of data on the Hadoop file system (HDFS), both partitioning and bucketing in Hive are used to avoid full table scans and boost efficiency. Tables are divided into smaller, more manageable pieces by the defined partitions and/or buckets, which can improve query performance.

The key difference between partitioning and bucketing is the way the data is segregated. Let’s take a look at what partitioned, bucketed, and skewed tables are and how they perform.

Partitioned Tables:

Partitioning is a way of structuring tables by splitting them into smaller sections based on partition keys, which are crucial in determining how the data is stored. Tables that are split this way are called partitioned tables.

Performance:

Hive partitioning is a good way to speed up queries on big tables. Partitioning lets you store data in separate subdirectories under the table location. It significantly helps queries that filter on the partition key(s).
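To make the subdirectory idea concrete, here is a minimal Python sketch that imitates the layout rather than calling Hive itself. The table location `/warehouse/sales`, the `country` partition key, and the helper names are all made up for illustration; only the `key=value` directory naming follows Hive's convention.

```python
from collections import defaultdict

def partition_path(table_location, partition_spec):
    """Build a Hive-style partition directory, e.g. <location>/country=US."""
    parts = [f"{key}={value}" for key, value in partition_spec]
    return "/".join([table_location.rstrip("/")] + parts)

def write_rows(table_location, rows, partition_keys):
    """Group rows by their partition-key values.

    The partition columns end up in the directory name, not in the row data.
    """
    layout = defaultdict(list)
    for row in rows:
        spec = [(k, row[k]) for k in partition_keys]
        data = {k: v for k, v in row.items() if k not in partition_keys}
        layout[partition_path(table_location, spec)].append(data)
    return dict(layout)

def prune(layout, key, value):
    """Keep only the directories that match key=value; the rest are never read."""
    wanted = f"{key}={value}"
    return {path: data for path, data in layout.items()
            if wanted in path.split("/")}

# Hypothetical sample data, partitioned by country.
rows = [
    {"id": 1, "amount": 10.0, "country": "US"},
    {"id": 2, "amount": 7.5, "country": "IN"},
    {"id": 3, "amount": 3.2, "country": "US"},
]
layout = write_rows("/warehouse/sales", rows, ["country"])
us_only = prune(layout, "country", "US")

print(sorted(layout))
print(us_only)
```

A query that filters on `country` only has to look at one directory, which is roughly what happens when Hive prunes partitions for a filter on the partition key.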

>>>>>>>>> Read more about partitioning here.

Skewed Tables:
A skewed table is a type of table in which the values that appear very frequently (heavy skew) are split out into separate files, while the rest of the values go to other files.

Performance:
When you specify the skewed values, Hive automatically splits them into separate files and takes this into account during queries, so it can skip (or include) entire files where possible, improving efficiency.
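The splitting and skipping described above can be sketched in a few lines of Python. This is only a simulation under simple assumptions: the file names (`customer_id=1`, `other`), the `customer_id` column, and the helper functions are hypothetical and do not mirror Hive's real on-disk names.

```python
def split_skewed(rows, column, skewed_values):
    """Send each declared skewed value to its own file, everything else to 'other'."""
    files = {f"{column}={v}": [] for v in skewed_values}
    files["other"] = []
    for row in rows:
        value = row[column]
        target = f"{column}={value}" if value in skewed_values else "other"
        files[target].append(row)
    return files

def files_to_read(column, value, skewed_values):
    """Decide which file an equality filter on `column` has to scan."""
    if value in skewed_values:
        return [f"{column}={value}"]   # only the dedicated file
    return ["other"]                   # the skewed files can be skipped

# Hypothetical data: customer 1 appears far more often than anyone else.
rows = [{"customer_id": c} for c in [1, 1, 1, 1, 2, 3, 1, 4]]
skewed = {1}
files = split_skewed(rows, "customer_id", skewed)

print({name: len(content) for name, content in files.items()})
print(files_to_read("customer_id", 1, skewed))
print(files_to_read("customer_id", 3, skewed))
```

A filter on a rare value never touches the large file for the heavy value, and a filter on the heavy value reads only that one file.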

>>>>>>>>> Read more about skewed tables here.

Bucketed Tables:
Hive bucketing (also called clustering) is a mechanism for splitting data into more manageable files by choosing the number of buckets to create. The value of the bucketing column is hashed into one of a user-defined number of buckets.

Performance:
When you use bucketing, you fix the number of buckets in which your data is stored. This value is set in the table creation script. Map-side joins will be faster, provided both tables are bucketed on the join key, because of the roughly equal quantities of data in each bucket.

>>>>>>>>> Read more about bucketing here.

Hope you find this article helpful.

Please subscribe for more interesting updates.
