Partitioning in Hive divides huge tables into smaller logical parts based on column values; one partition is created for each distinct value. By defining
Tag: Bucketing in Hive
Apache Hive Data Model
Apache Hive is a distributed, fault-tolerant, open source data warehouse system built on top of Apache Hadoop for reading, writing, and handling
Partitioned, Bucketed and Skewed Tables in Hive
When working with a large amount of data on a Hadoop file system, both partitioning and bucketing in Hive are used to avoid full table scans
When to avoid bucketing in Hive
When working with large datasets that need to be divided into chunks for better management and joined efficiently with other large datasets,
Bucketing in Apache Hive Part-2
Please see my previous post on bucketing and bucketed tables for more information. Bucketed Sorted Tables will be explored in this post. As discussed in
Bucketing In Apache Hive
Partitioning and Bucketing in Apache Hive can greatly assist in breaking tabular data collections into more manageable portions. Hive Partitioning is a method of separating