Partitioning in Hive divides large tables into smaller logical tables based on column values; one logical table is created for each distinct value. By defining
Tag: Bucket Sampling in Apache Hive
When to avoid bucketing in Hive
When working with large datasets that need to be divided into chunks for easier management and so that queries can join them with other large datasets,
Bucketing in Apache Hive Part-2
Please see my previous post on bucketing and bucketed tables for more information. This post explores bucketed sorted tables. As discussed in
Data Sampling Techniques – Apache Hive
Data sampling is a best practice for understanding the patterns and trends in large datasets by looking at a smaller portion of the data