Bucket Sampling in Hive – Big Data & SQL

Partitioning vs Bucketing in Hive

20th Jan 2022 SHAFI SHAIK

Partitioning in Hive divides huge tables into smaller logical tables based on column values; one logical table is created for each distinct value. By defining

Partitioned, Bucketed and Skewed Tables in Hive

14th Jan 2022 SHAFI SHAIK

When working with a large amount of data on a Hadoop file system, both partitioning and bucketing in Hive are used to avoid table scans

When to avoid bucketing in Hive

13th Jan 2022 SHAFI SHAIK

When working with large datasets that need to be divided into chunks for better management and for the ability to join them in queries with other large datasets,

Bucketing in Apache Hive Part-2

12th Nov 2021 SHAFI SHAIK

Please see my previous post on bucketing and bucketed tables for more information. Bucketed Sorted Tables will be explored in this post. As discussed in

Data Sampling Techniques – Apache Hive

13th Dec 2020 SHAFI SHAIK

Data sampling is a best practice for understanding the patterns and trends in large datasets by looking at a smaller portion of the data