SQL Server Partitions vs Hive Partitions

Partitioning is a way of separating tables into smaller chunks based on partition keys. Partitions, in other terms, are horizontal data slices that allow large quantities of data to be split into more manageable parts. These keys are important in determining how data is stored in the table. Partitioning is crucial in Apache Hive since the datasets are often large.

Partitioning and how to work with partitions in Apache Hive were covered in earlier blogs.

The concept of “partitioning” is not new. It is supported by the majority of RDBMSs, including Oracle SQL*Plus, Microsoft SQL Server, and others. This post, however, will discuss the distinctions between SQL Server partitioning and Apache Hive partitioning. Though there is no fundamental difference between partitioning in RDBMSs and partitioning in Hive, the distinction is in how it is implemented.

Dependency:
Partitioning in SQL Server is determined by partition schemas, functions, files, and filegroups. However, partitioning is straightforward and easy with Hive; how the data is partitioned must be specified when the table is created.

Data Import:
In Static Partition, data can be loaded from local file system and HDFS. However in Dynamic Partition, Inserting data into a dynamic partitioned table is a two-step process. To get the data into partitions, we must first dump it into a staging or temporary table. Only INSERT statements will work, not the “LOAD” command.


Backend Implementation:
Table partitioning in SQL Server is the logical separation of information with physical distribution in filegroups. By applying the partition schema over the table schema, a table can be partitioned. However, in Hive, the PARTITIONED BY keyword specifies how the data is physically distributed into chunks.

Partition Columns

In SQL Server, it is not feasible to partition data by two columns, however it is possible in Hive. Check the syntax below.

CREATE TABLE StudentInfo(
      RegID                 INT,
      StudentName  STRING,
      Address            STRING
)
PARTITIONED BY (
       City      STRING,
       State    STRING);

I hope you found this post useful.

If you discover any other discrepancies, please add them to this article.

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s