As we discussed in the earlier posts, skewed tables are those in which some column values occur much more frequently than others, so the distribution of values is skewed. Hive can separate the skewed values into their own files and take this into account during queries, skipping or including whole files where possible, which improves performance.
If you specify the skewed column and its values when creating the table, together with the STORED AS DIRECTORIES option, Hive separates the data for those values into different directories so that queries on them run faster.
In this post we will see how to alter skewed tables.
Update the skewed values:
A table’s SKEWED and STORED AS DIRECTORIES options can be changed with ALTER TABLE statements.
ALTER TABLE table_name SKEWED BY (col_name1, col_name2, ...)
ON ((col_name1_value, col_name2_value, ...) [, (col_name1_value, col_name2_value, ...), ...])
[STORED AS DIRECTORIES];
The STORED AS DIRECTORIES option determines whether a skewed table uses the list bucketing feature, which creates subdirectories for skewed values.
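As a minimal sketch of the statement above, assume a hypothetical table page_views that is skewed on a single column, country. The table name, column names, and values are made up for illustration only.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE page_views (user_id STRING, country STRING, url STRING)
  SKEWED BY (country) ON ('US')
  STORED AS DIRECTORIES;

-- Redefine the skewed values: treat both 'US' and 'IN' as skewed
-- and keep list bucketing (one subdirectory per skewed value).
ALTER TABLE page_views
  SKEWED BY (country) ON ('US', 'IN')
  STORED AS DIRECTORIES;

-- Inspect the table metadata to check the skewed columns and values.
DESCRIBE FORMATTED page_views;
```

With more than one skewed column, each skewed value becomes a tuple, for example SKEWED BY (country, url) ON (('US', '/home'), ('IN', '/home')).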
Alter Table Not Skewed
ALTER TABLE table_name NOT SKEWED;
The NOT SKEWED option makes the table non-skewed and turns off the list bucketing feature (since a list-bucketing table is always skewed). This affects partitions created after the ALTER statement, but has no effect on partitions created before the ALTER statement.
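For example, assuming the same hypothetical skewed table page_views (the name is an illustration, not from a real schema), the skew definition could be dropped like this:

```sql
-- Assumes a hypothetical table page_views that was created or altered
-- with SKEWED BY (country) ON (...).
ALTER TABLE page_views NOT SKEWED;

-- The skew information should no longer appear in the table metadata.
DESCRIBE FORMATTED page_views;
```

Since existing partitions are not affected, data that was already written into list-bucketing subdirectories stays where it is.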
Alter Table Not Stored as Directories
ALTER TABLE table_name NOT STORED AS DIRECTORIES;
This turns off the list bucketing feature, although the table remains skewed.
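A short sketch, again assuming a hypothetical skewed table page_views that was defined with STORED AS DIRECTORIES:

```sql
-- Assumes a hypothetical table page_views defined with
-- SKEWED BY (country) ON (...) STORED AS DIRECTORIES.
-- Stop creating one subdirectory per skewed value; the skew
-- definition itself (columns and values) is kept.
ALTER TABLE page_views NOT STORED AS DIRECTORIES;
```

This is the option to use when you want to keep the skew metadata but stop using list bucketing; use NOT SKEWED if you want to remove both.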
Alter Table Set Skewed Location
ALTER TABLE table_name
SET SKEWED LOCATION (col_name1="location1" [, col_name2="location2", ...] );
This changes the location map for list bucketing.
I hope you found this post to be informative.