Check if table is skewed – Apache Hive

As stated in the earlier article, Skewed tables are those in which some column values occur more frequently than others. As a result, the distribution is skewed. Hive will automatically separate skewed values into different files and take this into consideration during searches so that it can skip or include whole files if possible; thus enhances the performance.

In this post, we’ll look at how to determine if a table is skewed.

There are two ways to accomplish this. One method is to use the describe command in Apache Hive, and the other is to retrieve metadata information from the underlying MySQL database.

In Hive:
hive> DESC FORMATTED SkewedTable;
The above command will return the table properties in a formatted fashion in which you can see the following properties.

Bucket Columns: []
Sort Columns: []
Skewed Columns: [column1]
Skewed Values: [[value1]]

If there is no value associated with “skewed Columns”and “skewed Values” that means the table is not skewed.

in MySQL:
mysql> SELECT SC.SKEWED_COL_NAME,
TN.TBL_NAME
FROM SKEWED_COL_NAMES SC, TBLS TN
WHERE SC.SD_ID=TN.SD_ID;
+——————————-+——————+
| SKEWED_COL_NAME | TBL_NAME |
+——————————-+——————+
| column1                      | value1         |
+——————————-+——————+
1 row in set (0.00 sec)

For more information about Skewed Tables – Click here
Skewed Tables Examples – Click here
Altering Skewed Tables – Click here
Skewed Values on Several Columns – Click here

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s