Analyzing Tables in Apache Hive

In a previous post, we showed you how to generate and update statistics in Apache Impala. You can do the same thing in Apache Hive.

The optimizer can use statistics gathered for whole tables and for individual partitions. Statistics also help you understand a table's data distribution and layout. They record physical characteristics such as the number of rows, the number of data files, the total size of the data files, and the file type. This article demonstrates how to generate and update table statistics.

The following command generates statistics for a table. The database name prefix is optional.
ANALYZE TABLE [db_name.]tablename [PARTITION(partcol1[=val1], partcol2[=val2], …)]
COMPUTE STATISTICS [FOR COLUMNS]
[NOSCAN];
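
As a rough sketch of how the optional clauses combine, the statements below assume a hypothetical table named sales that is partitioned by a string column dt. Neither name comes from this article. NOSCAN skips reading the data files, so it gathers only the number of files and their physical size.

```sql
-- Hypothetical table: sales, partitioned by dt (both names are assumptions for illustration).

-- Gather basic statistics for a single partition.
ANALYZE TABLE sales PARTITION(dt='2024-01-01') COMPUTE STATISTICS;

-- Gather basic statistics for every partition by leaving the partition value unspecified.
ANALYZE TABLE sales PARTITION(dt) COMPUTE STATISTICS;

-- Quick pass: file count and physical size only, without scanning the data files.
ANALYZE TABLE sales PARTITION(dt) COMPUTE STATISTICS NOSCAN;

-- Gather column-level statistics for one partition.
ANALYZE TABLE sales PARTITION(dt='2024-01-01') COMPUTE STATISTICS FOR COLUMNS;
```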

Examples:
ANALYZE TABLE emp COMPUTE STATISTICS;
The command above does not gather column statistics, so you must collect them explicitly by adding the FOR COLUMNS clause:
ANALYZE TABLE emp COMPUTE STATISTICS FOR COLUMNS;
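
To check what was collected, one option is DESCRIBE FORMATTED. The sketch below reuses the emp table from the examples above. The column name salary is a hypothetical placeholder, so substitute a column that exists in your table. The table-level output typically lists parameters such as numFiles, numRows, rawDataSize, and totalSize, though the exact layout can vary between Hive versions.

```sql
-- Show table-level statistics (look for numFiles, numRows, rawDataSize, totalSize
-- in the table parameters section of the output).
DESCRIBE FORMATTED emp;

-- Show column-level statistics for a single column.
-- "salary" is an assumed column name used only for illustration.
DESCRIBE FORMATTED emp salary;
```

If the row count still looks stale after loading new data, rerunning the ANALYZE TABLE commands shown earlier refreshes the stored values.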

Hope you find this article helpful.