The Impala query planner can use statistics about whole tables and partitions. This information includes physical characteristics such as the number of rows, the number of data files, the total size of the data files, and the file format. This post shows how to retrieve table statistics and how to update them.
Consider the following examples.
SHOW TABLE STATS empavro2;
Output:
#Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
-1 | 1 | 1.30KB | Not Cached | Not Cached | AVRO | False | hdfs://quickstart.cloudera:8020/user/hive/warehouse/empavro2 |
SHOW TABLE STATS Employee_Parquet2;
Output:
#Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
-1 | 1 | 311B | Not Cached | Not Cached | PARQUET | False | hdfs://quickstart.cloudera:8020/user/hive/warehouse/employee_parquet2 |
SHOW TABLE STATS townslist;
Output:
#Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
-1 | 2 | 105.22KB | Not Cached | Not Cached | TEXT | False | hdfs://quickstart.cloudera:8020/user/hive/warehouse/townslist |
The examples above show the table statistics for unpartitioned Avro, Parquet, and text tables. The number of files and their total size are always available. The number of rows is initially shown as -1, because determining it requires a potentially costly scan of the full table. The COMPUTE STATS command populates any missing table statistics values.
Let’s update the statistics:
COMPUTE STATS empavro2;
COMPUTE STATS Employee_Parquet2;
COMPUTE STATS townslist;
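To confirm that the statistics were updated, the same SHOW TABLE STATS statements can be run again. The sketch below assumes the three COMPUTE STATS statements above completed successfully. In that case the #Rows column should report the actual row count instead of -1. No output is shown here because it depends on the data in each table. COMPUTE STATS also gathers column-level statistics, which can be inspected with SHOW COLUMN STATS.

```sql
-- Re-check the table-level statistics after COMPUTE STATS;
-- #Rows should no longer be -1 for these tables.
SHOW TABLE STATS empavro2;
SHOW TABLE STATS Employee_Parquet2;
SHOW TABLE STATS townslist;

-- Inspect the column-level statistics that COMPUTE STATS also gathers.
SHOW COLUMN STATS townslist;
```

If the data in a table changes substantially later on, running COMPUTE STATS again refreshes these values.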
Hope you find this article useful.