As per the documentation, the INVALIDATE METADATA command marks one or all tables’ metadata as stale. The next time the Impala service runs a query against a table with invalid metadata, Impala reloads the related metadata before continuing with the query.
Since this is a more expensive operation than the incremental metadata update provided by the REFRESH statement, use REFRESH instead of INVALIDATE METADATA wherever possible.
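As a rough illustration of the difference, the statements below contrast the two commands for a single table. The database and table name (`sales_db.orders`) are hypothetical placeholders chosen for this sketch.

```sql
-- Hypothetical table name, used only for illustration.

-- Incremental reload of the metadata for one existing table:
REFRESH sales_db.orders;

-- Marks the table's metadata as stale; Impala reloads it
-- the next time the table is queried:
INVALIDATE METADATA sales_db.orders;
```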
INVALIDATE METADATA is required when the following changes are made outside of Impala, in Hive or other Hive clients such as SparkSQL:
- The metadata of existing tables changes.
- New tables are added, and Impala will use them.
- Ranger privileges at the SERVER or DATABASE level are changed.
- Block metadata changes, but the files remain the same (for example, after an HDFS rebalance).
- UDF jars change.
- Some tables are no longer queried, and you want to remove their metadata from the catalog and coordinator caches to save memory.
When updates are made via impalad, no INVALIDATE METADATA is required.
It is always better to specify the name of the table whose metadata needs to be reloaded. If no table is specified, the cached metadata for all tables is flushed and synced with the Hive Metastore (HMS). Tables that have been dropped from the HMS are removed from the catalog, and tables newly added to the HMS appear in the catalog.
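A minimal sketch of the two forms follows. The table name is a hypothetical placeholder.

```sql
-- Preferred: limit the invalidation to one table (hypothetical name).
INVALIDATE METADATA sales_db.orders;

-- No table specified: flushes the cached metadata for all tables
-- and syncs with the Hive Metastore. This can be expensive on a
-- catalog with many tables.
INVALIDATE METADATA;
```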
I hope you find this article helpful.