Automatic metadata management
Impala's automatic metadata management feature uses Hive metastore notifications to refresh metadata after changes.
By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.
When tools such as Hive and Spark are used to process raw data
ingested into Hive tables, new Hive metastore metadata (in the form of
databases, tables, and partitions) and file system metadata (in the form of
new files in existing partitions and tables) are generated. In previous
versions of Impala, to pick up this new information, Impala users had to
manually issue an INVALIDATE METADATA or REFRESH command. Automatic metadata
management is a new feature that removes this manual step.
Automatic metadata management works by using Hive metastore notifications. In response to these notifications, it performs the following actions:
- Invalidates tables when it receives an ALTER TABLE event.
- Refreshes the partition when it receives an ALTER TABLE ADD | DROP PARTITION event.
- Adds the tables or databases when it receives a CREATE TABLE or CREATE DATABASE event.
- Removes tables from the Catalog (catalogd) when it receives a DROP TABLE or DROP DATABASE event.
- Refreshes the table and partitions when it receives INSERT events.
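As an illustration of the event handling described above, here is a minimal sketch. It assumes a hypothetical database sales_db and a hypothetical partitioned table events, and it assumes Hive metastore event polling is enabled on catalogd. The first group of statements runs in Hive; the last two run in Impala after the polling interval has elapsed.

```sql
-- Run in Hive. All database, table, and column names are hypothetical.
-- Each statement produces the metastore event named in its comment.

-- CREATE DATABASE event
CREATE DATABASE IF NOT EXISTS sales_db;

-- CREATE TABLE event
CREATE TABLE sales_db.events (id BIGINT, action STRING)
PARTITIONED BY (ds STRING)
STORED AS PARQUET;

-- ALTER TABLE ADD PARTITION event
ALTER TABLE sales_db.events ADD PARTITION (ds='2024-01-01');

-- INSERT event for that partition
INSERT INTO TABLE sales_db.events PARTITION (ds='2024-01-01')
VALUES (1, 'signup');

-- Run in Impala once the polling interval has passed.
-- With automatic metadata management, no manual REFRESH or
-- INVALIDATE METADATA should be needed before these queries.
SHOW PARTITIONS sales_db.events;
SELECT COUNT(*) FROM sales_db.events WHERE ds='2024-01-01';
```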
When there are database-level changes, the following types of changes are supported:
- Database properties
- Comments on the database
- Owner of the database
- Default location of the database

To control this feature, use the --hms_event_polling_interval_s flag on catalogd.
Setting the flag to a positive value enables Hive metastore event polling.
The recommended value is under 5 seconds.
The automatic metadata management feature does not support files or partitions that have been manually added to HDFS by using Spark, DistCp, or HDFS Put because no Hive metastore notification is generated for these operations. To handle these scenarios, use one of the following methods:
- Use the LOAD DATA command to handle the metadata changes, or
- Run a REFRESH [db_name.]table_name [PARTITION… command or an ALTER TABLE table_name RECOVER PARTITIONS command.
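The two methods above might look like the following in Impala. This is a sketch only: the table sales_db.events, its partition column ds, and the HDFS staging path are hypothetical, and the partition named in each statement is assumed to exist already.

```sql
-- Run in Impala. Table name, partition values, and paths are hypothetical.

-- Method 1: move files that are already in HDFS into the table with
-- LOAD DATA, so the metadata change is handled as part of the statement.
LOAD DATA INPATH '/staging/events/2024-01-01'
INTO TABLE sales_db.events PARTITION (ds='2024-01-01');

-- Method 2a: files were copied directly into an existing partition
-- directory (for example with DistCp), so refresh just that partition.
REFRESH sales_db.events PARTITION (ds='2024-01-01');

-- Method 2b: new partition directories were created under the table
-- location outside of Hive, so ask Impala to discover them.
ALTER TABLE sales_db.events RECOVER PARTITIONS;
```

Refreshing a single partition is usually the cheaper choice when you know exactly where the new files landed; RECOVER PARTITIONS is the fallback when whole partition directories appeared without a matching metastore entry.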
If you need to disable automatic metadata management for certain tables or databases, set the following properties:
CREATE DATABASE <db_name> DBPROPERTIES ('impala.disableHmsSync'='true');
CREATE TABLE <tab_name> WITH TBLPROPERTIES ('impala.disableHmsSync'='true' | 'false');
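For a concrete instance, the sketch below sets the property on a hypothetical database and a hypothetical table. It is written in standard Hive DDL, where the database clause is spelled WITH DBPROPERTIES and the table clause is a bare TBLPROPERTIES, and it assumes the statements are run in Hive. All names and column definitions are illustrative.

```sql
-- Hypothetical names throughout; assumed to be run in Hive.

-- Disable automatic metadata management for every table in a database.
CREATE DATABASE scratch_db
WITH DBPROPERTIES ('impala.disableHmsSync'='true');

-- Disable it for a single table in a database that is otherwise synced.
CREATE DATABASE IF NOT EXISTS sales_db;
CREATE TABLE sales_db.tmp_import (id BIGINT, payload STRING)
TBLPROPERTIES ('impala.disableHmsSync'='true');
```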