On-demand Metadata

With the on-demand metadata feature, the Impala coordinators pull metadata as needed from catalogd and cache it locally. The cached metadata gets evicted automatically under memory pressure.

The granularity of on-demand metadata fetches is at the partition level between the coordinator and catalogd. Common use cases like add/drop partitions do not trigger unnecessary serialization/deserialization of large metadata.

The feature can be used in either of the following modes.
Metadata on-demand mode
In this mode, all coordinators use the metadata on-demand.
Set the following on catalogd:
Set the following on all impalad coordinators:
Mixed mode
In this mode, only some coordinators are enabled to use the metadata on-demand.
We recommend that you use the mixed mode only for testing local catalog’s impact on heap usage.
Set the following on catalogd:
Set the following on impalad coordinators with metdadata on-demand:

HDFS caching is not supported in On-demand metadata mode coordinators.


To return accurate query results, Impala needs to keep the metadata current for the databases and tables queried. Through "automatic invalidation" or "HMS event polling" support, Impala automatically picks up most changes in metadata from the underlying systems. However there are some scenarios where you might need to run INVALIDATE METADATA or REFRESH.
  • if some other entity modifies information used by Impala in the metastore, the information cached by Impala must be updated via INVALIDATE METADATA or REFRESH,
  • if you have "local catalog" enabled without "HMS event polling" and need to pick up metadata changes that were done outside of Impala in Hive and other Hive client, such as SparkSQL,
  • and so on.