On-demand Metadata
With the on-demand metadata feature, the Impala coordinators pull
metadata as needed from catalogd and cache it locally. The
cached metadata is evicted automatically under memory pressure.
Metadata is fetched between the coordinator and catalogd at partition
granularity. Common operations such as adding or dropping partitions therefore do
not trigger unnecessary serialization and deserialization of large metadata.
The feature can be used in either of the following modes.
- Metadata on-demand mode: all coordinators use on-demand metadata.
- Mixed mode: only some coordinators are enabled to use on-demand metadata.
- Flags related to use_local_catalog: when use_local_catalog is enabled (set to True) on the impalad coordinators, the following flags configure various parameters as described below. Changing the default values of these flags is not recommended.
HDFS caching is not supported on coordinators running in on-demand
metadata mode.
INVALIDATE METADATA
Usage Notes:
To return accurate query results, Impala needs to keep the metadata current for the
databases and tables queried. Through "automatic invalidation" or "HMS event polling" support,
Impala automatically picks up most changes in metadata from the underlying systems. However,
there are some scenarios where you might need to run INVALIDATE METADATA or REFRESH:
- If some other entity modifies information used by Impala in the metastore, the information cached by Impala must be updated via INVALIDATE METADATA or REFRESH.
- If you have "local catalog" enabled without "HMS event polling" and need to pick up metadata changes that were made outside of Impala, in Hive or another Hive client such as SparkSQL.
- And so on.
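
The two statements named above can be sketched as follows. This is only an illustration: the database sales_db, the table orders, and the partition column order_date are hypothetical names, and the scenario assumes the table was changed outside of Impala.

```sql
-- Hypothetical database, table, and partition names, for illustration only.

-- Reload file and block metadata for one table after data files
-- were added to it outside of Impala.
REFRESH sales_db.orders;

-- Limit the reload to a single partition when only that partition changed.
REFRESH sales_db.orders PARTITION (order_date='2024-01-01');

-- Discard the cached metadata for one table, for example after its
-- definition was changed through Hive; it is reloaded on next use.
INVALIDATE METADATA sales_db.orders;

-- Discard the cached metadata for all tables. This is the most
-- expensive form and is rarely the right first choice.
INVALIDATE METADATA;
```

As a rule of thumb, REFRESH is the lighter option when only the data files of a table already known to Impala have changed, while INVALIDATE METADATA is the broader option for changes such as tables created or altered through another client.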