Metadata Improvements

In CDP, all catalog metadata improvements are enabled by default. You may use these few knobs to control how Impala manages its metadata to improve performance and scalability.


In CDP, the on-demand use_local_catalog mode is set to True by default on all the Impala coordinators so that the Impala coordinators pull metadata as needed from catalogd and cache it locally. This results in many performance and scalability improvements, such as reduced memory footprint on coordinators and automatic cache eviction.


The granularity of on-demand metadata fetches is at the partition level between the coordinator and catalogd. Common use cases like add/drop partitions do not trigger unnecessary serialization/deserialization of large metadata.

The feature can be used in either of the following modes.
Metadata on-demand mode
In this mode, all coordinators use the metadata on-demand.
Set the following on catalogd:
Set the following on all impalad coordinators:
Mixed mode
In this mode, only some coordinators are enabled to use the metadata on-demand.
Cloudera recommends that you use the mixed mode only for testing local catalog’s impact on heap usage.
Set the following on catalogd:
Set the following on impalad coordinators with metdadata on-demand:

HDFS caching is not supported in On-demand metadata mode coordinators.


See Impala Metadata Management for the details about catalog improvements.