Metadata Improvements
In CDP, all catalog metadata improvements are enabled by default. You can use the following settings to control how Impala manages its metadata for better performance and scalability.
use_local_catalog
In CDP, the on-demand use_local_catalog mode is set to True by default on all Impala
coordinators, so each coordinator pulls metadata from catalogd as needed and caches it
locally. This brings several performance and scalability improvements, such as a reduced
memory footprint on coordinators and automatic cache eviction.
catalog_topic_mode
The granularity of on-demand metadata fetches between the coordinator and catalogd is at
the partition level. Common operations such as adding or dropping partitions therefore do
not trigger unnecessary serialization and deserialization of large metadata.
- Metadata on-demand mode: all coordinators use on-demand metadata.
- Mixed mode: only some coordinators are enabled to use on-demand metadata.
HDFS caching is not supported on coordinators that run in on-demand metadata mode.
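As a rough sketch, the two modes might map onto startup flags as shown below. The use_local_catalog flag is named on this page; the catalog_topic_mode values minimal and mixed come from the upstream Apache Impala documentation and are an assumption here, since this page does not list them. In CDP these flags are typically set through Cloudera Manager, so treat the fragment as illustrative rather than something to copy as-is.

```
# Metadata on-demand mode (flag values assumed from upstream Impala docs)
# On catalogd:
--catalog_topic_mode=minimal
# On every impalad coordinator:
--use_local_catalog=true

# Mixed mode (flag values assumed from upstream Impala docs)
# On catalogd:
--catalog_topic_mode=mixed
# Only on the impalad coordinators that should use on-demand metadata:
--use_local_catalog=true
```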
Reference:
See Impala Metadata Management for details about the catalog improvements.