On-demand metadata and metadata management

The on-demand metadata feature streamlines metadata handling on the coordinators.

By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.

Before the on-demand metadata feature was released, every coordinator kept a replica of all the metadata locally. This consumed a large amount of memory on each coordinator with no option to evict it. Metadata was sent by the Statestore to all the coordinators despite how the metadata was eventually used.

The on-demand metadata feature includes the following improvements:

  • Coordinators pull metadata from the Catalog daemon (catalogd) for the required tables when they need it for query planning and cache it locally. The Statestore no longer transmits all the metadata for all the tables to each coordinator.
  • Cached metadata gets evicted automatically under memory pressure treating the cache like an LRU (least recently used) buffer.
  • Granularity of on-demand metadata fetch is at the partition level, so common use cases like add/drop partitions do not trigger unnecessary metadata transfers.

For more information about on-demand metadata management in Impala, see Managing Metadata in Impala.