Configuring Iceberg manifest caching in Impala Virtual Warehouse
Apache Iceberg provides a mechanism to cache the contents of Iceberg manifest files in memory. In Cloudera Data Warehouse (CDW), you can enable or disable Iceberg manifest caching for Impala Coordinators and Catalogd, and set a few other properties, in your Impala Virtual Warehouse.
The following default properties are set in CDW for manifest caching:
iceberg.io.manifest.cache-enabled=true;
iceberg.io.manifest.cache.max-total-bytes=104857600;
iceberg.io.manifest.cache.expiration-interval-ms=3600000;
iceberg.io.manifest.cache.max-content-length=8388608;
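The raw byte and millisecond values above are easier to check once they are converted to familiar units. The following is a minimal Python sketch, not part of CDW; the helper name parse_properties is hypothetical and only handles the simple key=value; format shown above.

```python
# Default values copied from the property list above.
DEFAULTS = """\
iceberg.io.manifest.cache-enabled=true;
iceberg.io.manifest.cache.max-total-bytes=104857600;
iceberg.io.manifest.cache.expiration-interval-ms=3600000;
iceberg.io.manifest.cache.max-content-length=8388608;
"""


def parse_properties(text):
    """Parse 'key=value;' lines into a dict of strings (hypothetical helper)."""
    props = {}
    for line in text.splitlines():
        line = line.strip().rstrip(";")
        if not line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = parse_properties(DEFAULTS)
MIB = 1024 * 1024

total_mib = int(props["iceberg.io.manifest.cache.max-total-bytes"]) / MIB
content_mib = int(props["iceberg.io.manifest.cache.max-content-length"]) / MIB
expiry_min = int(props["iceberg.io.manifest.cache.expiration-interval-ms"]) / 60_000

print(f"max-total-bytes: {total_mib:g} MiB")
print(f"max-content-length: {content_mib:g} MiB")
print(f"expiration-interval-ms: {expiry_min:g} minutes")
```

With the defaults, this corresponds to a 100 MiB cache, an 8 MiB per-manifest limit, and a 60-minute expiration interval.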
The following list describes each property:
- iceberg.io.manifest.cache-enabled: Enables or disables the manifest caching feature.
- iceberg.io.manifest.cache.max-total-bytes: Maximum total number of bytes to cache in the manifest cache. Must be a positive value.
- iceberg.io.manifest.cache.expiration-interval-ms: Maximum duration, in milliseconds, for which an entry stays in the manifest cache. Must be a non-negative value. A value of zero means that cache entries do not expire by time and are removed only when they are evicted due to memory pressure from iceberg.io.manifest.cache.max-total-bytes.
- iceberg.io.manifest.cache.max-content-length: Maximum length, in bytes, of a manifest file to be considered for caching. Manifest files longer than this value are not cached. Must be a positive value that is lower than iceberg.io.manifest.cache.max-total-bytes.
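The constraints described in the list can be checked before a value is changed. The following Python sketch is illustrative only: validate_manifest_cache is a hypothetical helper, not an Impala or Iceberg API, and it encodes just the rules stated above.

```python
def validate_manifest_cache(max_total_bytes, expiration_interval_ms, max_content_length):
    """Return a list of constraint violations; the list is empty if all rules hold."""
    errors = []
    if max_total_bytes <= 0:
        errors.append("max-total-bytes must be positive")
    if expiration_interval_ms < 0:
        errors.append("expiration-interval-ms must be non-negative")
    if max_content_length <= 0:
        errors.append("max-content-length must be positive")
    elif max_content_length >= max_total_bytes:
        errors.append("max-content-length must be lower than max-total-bytes")
    return errors


# The CDW defaults satisfy every rule.
default_errors = validate_manifest_cache(104857600, 3600000, 8388608)

# Zero expiration is allowed, but a per-file limit above the cache size is not.
bad_errors = validate_manifest_cache(104857600, 0, 209715200)

print(default_errors)
print(bad_errors)
```

The second call shows that an expiration interval of zero passes validation, while a max-content-length larger than max-total-bytes is reported as a violation.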
Generally, you set a different value for the expiration interval in catalogd and in the coordinator. The expiration interval is longer in catalogd, for example, 1 week. Catalogd needs to cache manifests for a longer period because the catalogd service serves table metadata.
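Because the expiration interval is expressed in milliseconds, a duration such as 1 week has to be converted before it is set. The following Python sketch shows the arithmetic. Pairing the one-hour default with the coordinator is an assumption made for illustration, and to_ms is a hypothetical helper.

```python
from datetime import timedelta


def to_ms(delta):
    """Convert a timedelta to whole milliseconds."""
    return int(delta.total_seconds() * 1000)


# Assumption: the coordinator keeps the documented default of one hour.
coordinator_ms = to_ms(timedelta(hours=1))
# The "1 week" example for catalogd from the text above.
catalogd_ms = to_ms(timedelta(weeks=1))

print(f"coordinator: iceberg.io.manifest.cache.expiration-interval-ms={coordinator_ms};")
print(f"catalogd:    iceberg.io.manifest.cache.expiration-interval-ms={catalogd_ms};")
```

One week works out to 604800000 milliseconds, compared with 3600000 for the one-hour default.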
Changing configuration parameters in Cloudera Data Warehouse is recommended only when following Cloudera instructions.