Use cases for Cloudera Lakehouse Optimizer

Cloudera Lakehouse Optimizer is a service in Cloudera on cloud Management Console that automates Iceberg table maintenance for Open Data Lakehouse users by leveraging all of the optimization actions available with Iceberg. You can use Cloudera Lakehouse Optimizer for various use cases.

You can use Cloudera Lakehouse Optimizer to automate Iceberg table maintenance tasks such as the following:
  • Removing older and unnecessary snapshots from the table's metadata. This action helps to keep the metadata compact and the query performance optimal.
  • Compacting small files into larger ones. This action optimizes storage and improves read performance.
  • Rewriting the manifest files to remove entries for deleted data files. This action improves the metadata scan performance.
  • Removing the data files that are no longer referenced by the table's metadata. This action reclaims storage space.
  • Compacting small positional delete files into larger ones, and filtering out the positional delete records that refer to data files that are no longer available. This action reclaims storage space.
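
For orientation, each of the tasks above has a counterpart among the stored procedures that Apache Iceberg exposes through Spark SQL. The following Python sketch only builds the corresponding CALL statements as strings, to show what the equivalent manual maintenance would look like. It is not the Cloudera Lakehouse Optimizer interface, and this page does not state how the service runs these actions internally. The catalog name my_catalog and the table name db.events are hypothetical placeholders.

```python
from typing import List, Tuple

# Maintenance task (as described in the list above) paired with the
# Apache Iceberg Spark procedure that performs it when run manually.
MAINTENANCE_PROCEDURES: List[Tuple[str, str]] = [
    ("remove old snapshots", "expire_snapshots"),
    ("compact small data files", "rewrite_data_files"),
    ("rewrite manifest files", "rewrite_manifests"),
    ("remove unreferenced data files", "remove_orphan_files"),
    ("compact positional delete files", "rewrite_position_delete_files"),
]


def build_maintenance_calls(catalog: str, table: str) -> List[str]:
    """Return one Spark SQL CALL statement per maintenance task.

    Only the required `table` argument is passed; every procedure also
    accepts optional tuning arguments that are omitted in this sketch.
    """
    return [
        f"CALL {catalog}.system.{procedure}(table => '{table}')"
        for _task, procedure in MAINTENANCE_PROCEDURES
    ]


if __name__ == "__main__":
    # "my_catalog" and "db.events" are placeholders, not real objects.
    for statement in build_maintenance_calls("my_catalog", "db.events"):
        print(statement)
```

Each generated statement would have to be submitted to a Spark session configured with the Iceberg catalog. Automating that scheduling and execution is the work that Cloudera Lakehouse Optimizer is described as taking over.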