Cloudera Lakehouse Optimizer for Iceberg table maintenance
The Cloudera Lakehouse Optimizer service, available in clusters running Cloudera Base on premises 7.3.2 or higher, uses Spark jobs to provide automated maintenance for Iceberg tables in Cloudera Open Data Lakehouse. The service simplifies table management, improves query performance, and reduces operational costs.
You can add the Cloudera Lakehouse Optimizer service to an existing cluster running Cloudera Base on premises 7.3.2 or higher in Cloudera Manager 7.13.2 or higher, or you can create a dedicated cluster for Cloudera Lakehouse Optimizer and then add the service. In either case, ensure that the cluster contains all the required services. After you finish configuring the service, you can use its REST APIs to define Cloudera Lakehouse Optimizer policies, manage those policies, and run other Iceberg table optimization operations.
Cloudera Lakehouse Optimizer provides the following benefits:
- Simplifies Iceberg table management by automating optimization and maintenance activities, such as data file compaction and snapshot expiration.
- Reduces the total cost of ownership (TCO) by lowering storage and compute costs.
- Maintains the health of Iceberg tables to ensure high operational efficiency.
- Improves query execution time on Iceberg tables.
You create Cloudera Lakehouse Optimizer policies to perform Iceberg table maintenance. You can manage and monitor these policies as necessary, depending on your assigned role.
You can define policies, manage them, and run Iceberg table optimization operations using REST APIs. You can trigger table maintenance based on events, or schedule it to run at regular intervals. You can also perform manual (ad hoc) maintenance when required. Before a manual evaluation, dry run the policy. During a dry run, Cloudera Lakehouse Optimizer generates the table maintenance actions but does not run any of them.
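The dry run behavior can be illustrated with a small local model. The following Python sketch is not the Cloudera Lakehouse Optimizer REST API: the names MaintenanceAction, evaluate_policy, plan, and execute are hypothetical and exist only to show the difference between generating maintenance actions and running them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class MaintenanceAction:
    """Hypothetical record of one proposed maintenance step for a table."""
    table: str
    kind: str  # for example "compaction" or "snapshot expiration"


def evaluate_policy(
    plan: Callable[[str], List[MaintenanceAction]],
    execute: Callable[[MaintenanceAction], None],
    tables: Iterable[str],
    dry_run: bool = True,
) -> List[MaintenanceAction]:
    """Generate actions for every table; run them only when dry_run is False.

    `plan` stands in for policy evaluation and `execute` for starting a
    maintenance job. Both are assumptions made for this illustration.
    """
    actions = [action for table in tables for action in plan(table)]
    if not dry_run:
        for action in actions:
            execute(action)
    return actions
```

Calling evaluate_policy with the default dry_run=True returns the proposed actions without invoking execute, so you can review them first. Calling it again with dry_run=False runs the same actions.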
A Cloudera Lakehouse Optimizer policy consists of a JEXL script and a JSON file, collectively referred to as policy resources. The JEXL script is mandatory, and the JSON file is optional. The default ClouderaAdaptive policy consists of the ClouderaAdaptive.jexl script and the ClouderaAdaptive.json file. Cloudera Lakehouse Optimizer provides the ClouderaAdaptive policy as a template to help you create your own policies.
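As a minimal sketch of the policy resource rule (mandatory JEXL script, optional JSON file), the following Python helper gathers the resources for one policy from a local directory. The PolicyResources class, the collect_policy_resources function, and the convention that both files share the policy name are assumptions for illustration, inferred from the ClouderaAdaptive.jexl and ClouderaAdaptive.json example. They are not part of the service.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class PolicyResources:
    """Hypothetical container for the files that make up one policy."""
    name: str
    jexl_script: Path                  # mandatory
    json_file: Optional[Path] = None   # optional


def collect_policy_resources(directory: Path, name: str) -> PolicyResources:
    """Locate <name>.jexl (required) and <name>.json (optional) in a directory.

    The shared base name is an assumption taken from the ClouderaAdaptive
    example, not a documented requirement.
    """
    script = directory / f"{name}.jexl"
    if not script.is_file():
        raise FileNotFoundError(
            f"policy {name!r} is missing its mandatory JEXL script: {script}"
        )
    config = directory / f"{name}.json"
    return PolicyResources(name, script, config if config.is_file() else None)
```

For example, collect_policy_resources(Path("policies"), "ClouderaAdaptive") returns a PolicyResources whose json_file is None when only ClouderaAdaptive.jexl is present, and raises FileNotFoundError when the script itself is missing.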
