Cloudera Lakehouse Optimizer for Iceberg table maintenance
Cloudera Lakehouse Optimizer is a service on Cloudera on cloud Management Console. It provides automated Iceberg table maintenance, using Spark jobs, for Iceberg tables in Cloudera Open Data Lakehouse. It simplifies table management, improves query performance, and reduces operational costs. This service is available in Cloudera on cloud 7.3.1.500 and higher versions, and is deployed in a dedicated Data Hub in the Cloudera Open Data Lakehouse environment.
Cloudera Lakehouse Optimizer helps you by:
- Simplifying the Iceberg table management by automating the optimization and maintenance activities, such as data file compaction and snapshot expiration.
- Reducing the total cost of ownership (TCO) by lowering the storage and compute costs.
- Maintaining the health of Iceberg tables to ensure high operational efficiency.
- Improving the query execution time on the Iceberg tables.
You can use Cloudera Lakehouse Optimizer in AWS and Azure environments. You create Cloudera Lakehouse Optimizer policies to perform Iceberg table maintenance. You can manage and monitor these policies as necessary, depending on your assigned role.
You can work with Cloudera Lakehouse Optimizer through the following interfaces:
- Lakehouse Optimizer UI in Cloudera Management Console – On the UI, you can perform policy management only. You can trigger table maintenance based on events using event-based policies, or schedule the maintenance to run at regular intervals using schedule-based policies. The events include Hive Metastore (HMS) events, such as insert, update, and delete operations on the table.
- REST APIs – You can perform policy management and Iceberg table optimization operations using REST APIs. You can choose to trigger the table maintenance based on events, or schedule it to run at regular intervals.
You can perform manual (ad hoc) maintenance when required. Ensure that you perform a dry run of the policy before manual evaluation. During a dry run, Cloudera Lakehouse Optimizer only generates the table maintenance actions and does not initiate any of them.
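The difference between a dry run and an actual evaluation can be illustrated with a minimal Python sketch. This is a conceptual model only, not the Cloudera Lakehouse Optimizer implementation or API: the names `MaintenanceAction`, `evaluate_policy`, `sample_plan`, the operation strings, and the table name `db.orders` are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class MaintenanceAction:
    """A hypothetical maintenance action proposed for one table."""
    table: str
    operation: str  # for example "compact_data_files" or "expire_snapshots"


def evaluate_policy(
    tables: List[str],
    plan: Callable[[str], List[MaintenanceAction]],
    execute: Callable[[MaintenanceAction], None],
    dry_run: bool = True,
) -> List[MaintenanceAction]:
    """Generate actions for each table; run them only when dry_run is False."""
    actions = [action for table in tables for action in plan(table)]
    if not dry_run:
        for action in actions:
            execute(action)
    return actions


def sample_plan(table: str) -> List[MaintenanceAction]:
    # Hypothetical planner: always proposes compaction and snapshot expiration.
    return [
        MaintenanceAction(table, "compact_data_files"),
        MaintenanceAction(table, "expire_snapshots"),
    ]


executed: List[MaintenanceAction] = []

# Dry run: the actions are generated and returned, but none is executed.
proposed = evaluate_policy(["db.orders"], sample_plan, executed.append, dry_run=True)

# Actual evaluation: the same actions are generated and then executed.
ran = evaluate_policy(["db.orders"], sample_plan, executed.append, dry_run=False)
```

In this sketch, reviewing `proposed` before the second call corresponds to checking the dry run results before a manual evaluation.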
A Cloudera Lakehouse Optimizer policy consists of a JEXL script and a JSON file, collectively referred to as policy resources. The JEXL script is mandatory, and the JSON file is optional. The default ClouderaAdaptive policy consists of the ClouderaAdaptive.jexl script and the ClouderaAdaptive.json file. Cloudera Lakehouse Optimizer uses the ClouderaAdaptive policy as a template to help you create policies.
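The rule that the JEXL script is mandatory and the JSON file is optional can be modeled with a small Python sketch. The `PolicyResources` class, its field names, and the file-extension checks are assumptions made for illustration; they do not describe how Cloudera Lakehouse Optimizer stores or validates policy resources, and the policy name `MyPolicy` is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PolicyResources:
    """Hypothetical container for the files that make up one policy."""
    name: str
    jexl_script: str                 # mandatory, for example "ClouderaAdaptive.jexl"
    json_file: Optional[str] = None  # optional, for example "ClouderaAdaptive.json"

    def __post_init__(self) -> None:
        # Assumption: resources are identified by their file extension.
        if not self.jexl_script.endswith(".jexl"):
            raise ValueError("a policy requires a JEXL script (.jexl)")
        if self.json_file is not None and not self.json_file.endswith(".json"):
            raise ValueError("the optional policy file must be a JSON file (.json)")


# The default policy named in the text, with both resources.
default_policy = PolicyResources(
    name="ClouderaAdaptive",
    jexl_script="ClouderaAdaptive.jexl",
    json_file="ClouderaAdaptive.json",
)

# A hypothetical custom policy that supplies only the mandatory JEXL script.
script_only_policy = PolicyResources(name="MyPolicy", jexl_script="MyPolicy.jexl")
```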
You can view the status of maintenance tasks in three ways: as an icon on the UI, through the REST APIs, or in detail on the Cloudera Observability dashboard.
