Cloudera Lakehouse Optimizer features
Cloudera Lakehouse Optimizer supports several features. Some features are available only through REST APIs. You can choose to create Cloudera Lakehouse Optimizer policies in the UI or use REST APIs depending on your requirements.
The following table lists the supported features and the methods available to use each feature:
| Features | Available methods to use the feature | Description |
|---|---|---|
| Event-based policy | UI, REST API | Schedules the policies to be evaluated when a Hive Metastore (HMS) event is triggered, such as an insert, update, or delete operation on the table. You can create only one version of the policy definition at the catalog level in the UI. However, you can create multiple versions of the policy definition at the catalog, namespace, or table level using REST APIs. For example, when you create policy P1 in the UI, the definition is created at the catalog level. Using REST APIs, you can then create another definition for P1 at the namespace level or table level. |
| Schedule-based policy | UI, REST API | Schedules policies to be evaluated at regular intervals. You can create only one version of the policy definition at the catalog level in the UI. However, you can create multiple versions of the policy definition at the catalog, namespace, or table level using REST APIs. For more information, see Creating a Cloudera Lakehouse Optimizer policy in Lakehouse Optimizer UI and Defining Cloudera Lakehouse Optimizer resources using REST APIs. |
| Manual (ad hoc) evaluation | | Runs the policies manually to optimize the Iceberg tables when required. For more information, see Performing manual Iceberg table maintenance using Cloudera Lakehouse Optimizer REST APIs. |
| Dry-run policies | | Generates the table maintenance actions but does not initiate them. Dry run the policies to verify that they run without failure. For more information, see Preparing and defining Cloudera Lakehouse Optimizer policies using REST APIs. |
| Small file compaction | | Automates the Iceberg data file compaction maintenance actions. In the Apache Iceberg documentation, this procedure is called rewrite_data_files. It supports the table, strategy (binpack or sort), sort_order (zorder, sortDirection, NullOrder), options, and where arguments, all of which Cloudera Lakehouse Optimizer also supports. |
| Orphan file removal | | Automates the Iceberg orphan file removal maintenance action. |
| Snapshot expiration | | Automates the Iceberg snapshot management maintenance actions. |
| Rewrite manifest | | Automates the Iceberg manifest rewrite maintenance actions. |
| Positional delete rewrite | | Automates the Iceberg positional delete rewrite maintenance actions. |
| Pause and resume table maintenance manually | | Pauses table maintenance. For more information, including the scenarios in which table maintenance is paused, see Pausing and resuming table maintenance. |
| CLO event logging | REST API | Ingests the maintenance task metadata, also called an event, into the sys.task_events Iceberg table. You can use the table to analyze the event logs, troubleshoot issues, perform root cause analysis, and generate reports. For more information, see Viewing logs for Cloudera Lakehouse Optimizer. |
| Monitoring policy jobs | | Monitor the policy jobs using one of the following methods: viewing the metrics, such as data read and data written, for each task on the Cloudera Consumption dashboard; or viewing the real-time analysis of the infrastructure, jobs, users, and services for the Data Hub hosting the Cloudera Lakehouse Optimizer service in Cloudera Management Console. For more information, see Viewing table maintenance status and Monitoring table maintenance tasks on Cloudera Observability dashboard. |
| Backup policies and association | REST API | Backs up all the existing policies and associations to a TAR file, which you can restore to another Data Hub when required. Use this feature when you want to delete the current Cloudera Lakehouse Optimizer Data Hub and provision another Data Hub. For more information, see Cloudera Lakehouse Optimizer REST APIs. |
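
The small file compaction description above names the Apache Iceberg rewrite_data_files procedure and its table, strategy, sort_order, options, and where arguments. As a rough illustration of how those arguments fit together, the following Python sketch assembles the Spark SQL `CALL` statement that Iceberg documents for this procedure. It does not show how Cloudera Lakehouse Optimizer invokes the procedure or what its REST API payloads look like. The helper name `build_rewrite_data_files_call`, the catalog name `spark_catalog`, and the table `db.sample` are hypothetical and used only for illustration.

```python
from typing import Mapping, Optional


def _quoted(value: str) -> str:
    """Wrap a value in single quotes for use as a SQL string literal.

    Assumption: values containing single quotes are rejected rather than
    escaped, to keep this sketch simple.
    """
    if "'" in value:
        raise ValueError(f"single quotes are not supported in: {value!r}")
    return f"'{value}'"


def build_rewrite_data_files_call(
    catalog: str,
    table: str,
    strategy: str = "binpack",
    sort_order: Optional[str] = None,
    options: Optional[Mapping[str, str]] = None,
    where: Optional[str] = None,
) -> str:
    """Build a Spark SQL CALL statement for Iceberg's rewrite_data_files."""
    if strategy not in ("binpack", "sort"):
        raise ValueError("strategy must be 'binpack' or 'sort'")
    if sort_order is not None and strategy != "sort":
        # A sort order (including zorder(...)) only applies to the sort strategy.
        raise ValueError("sort_order requires strategy='sort'")

    args = [f"table => {_quoted(table)}", f"strategy => {_quoted(strategy)}"]
    if sort_order is not None:
        args.append(f"sort_order => {_quoted(sort_order)}")
    if options:
        # Options are passed as a Spark SQL map of string keys and values.
        pairs = ", ".join(f"{_quoted(k)}, {_quoted(v)}" for k, v in options.items())
        args.append(f"options => map({pairs})")
    if where is not None:
        args.append(f"where => {_quoted(where)}")

    return f"CALL {catalog}.system.rewrite_data_files({', '.join(args)})"


if __name__ == "__main__":
    # Hypothetical catalog, table, and filter values.
    print(
        build_rewrite_data_files_call(
            catalog="spark_catalog",
            table="db.sample",
            strategy="sort",
            sort_order="zorder(c1,c2)",
            options={"min-input-files": "2"},
            where="id = 3",
        )
    )
```

The resulting string could be passed to a Spark session that has the Iceberg SQL extensions enabled. Rejecting a sort order under the binpack strategy mirrors the fact that the two are alternative compaction strategies.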
