Cloudera Lakehouse Optimizer features

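Cloudera Lakehouse Optimizer provides policy-based maintenance for Iceberg tables, along with features for monitoring, event logging, backup, and access control.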

The following table lists the supported features and the supported methods to use these features:

Table 1. Supported feature list
Features	Available methods to use the feature	Description
Event-based policy	REST API – at table, namespace, and catalog level	Schedules policies to be evaluated when a Hive Metastore (HMS) event, such as an insert, update, or delete operation on the table, is triggered. In the UI, you can create only one version of the policy definition, at the catalog level. Using REST APIs, however, you can create multiple versions of the policy definition at the catalog, namespace, or table level. For example, when you create policy P1 in the UI, its definition is created at the catalog level. Using REST APIs, you can then create another definition for P1 at the namespace level or table level.
Schedule-based policy	REST API – at table, namespace, and catalog level	Schedules policies to be evaluated at regular intervals. In the UI, you can create only one version of the policy definition, at the catalog level. Using REST APIs, however, you can create multiple versions of the policy definition at the catalog, namespace, or table level. For more information, see Defining Cloudera Lakehouse Optimizer resources using REST APIs.
Manual (ad hoc) evaluation	REST API	Manually runs the policies to optimize the Iceberg tables when required. For more information, see Performing manual Iceberg table maintenance using Cloudera Lakehouse Optimizer REST APIs. Note: Ensure that you perform a dry run on the policy before you run it manually.
Dry-run policies	REST API	Dry runs existing policies to verify that they run without failure. A dry run generates the table maintenance actions but does not initiate them. For more information, see Preparing and defining Cloudera Lakehouse Optimizer policies using REST APIs.
Small file compaction options include: Target file size; Minimum number of input files; Delete file threshold; Maximum concurrent file group rewrites; Enable partial progress; Maximum number of commits during partial progress; Use starting sequence number of snapshot; Rewrite all	REST API	Automates the Iceberg data file compaction maintenance actions. In the Apache Iceberg documentation, this procedure is called rewrite_data_files. It accepts the table, strategy (binpack or sort), sort_order (zorder, sort direction, null order), options, and where arguments, all of which are also supported by Cloudera Lakehouse Optimizer.
Orphan file removal options include: Delete older than	REST API	Automates the Iceberg orphan file removal maintenance action.
Snapshot expiration options include: Maximum snapshot age; Retain last; Expire snapshot ID; Clean expired files	REST API	Automates the Iceberg snapshot management maintenance actions.
Rewrite manifest options include: Target file size; Use caching	REST API	Automates the Iceberg manifest rewrite maintenance actions.
Positional delete rewrite options include: Rewrite job order; Enable partial progress; Maximum number of commits during partial progress; Minimum number of input files; Maximum concurrent group rewrites; Target file size	REST API	Automates the Iceberg positional delete rewrite maintenance actions.
Pause and resume table maintenance manually	REST API	Pauses table maintenance. Table maintenance is paused in either of the following scenarios: (1) you manually pause the table maintenance; (2) recurring failures during the execution phase of the policy exceed the retry value. Note: When you pause maintenance for a table or namespace, none of the policies associated with that table or namespace are evaluated. Therefore, be cautious before you pause maintenance for a table or a namespace. For more information, see Pausing and resuming table maintenance.
CLO event logging	REST API	Ingests the maintenance task metadata, also called an event, into the sys.clo_events Iceberg table. You can use the table to analyze event logs, troubleshoot issues, perform root cause analysis, and generate reports. For more information, see Viewing logs for Cloudera Lakehouse Optimizer.
Monitoring policy jobs	REST API	Monitor the policy jobs using one of the following methods: (1) view the latest status of the recent tasks that ran for the table or policy in the UI; (2) use the GET /tasks or GET /tasks/id/{id} APIs; (3) monitor the Spark jobs on the Cloudera Consumption dashboard. For more information, see Viewing table maintenance status and Monitoring table maintenance tasks on Cloudera Observability dashboard.
Backup policies and associations	REST API	Backs up all the existing policies and associations to a TAR file. You can use the backup file to restore these configurations to any other Cloudera Lakehouse Optimizer service instance, when required. This feature is useful when you delete a current Cloudera Lakehouse Optimizer service instance. For more information, see Cloudera Lakehouse Optimizer REST APIs.
Fine-grained access to namespaces	Ranger UI	Creates Ranger policies and provides the required access to groups or users at namespace level.
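
The monitoring row in Table 1 names two task endpoints, GET /tasks and GET /tasks/id/{id}. The following Python sketch shows one way to build requests for them. It only constructs the request objects and does not send them. The base URL, the JSON Accept header, and the helper name tasks_request are assumptions made for illustration; authentication and the response format are not described here, so they are left out.

```python
from urllib.parse import quote, urljoin
from urllib.request import Request

# Hypothetical base URL (assumption): replace it with the endpoint of
# your own Cloudera Lakehouse Optimizer service instance.
BASE_URL = "https://clo.example.com/api/"


def tasks_request(task_id=None, base_url=BASE_URL):
    """Build, but do not send, a GET request for the task endpoints.

    With no task_id the request targets GET /tasks; with a task_id it
    targets GET /tasks/id/{id}.
    """
    if task_id is None:
        path = "tasks"
    else:
        # Percent-encode the ID so that it stays a single path segment.
        path = "tasks/id/" + quote(str(task_id), safe="")
    # Assumption: the service returns JSON.
    headers = {"Accept": "application/json"}
    return Request(urljoin(base_url, path), headers=headers, method="GET")


if __name__ == "__main__":
    print(tasks_request().full_url)
    print(tasks_request("42").full_url)
```

To send a request, pass the returned object to urllib.request.urlopen after adding whatever authentication your environment requires.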
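
The small file compaction row in Table 1 refers to the Apache Iceberg rewrite_data_files procedure and its table, strategy, sort_order, options, and where arguments. For orientation, the Python sketch below assembles the equivalent Spark SQL CALL statement from the Apache Iceberg documentation. It is not the Cloudera Lakehouse Optimizer REST payload, which is not shown here. The helper name, the catalog and table names, and the pairing of the option labels in Table 1 with Iceberg option keys are assumptions.

```python
# Assumed pairing of the option labels in Table 1 with the option keys
# that Apache Iceberg documents for rewrite_data_files.
ICEBERG_OPTION_KEYS = {
    "Target file size": "target-file-size-bytes",
    "Minimum number of input files": "min-input-files",
    "Delete file threshold": "delete-file-threshold",
    "Maximum concurrent file group rewrites": "max-concurrent-file-group-rewrites",
    "Enable partial progress": "partial-progress.enabled",
    "Maximum number of commits during partial progress": "partial-progress.max-commits",
    "Use starting sequence number of snapshot": "use-starting-sequence-number",
    "Rewrite all": "rewrite-all",
}


def _literal(value):
    """Wrap a value in single quotes; this sketch does no escaping."""
    text = str(value)
    if "'" in text:
        raise ValueError("single quotes are not supported in this sketch")
    return "'" + text + "'"


def rewrite_data_files_sql(catalog, table, strategy=None, sort_order=None,
                           options=None, where=None):
    """Return a Spark SQL CALL statement for Iceberg's rewrite_data_files."""
    args = ["table => " + _literal(table)]
    if strategy is not None:
        args.append("strategy => " + _literal(strategy))
    if sort_order is not None:
        args.append("sort_order => " + _literal(sort_order))
    if options:
        # Iceberg expects options as a map of string keys to string values.
        pairs = ", ".join(
            _literal(key) + ", " + _literal(val) for key, val in options.items()
        )
        args.append("options => map(" + pairs + ")")
    if where is not None:
        args.append("where => " + _literal(where))
    return "CALL " + catalog + ".system.rewrite_data_files(" + ", ".join(args) + ")"


if __name__ == "__main__":
    # my_catalog and db.sample are placeholder names.
    print(rewrite_data_files_sql(
        "my_catalog",
        "db.sample",
        strategy="binpack",
        options={ICEBERG_OPTION_KEYS["Minimum number of input files"]: 5},
        where="id > 10",
    ))
```

The helper rejects values that contain single quotes instead of escaping them, which keeps the sketch short. Real filter expressions with string literals would need proper escaping.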