How Cloudera Lakehouse Optimizer performs table maintenance

After you create or define a policy, Cloudera Lakehouse Optimizer and its components perform several background operations to initiate, evaluate, and complete Iceberg table maintenance.

After you create a policy, Cloudera Lakehouse Optimizer performs the following steps:
  1. Sets up a schedule for each table based on the policies attached to its resources, such as the catalog, the namespace, and the table itself.
  2. Consolidates all active policies by their CRON schedules, and creates a single schedule entry for all the policies that share the same CRON trigger.
  3. Initiates the evaluation phase when a CRON trigger is activated. The evaluation phase consists of the following steps:
    1. Initiating the evaluation of the policy script per table.
    2. Determining the sequence of maintenance actions for each table, in which the policy script:
      1. compares the current Iceberg metadata and table statistics with the expected statistics to determine the required maintenance actions,
      2. generates a list of (potentially empty) maintenance actions, and then
      3. queues up the generated maintenance actions.

        This list of actions is called a task, and each task is identified by a task ID.

  4. Sends the maintenance actions to the Spark engine through Livy in the correct sequence. The Spark engine runs each maintenance action as a Spark job.
  5. Supervises the task execution and collects all relevant task execution information in the event logs.
    Cloudera Lakehouse Optimizer captures every maintenance action, Spark job, and the job results as task metadata. The task metadata includes the following elements:
    • The task ID
    • The maintenance actions and their results
    • The Spark computation metrics for the Spark jobs
    • The current task status

    The task metadata is stored in the sys.task_events table.

    The following diagram shows the high-level steps for Iceberg table maintenance by Cloudera Lakehouse Optimizer:
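The scheduling logic in steps 1 and 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the Cloudera Lakehouse Optimizer implementation: the Policy class, the dot-separated resource names (catalog.namespace.table), and the functions policies_for_table and consolidate_schedules are hypothetical names used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical policy model: a name, a CRON expression, and the
    # resource (catalog, namespace, or table) the policy is attached to.
    name: str
    cron: str
    resource: str
    active: bool = True


def policies_for_table(table, policies):
    """Step 1 (sketch): a table picks up active policies attached to it or to a parent resource."""
    return [
        p.name
        for p in policies
        if p.active and (table == p.resource or table.startswith(p.resource + "."))
    ]


def consolidate_schedules(policies):
    """Step 2 (sketch): one schedule entry per distinct CRON trigger among active policies."""
    schedule = defaultdict(list)
    for p in policies:
        if p.active:
            schedule[p.cron].append(p.name)
    return dict(schedule)


# Hypothetical policies attached at catalog, namespace, and table level.
policies = [
    Policy("compact-sales", "0 2 * * *", "catalog.sales"),
    Policy("expire-orders", "0 2 * * *", "catalog.sales.orders"),
    Policy("rewrite-finance", "0 */6 * * *", "catalog.finance"),
    Policy("paused-hr", "0 2 * * *", "catalog.hr", active=False),
]

print(policies_for_table("catalog.sales.orders", policies))
# ['compact-sales', 'expire-orders']
print(consolidate_schedules(policies))
# {'0 2 * * *': ['compact-sales', 'expire-orders'], '0 */6 * * *': ['rewrite-finance']}
```

Keying the schedule on the raw CRON string means that two equivalent expressions written differently would get separate entries; a real scheduler might normalize expressions before grouping them.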
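The evaluation phase (step 3) and the ordered execution and status tracking (steps 4 and 5) can be sketched in the same way. This is again a hypothetical illustration: the TableStats and Expected fields, the threshold values, the order of the actions, and the submit callable (which stands in for running a Spark job through Livy) are all assumptions. The action names mirror the Iceberg Spark procedures rewrite_data_files, rewrite_manifests, and expire_snapshots.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TableStats:
    # Hypothetical subset of statistics read from the current Iceberg metadata.
    small_file_count: int
    manifest_count: int
    snapshot_count: int


@dataclass
class Expected:
    # Hypothetical thresholds that a policy might treat as the expected statistics.
    max_small_files: int = 100
    max_manifests: int = 20
    max_snapshots: int = 50


@dataclass
class Task:
    # A task is the (possibly empty) ordered list of actions for one table.
    table: str
    actions: list
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "QUEUED"
    results: dict = field(default_factory=dict)


def evaluate(table, stats, expected):
    """Compare current statistics with expected ones and queue the required actions."""
    actions = []
    if stats.small_file_count > expected.max_small_files:
        actions.append("rewrite_data_files")
    if stats.manifest_count > expected.max_manifests:
        actions.append("rewrite_manifests")
    if stats.snapshot_count > expected.max_snapshots:
        actions.append("expire_snapshots")
    return Task(table=table, actions=actions)


def run_task(task, submit):
    """Run the actions in order, recording each result; stop at the first failure."""
    task.status = "RUNNING"
    for action in task.actions:
        succeeded = submit(task.table, action)
        task.results[action] = "SUCCEEDED" if succeeded else "FAILED"
        if not succeeded:
            task.status = "FAILED"
            return task
    task.status = "SUCCEEDED"
    return task


stats = TableStats(small_file_count=340, manifest_count=25, snapshot_count=12)
task = evaluate("catalog.sales.orders", stats, Expected())
run_task(task, submit=lambda table, action: True)  # stand-in for a Spark job via Livy
print(task.task_id, task.actions, task.status, task.results)
```

Stopping at the first failed action is a simplification chosen for this sketch; the steps above do not specify how a failed action affects the rest of a task.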