Preparing to create Cloudera Lakehouse Optimizer policies in UI

Before you create Cloudera Lakehouse Optimizer policies, an administrator must complete a few prerequisites: install the Cloudera Lakehouse Optimizer Data Hub and complete the other configuration steps required to initiate Iceberg table maintenance. You can create and maintain the policies using either the Lakehouse Optimizer UI or the REST APIs.

Review the best practices before you create a policy.
Complete the following steps before you create Cloudera Lakehouse Optimizer policies using the Lakehouse Optimizer UI:
  1. Ensure that you do not have any other optimization services enabled alongside Cloudera Lakehouse Optimizer, such as AWS S3 Table Optimization, to avoid conflicts during the optimization process.
  2. The CDP Environment administrator must perform the following steps to install and configure Cloudera Lakehouse Optimizer:
    1. Contact your Cloudera account team to enable the Cloudera Lakehouse Optimizer service for your Cloudera Open Data Lakehouse environment.
    2. Ensure that the environment has the following minimum configuration to support Cloudera Lakehouse Optimizer:
      • The AWS environment must have at least 1x m5.4xlarge (Master Node); 2x r5d.xlarge (Worker Nodes); r5d.xlarge (Compute Nodes, 0 by default).
      • The Azure environment must have at least 1x Standard_D16d_v5 (Master Node); 2x Standard_E8ds_v5 (Worker Nodes); Standard_E8ds_v5 (Compute Nodes, 0 by default).
    3. Provision only one Cloudera Lakehouse Optimizer Data Hub for your AWS or Azure environment. For instructions, see Provisioning the Lakehouse Optimizer Data Hub.
    4. Assign roles to Cloudera Lakehouse Optimizer users. For instructions, see Configuring roles for Lakehouse Optimizer users.
    5. If required, modify the default values of Spark Executor Memory and Spark Driver Memory, which are 8 GB and 4 GB respectively. The defaults are likely sufficient for most use cases; for heavy workloads, you might want to increase them.
      To change the spark.driver.memory and spark.executor.memory settings, in Cloudera Manager go to Clusters > cloudera_lakehouse_optimizer > Configuration and update the conf/dlm-client.properties_role_safety_valve property.
  3. Cloudera Lakehouse Optimizer administrators must perform the following actions before creating the policies:
    1. Verify that you can access the Lakehouse Optimizer page in Cloudera Management Console.
    2. Verify that the ClouderaAdaptive default policy is available on the Cloudera Management Console > Lakehouse Optimizer > Policies tab.
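The Spark memory change in the prerequisites comes down to two Spark properties entered in the safety valve. The following Python sketch renders those entries from whole-gigabyte values. It assumes that the conf/dlm-client.properties_role_safety_valve property accepts Java properties-style key=value lines and that Spark's "g" size suffix is acceptable there; render_memory_overrides is a hypothetical helper, not part of Cloudera Lakehouse Optimizer or Cloudera Manager.

```python
"""Hypothetical helper that renders Spark memory overrides as key=value lines.

Assumption (not confirmed by the product documentation): the
conf/dlm-client.properties_role_safety_valve property accepts Java
properties-style key=value lines, and Spark size suffixes such as "8g" apply.
"""

# Defaults stated in the documentation, in GB.
DEFAULT_EXECUTOR_MEMORY_GB = 8
DEFAULT_DRIVER_MEMORY_GB = 4


def render_memory_overrides(executor_gb: int = DEFAULT_EXECUTOR_MEMORY_GB,
                            driver_gb: int = DEFAULT_DRIVER_MEMORY_GB) -> str:
    """Return safety-valve text that sets Spark executor and driver memory.

    Each value must be a positive whole number of gigabytes; it is written
    with Spark's "g" suffix (for example, 16 -> "16g").
    """
    for name, value in (("executor", executor_gb), ("driver", driver_gb)):
        # Reject bools explicitly, because bool is a subclass of int in Python.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} memory must be a positive integer number of GB")
    lines = [
        f"spark.executor.memory={executor_gb}g",
        f"spark.driver.memory={driver_gb}g",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: a heavier workload that doubles both defaults.
    print(render_memory_overrides(executor_gb=16, driver_gb=8))
```

Running the example prints spark.executor.memory=16g and spark.driver.memory=8g on separate lines. Check the rendered text against the property format that your Cloudera Manager version expects before pasting it into the safety valve.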
After you complete these prerequisites, create an event-based policy or a schedule-based policy to initiate Iceberg table maintenance. For more information, see Creating a Lakehouse Optimizer policy.