Before you create Cloudera Lakehouse Optimizer policies, the administrator
must install the Cloudera Lakehouse Optimizer Data Hub and complete the other
configuration steps required to initiate Iceberg table maintenance. You can then
use the Lakehouse Optimizer UI or REST APIs to create and maintain the policies.
Complete the following steps before you create Cloudera Lakehouse Optimizer policies using the Lakehouse Optimizer UI.
-
Ensure that no other optimization service, such as AWS S3 Table Optimization,
is enabled alongside Cloudera Lakehouse Optimizer. This avoids conflicts
during the optimization process.
-
The CDP Environment administrator must perform the
following steps to install and configure Cloudera Lakehouse Optimizer:
-
Contact your Cloudera account team to enable the Cloudera Lakehouse Optimizer service for your Cloudera Open Data
Lakehouse environment.
-
Ensure that the environment has the following minimum configuration to
support Cloudera Lakehouse Optimizer:
- The AWS environment must have 1x m5.4xlarge (Master
Node), 2x r5d.xlarge (Worker Nodes), and 0x r5d.xlarge (Compute
Nodes, 0 by default).
- The Azure environment must have 1x Standard_D16d_v5
(Master Node), 2x Standard_E8ds_v5 (Worker Nodes), and 0x
Standard_E8ds_v5 (Compute Nodes, 0 by default).
-
Provision only one Cloudera Lakehouse Optimizer Data Hub for
your AWS or Azure environment. For instructions, see Provisioning the Lakehouse
Optimizer Data Hub.
-
Assign roles to Cloudera Lakehouse Optimizer users. For
instructions, see Configuring roles
for Lakehouse Optimizer users.
-
If required, modify the default values for Spark Executor Memory and Spark
Driver Memory, which are 8 GB and 4 GB respectively. The default
settings are sufficient for most use cases, but you might want to
increase them for heavy workloads.
To modify the spark.driver.memory and
spark.executor.memory settings, go to the property.
-
Cloudera Lakehouse Optimizer administrators must perform the following
actions before creating the policies:
-
Verify that you can access the Lakehouse Optimizer page in
Cloudera Management Console.
-
Verify that the
ClouderaAdaptive default policy is
available on the tab.
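The minimum node configuration listed above can be expressed as a small check. The following is a minimal sketch, not part of the product: the `MINIMUM_LAYOUT` dictionary and the `layout_problems` helper are hypothetical names, and the layout you pass in is assumed to be a mapping from node role to an (instance type, node count) pair that you fill in by hand. Because comparing instance sizes is out of scope here, a different instance type is only flagged for review.

```python
# Minimum node layout per cloud, copied from the prerequisites above.
# The dictionary shape and the helper name are hypothetical, for illustration only.
MINIMUM_LAYOUT = {
    "aws": {
        "master": ("m5.4xlarge", 1),
        "worker": ("r5d.xlarge", 2),
        "compute": ("r5d.xlarge", 0),  # 0 compute nodes by default
    },
    "azure": {
        "master": ("Standard_D16d_v5", 1),
        "worker": ("Standard_E8ds_v5", 2),
        "compute": ("Standard_E8ds_v5", 0),  # 0 compute nodes by default
    },
}


def layout_problems(cloud, layout):
    """Return a list of problems; an empty list means the layout matches the minimum."""
    problems = []
    for role, (instance_type, min_count) in MINIMUM_LAYOUT[cloud].items():
        # A role missing from the layout is treated as having zero nodes.
        actual_type, actual_count = layout.get(role, (None, 0))
        if actual_count < min_count:
            problems.append(
                f"{role}: need at least {min_count} node(s), found {actual_count}"
            )
        elif actual_count > 0 and str(actual_type).lower() != instance_type.lower():
            # Only flags a mismatch; it does not judge whether the type is larger.
            problems.append(f"{role}: review instance type {actual_type}, listed minimum is {instance_type}")
    return problems
```

For example, `layout_problems("azure", {"master": ("Standard_D16d_v5", 1), "worker": ("Standard_E8ds_v5", 1)})` reports that the worker group is one node short.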
After you complete these prerequisites, create an event-based policy or a
schedule-based policy to initiate Iceberg table maintenance. For more information,
see Creating a Lakehouse Optimizer policy.
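To illustrate the Spark memory prerequisite, the sketch below builds values for the `spark.executor.memory` and `spark.driver.memory` properties, starting from the defaults stated above (8 GB and 4 GB). The helper name `spark_memory_properties` is hypothetical, and the sketch assumes the values are written in Spark's usual size notation (for example `8g`). It only prepares the strings; where you enter them depends on your deployment.

```python
# Defaults stated in the prerequisites: 8 GB executor memory, 4 GB driver memory.
DEFAULT_EXECUTOR_GB = 8
DEFAULT_DRIVER_GB = 4


def spark_memory_properties(executor_gb=DEFAULT_EXECUTOR_GB, driver_gb=DEFAULT_DRIVER_GB):
    """Build Spark memory property values such as "8g" (hypothetical helper)."""
    for name, value in (("executor_gb", executor_gb), ("driver_gb", driver_gb)):
        # Reject zero, negative, and non-integer sizes before formatting.
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive whole number of GB, got {value!r}")
    return {
        "spark.executor.memory": f"{executor_gb}g",
        "spark.driver.memory": f"{driver_gb}g",
    }
```

Calling `spark_memory_properties()` returns the default values, while `spark_memory_properties(16, 8)` doubles both for a heavier workload.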