Provisioning Cloudera Lakehouse Optimizer Data Hub

You can use CDP CLI or Cloudera Management Console to create the Cloudera Lakehouse Optimizer Data Hub.

Consider the following guidelines before you provision the Cloudera Lakehouse Optimizer Data Hub:
  1. Ensure that the environment has the following minimum configuration to support Cloudera Lakehouse Optimizer:
    • The AWS environment must have 1x m5.4xlarge (Master Node), 2x r5d.xlarge (Worker Nodes), and r5d.xlarge Compute Nodes (0 by default).
    • The Azure environment must have 1x Standard_D16d_v5 (Master Node), 2x Standard_E8ds_v5 (Worker Nodes), and Standard_E8ds_v5 Compute Nodes (0 by default).

    This recommended minimum cluster configuration ensures that the Data Hub is fully functional and stable enough to meet the Cloudera Lakehouse Optimizer requirements.

  2. Create the Cloudera Lakehouse Optimizer Data Hub using one of the following methods:
    CDP CLI: You can use the following sample CDP CLI command to create the Data Hub in an AWS environment:
    cdp datahub create-aws-cluster \
    --profile default \
    --environment-name [***ENVIRONMENT NAME***] \
    --cluster-name [***DATA HUB NAME***] \
    --cluster-template-name "7.3.1 - Lakehouse Optimizer" \
    --instance-groups nodeCount=0,instanceGroupName=compute,instanceGroupType=CORE,instanceType=r5d.xlarge,rootVolumeSize=200,attachedVolumeConfiguration=\[\{volumeSize=150,volumeCount=1,volumeType=ephemeral\}\],recoveryMode=MANUAL,volumeEncryption=\{enableEncryption=true\} nodeCount=0,instanceGroupName=gateway,instanceGroupType=CORE,instanceType=m5.2xlarge,rootVolumeSize=200,attachedVolumeConfiguration=\[\{volumeSize=100,volumeCount=1,volumeType=gp3\}\],recoveryMode=MANUAL,volumeEncryption=\{enableEncryption=true\} nodeCount=1,instanceGroupName=master,instanceGroupType=GATEWAY,instanceType=m5.4xlarge,rootVolumeSize=200,attachedVolumeConfiguration=\[\{volumeSize=100,volumeCount=1,volumeType=gp3\}\],recoveryMode=MANUAL,volumeEncryption=\{enableEncryption=true\} nodeCount=2,instanceGroupName=worker,instanceGroupType=CORE,instanceType=r5d.xlarge,rootVolumeSize=200,attachedVolumeConfiguration=\[\{volumeSize=100,volumeCount=1,volumeType=gp3\}\],recoveryMode=MANUAL,volumeEncryption=\{enableEncryption=true\} \
    --cloudera-runtime-version=7.3.1-1.cdh7.3.1.p500.gbn \
    --cloudera-manager-version=7.13.1.500-gbn \
    --os-version-for-gbn redhat8
    You can use the following sample CDP CLI command to create the Data Hub in an Azure environment:
    cdp datahub create-azure-cluster \
       --profile default \
       --environment-name [***ENVIRONMENT NAME***] \
       --cluster-name [***DATA HUB NAME***] \
       --cluster-template-name "CLO-731-1" \
       --instance-groups nodeCount=1,instanceGroupName=master,instanceGroupType=GATEWAY,instanceType=Standard_D16s_v5,rootVolumeSize=300,attachedVolumeConfiguration=\[\{volumeSize=100,volumeCount=1,volumeType=StandardSSD_LRS\}\],recoveryMode=MANUAL nodeCount=2,instanceGroupName=worker,instanceGroupType=CORE,instanceType=Standard_E8ds_v5,rootVolumeSize=200,attachedVolumeConfiguration=\[\{volumeSize=100,volumeCount=1,volumeType=StandardSSD_LRS\}\],recoveryMode=MANUAL nodeCount=0,instanceGroupName=compute,instanceGroupType=CORE,instanceType=Standard_E8ds_v5,rootVolumeSize=200,attachedVolumeConfiguration=\[\{volumeSize=100,volumeCount=1,volumeType=StandardSSD_LRS\}\],recoveryMode=MANUAL nodeCount=0,instanceGroupName=gateway,instanceGroupType=CORE,instanceType=Standard_D8ds_v5,rootVolumeSize=200,attachedVolumeConfiguration=\[\{volumeSize=100,volumeCount=1,volumeType=StandardSSD_LRS\}\],recoveryMode=MANUAL \
       --cloudera-runtime-version=7.3.1-1.cdh7.3.1.p500.gbn \
       --cloudera-manager-version=7.13.1.500-gbn \
       --os-version-for-gbn redhat8
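
    The --instance-groups argument in the sample commands is long and easy to mistype. As a minimal sketch, the entries of the AWS sample can be generated with a small shell function. The function name instance_group is hypothetical and is not part of the CDP CLI. The root volume size, recovery mode, and volume encryption settings are hard-coded to the values in the AWS sample. Because the output is passed as a quoted string, the backslash escapes from the sample command are not needed.

```shell
# Hypothetical helper (not part of the CDP CLI): prints one --instance-groups
# entry in the key=value shorthand used by the sample AWS command.
# Arguments: $1 group name, $2 group type, $3 node count,
#            $4 instance type, $5 attached volume size, $6 attached volume type
instance_group() {
  printf 'nodeCount=%s,instanceGroupName=%s,instanceGroupType=%s,instanceType=%s,rootVolumeSize=200,attachedVolumeConfiguration=[{volumeSize=%s,volumeCount=1,volumeType=%s}],recoveryMode=MANUAL,volumeEncryption={enableEncryption=true}\n' \
    "$3" "$1" "$2" "$4" "$5" "$6"
}

# Print the four entries of the AWS sample for review, one per line.
instance_group compute CORE    0 r5d.xlarge 150 ephemeral
instance_group gateway CORE    0 m5.2xlarge 100 gp3
instance_group master  GATEWAY 1 m5.4xlarge 100 gp3
instance_group worker  CORE    2 r5d.xlarge 100 gp3
```

    To use the helper, pass each entry as a separate quoted argument after --instance-groups, for example "$(instance_group worker CORE 2 r5d.xlarge 100 gp3)". Quoting prevents the shell from interpreting the brackets and braces.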
      
    Cloudera Management Console
    1. Ensure that the required AWS environment or Azure environment is available and healthy. For more information about registering an AWS environment or an Azure environment, see Register an AWS environment or Register an Azure environment.
    2. Go to the Cloudera Management Console > Data Hub Clusters page.
    3. Click Create Data Hub.
    4. Perform the following steps on the Provision Data Hub page:
      1. Choose the required environment from the Selected Environment with running Data Lake list.
      2. Choose 7.3.1 - Lakehouse Optimizer for AWS or 7.3.1 - Lakehouse Optimizer for Azure from the Cluster Definition list in the Services section.
    5. Enter a unique Cluster Name in the General Settings section.

      The cluster name must be at least five characters long. It must start with a lowercase letter, end with an alphanumeric character, and must have only lowercase alphanumeric characters and hyphens.

    6. Click Provision Cluster.

      The Data Hub appears on the Data Hub Clusters page. The Data Hub name is the same as the cluster name.

  3. After you create the Data Hub, enable autoscaling for the compute nodes in Cloudera Management Console. For more information, see Configuring autoscaling.
Go to the Lakehouse Optimizer page to verify that the environment is accessible.
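
The cluster name rules stated in the General Settings step (at least five characters, starts with a lowercase letter, ends with an alphanumeric character, and contains only lowercase alphanumeric characters and hyphens) can be checked before you provision. The following is a minimal sketch. The function name is_valid_cluster_name and the candidate names are hypothetical, and the regular expression is one reading of the stated rules, not a check published by Cloudera.

```shell
# Hypothetical helper: returns 0 when the name satisfies the stated rules.
# The pattern requires a lowercase letter first, at least three lowercase
# alphanumerics or hyphens in the middle, and a lowercase alphanumeric last,
# which gives a minimum length of five characters.
is_valid_cluster_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z][a-z0-9-]{3,}[a-z0-9]$'
}

# Check a few illustrative candidate names.
for candidate in clo-datahub1 clo- 1clo-hub CLO-hub; do
  if is_valid_cluster_name "$candidate"; then
    echo "$candidate: valid"
  else
    echo "$candidate: invalid"
  fi
done
```

Running the check in a provisioning script before calling the CDP CLI catches an invalid name early instead of waiting for the create request to be rejected.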