Creating virtual clusters

In Cloudera Data Engineering (CDE), a virtual cluster is an individual auto-scaling cluster with defined CPU and memory ranges. Jobs are associated with virtual clusters, and virtual clusters are associated with an environment. You can create as many virtual clusters as you need.

To create a virtual cluster, you must have an environment with Cloudera Data Engineering (CDE) enabled.

  1. From the CDE Overview page, select the environment you want to create a virtual cluster in.
  2. In the Virtual Clusters column, click the plus icon at the top right to create a new virtual cluster.
    If the environment has no virtual clusters associated with it, the page displays a Create DE Cluster button that launches the same wizard.
  3. Enter a Cluster Name.
    Cluster names must:
    • Begin with a letter
    • Be between 3 and 30 characters (inclusive)
    • Contain only alphanumeric characters and hyphens
  4. Select the CDE Service to create the virtual cluster in.
    The environment you selected before launching the wizard is selected by default, but you can use the wizard to create a virtual cluster in a different CDE service.
  5. Select the Spark Version to use in the virtual cluster.
  6. Click Create.
On the CDE Overview page, select the environment to view the virtual cluster initialization status. You can also click the three-dot menu for the virtual cluster to view the logs.
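The cluster name rules in step 3 can be checked locally before you open the wizard. The following is a minimal sketch, not part of CDE: the function name is_valid_cluster_name is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits.

```shell
# Hypothetical helper (not shipped with CDE): returns 0 if the name
# satisfies the three naming rules listed in step 3, 1 otherwise.
is_valid_cluster_name() {
  name=$1
  len=${#name}

  # Rule: between 3 and 30 characters (inclusive).
  [ "$len" -ge 3 ] && [ "$len" -le 30 ] || return 1

  # Rule: begin with a letter (ASCII assumed).
  case $name in
    [A-Za-z]*) ;;
    *) return 1 ;;
  esac

  # Rule: only alphanumeric characters and hyphens.
  case $name in
    *[!A-Za-z0-9-]*) return 1 ;;
  esac

  return 0
}

# Example usage with made-up names.
for candidate in "etl-cluster-01" "1cluster" "my_cluster"; do
  if is_valid_cluster_name "$candidate"; then
    echo "$candidate: valid"
  else
    echo "$candidate: invalid"
  fi
done
```

The wizard remains the authority on what it accepts; this check only mirrors the rules as stated above.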
For CDE on private cloud, you must also perform the following manual steps for each virtual cluster you create.
  1. Download cdp-cde-utils.sh to your local machine.
  2. Create a directory to store the files, and change to that directory:
    mkdir -p /tmp/cde-latest && cd /tmp/cde-latest
  3. Copy the script to a cluster host, following the instructions for your container platform.
    Embedded Container Service (ECS)
    Copy the utility script (cdp-cde-utils.sh) to one of the Embedded Container Service (ECS) cluster hosts. To identify the ECS cluster hosts:
    1. Log in to the Cloudera Manager web interface.
    2. Go to Clusters > Experience Cluster > ECS > Hosts.
    3. Select one of the listed hosts, and copy the script to that host.
    Red Hat OpenShift Container Platform (OCP)
    Copy the utility script (cdp-cde-utils.sh) and the OpenShift kubeconfig file to one of the HDFS service gateway hosts, and install the kubectl utility:
    1. Log in to the Cloudera Manager web interface.
    2. Go to Clusters > Base Cluster > HDFS > Instances.
    3. Select one of the Gateway hosts, log in using the security password that was set, and copy the script to that host.
    4. Copy the OCP kubeconfig file to the same host.
    5. Export the OCP kubeconfig file:
       export KUBECONFIG=[***path_of_the_copied_OCP_Kubeconfig_file***]
    6. On that host, install the kubectl utility following the instructions in the Kubernetes documentation. Make sure to install a kubectl version between 1.16 and 1.22 (inclusive). Cloudera recommends installing the version that matches the Kubernetes version installed on the OpenShift cluster.
  4. On the cluster host that you copied the script to, set the script permissions to be executable:
    chmod +x /path/to/cdp-cde-utils.sh
  5. Identify the virtual cluster endpoint:
    1. In the Cloudera Manager web UI, go to the Experiences page, and then click Open CDP Private Cloud Experiences.
    2. Click the Data Engineering tile.
    3. Select the CDE service containing the virtual cluster you want to activate.
    4. Click Cluster Details.
    5. Click JOBS API URL to copy the URL to your clipboard.
    6. Paste the URL into a text editor to identify the endpoint host. For example, the URL is similar to the following:
      http://dfdj6kgx.cde-2cdxw5x5.ecs-demo.example.com/dex/api/v1

      The endpoint host is dfdj6kgx.cde-2cdxw5x5.ecs-demo.example.com.

  6. On the ECS or HDFS gateway host you selected previously, initialize the virtual cluster using the cdp-cde-utils.sh script. You can either generate and use a self-signed certificate, or provide a signed certificate and private key.
    Generate a self-signed certificate
    ./cdp-cde-utils.sh init-virtual-cluster -h <endpoint_host> -a
    For example, using the endpoint host from the previous example URL:
    ./cdp-cde-utils.sh init-virtual-cluster -h dfdj6kgx.cde-2cdxw5x5.ecs-demo.example.com -a
    Use a signed certificate and private key
    Make sure that the certificate is a wildcard certificate for the cluster endpoint, for example *.dfdj6kgx.cde-2cdxw5x5.ecs-demo.example.com.
    ./cdp-cde-utils.sh init-virtual-cluster -h <endpoint_host> -c /path/to/cert -k /path/to/keyfile
    For example, using the endpoint host from the previous example URL:
    ./cdp-cde-utils.sh init-virtual-cluster -h dfdj6kgx.cde-2cdxw5x5.ecs-demo.example.com -c /tmp/cde-pvc.crt -k /tmp/cde-pvc.key
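Steps 5 and 6 can be tied together in a short script that derives the endpoint host from the copied JOBS API URL instead of reading it off by eye. This is a sketch under stated assumptions: the function name endpoint_host_from_url is hypothetical, and the script only prints the init command (using the -h and -a options shown above) rather than running it.

```shell
# Hypothetical helper: strip the scheme, any port, and the path from a URL,
# leaving only the host name.
endpoint_host_from_url() {
  url=$1
  rest=${url#*://}    # drop a leading "http://" or "https://", if present
  host=${rest%%/*}    # drop everything from the first "/" onward
  host=${host%%:*}    # drop a ":port" suffix, if present
  printf '%s\n' "$host"
}

# Example URL taken from step 5; replace it with the URL you copied.
jobs_api_url='http://dfdj6kgx.cde-2cdxw5x5.ecs-demo.example.com/dex/api/v1'
endpoint_host=$(endpoint_host_from_url "$jobs_api_url")

echo "Endpoint host: $endpoint_host"
echo "Wildcard certificate name: *.$endpoint_host"

# Dry run: print the command for review instead of executing it.
echo "./cdp-cde-utils.sh init-virtual-cluster -h $endpoint_host -a"
```

Printing the command first makes it easy to confirm the host before the script changes anything on the cluster.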

You must perform this procedure for each virtual cluster you create.
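For the OCP path, the kubectl version requirement (between 1.16 and 1.22, inclusive) can be checked with a small helper before you run the script. This is a sketch only: the function name kubectl_version_supported is hypothetical, and it expects a version string that you read from the output of kubectl version --client.

```shell
# Hypothetical helper: returns 0 if a "major.minor[.patch]" version string
# falls within 1.16 through 1.22 (inclusive), nonzero otherwise.
kubectl_version_supported() {
  ver=${1#v}          # accept "v1.20.4" as well as "1.20.4"
  major=${ver%%.*}    # text before the first dot
  rest=${ver#*.}      # text after the first dot
  minor=${rest%%.*}   # text before the next dot, if any

  # Reject empty or non-numeric components.
  case $major in
    ''|*[!0-9]*) return 1 ;;
  esac
  case $minor in
    ''|*[!0-9]*) return 1 ;;
  esac

  [ "$major" -eq 1 ] && [ "$minor" -ge 16 ] && [ "$minor" -le 22 ]
}

# Example usage with an illustrative version string.
if kubectl_version_supported "v1.20.4"; then
  echo "kubectl version is within the supported range"
else
  echo "kubectl version is outside the supported range"
fi
```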