Creating virtual clusters

In Cloudera Data Engineering (CDE), a virtual cluster is an individual auto-scaling cluster with defined CPU and memory ranges. Jobs are associated with virtual clusters, and virtual clusters are associated with an environment. You can create as many virtual clusters as you need.

To create a virtual cluster, you must have an environment with CDE enabled.

  1. In the Cloudera Data Platform (CDP) console, click the Data Engineering tile. The CDE Home page displays.
  2. Click Administration in the left navigation menu, then select the environment in which you want to create a virtual cluster.
  3. In the Virtual Clusters column, click the icon at the top right to create a new virtual cluster.
    If the environment has no virtual clusters associated with it, the page displays a Create DE Cluster button that launches the same wizard.
  4. Enter a Cluster Name.
    Cluster names must meet the following requirements:
    • Begin with a letter
    • Be between 3 and 30 characters (inclusive)
    • Contain only alphanumeric characters and hyphens
  5. Select the CDE Service to create the virtual cluster in.
    The environment you selected before launching the wizard is selected by default, but you can use the wizard to create a virtual cluster in a different CDE service.
  6. In Capacity (Technical Preview), use the slider to set the maximum number of CPU cores and the maximum memory in gigabytes. The cluster can use resources up to the set capacity to run submitted Spark applications.
    For information about configuring resource pools and capacity, see Managing cluster resources using Quota Management (Technical Preview).
  7. Select the Spark Version to use in the virtual cluster.
  8. Optional: Click Configure Email Alerting if you want to receive email notifications. Provide the following email configuration information:
    • Sender email address.
    • Mail server hostname and port.
    • The username and password of the email account used to log in to the mail server as the sender of the alert emails.
    • The secure connection method to use when communicating with the SMTP server.
    Click Test SMTP Configs to verify the SMTP settings before you create the cluster.
  9. Click Create.
On the CDE Home page, select the environment to view the virtual cluster initialization status. You can also click the three-dot menu for the virtual cluster to view the logs.
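
The cluster name rules in step 4 can be checked before you open the wizard. The following is a minimal sketch in Python, not part of CDE: the function name is_valid_cluster_name is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits only. The wizard may apply additional checks that are not listed here.

```python
import re

# Assumed pattern derived from the three rules in step 4:
#   - begins with a letter
#   - is 3 to 30 characters long (inclusive)
#   - contains only ASCII letters, digits, and hyphens (assumption)
_CLUSTER_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]{2,29}")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if name satisfies the documented naming rules.

    Hypothetical helper for illustration; it is not a CDE API.
    """
    # fullmatch requires the whole string to match, so a trailing
    # newline or any other extra character is rejected.
    return _CLUSTER_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("etl-prod-01", "1cluster", "ab", "my_cluster"):
        print(candidate, is_valid_cluster_name(candidate))
```

A name such as etl-prod-01 passes this check, while 1cluster (starts with a digit), ab (too short), and my_cluster (contains an underscore) do not.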

Before you create jobs, you must initialize each new virtual cluster and configure users.

Cloudera Data Engineering provides a suite of example jobs that combines Spark and Airflow jobs and covers scenarios such as reading from and writing to object storage, running an Airflow DAG, and extending Python capabilities with custom virtual environments. For information about running example jobs, see CDE example jobs and sample data.