Creating virtual clusters

In Cloudera Data Engineering, a virtual cluster is an individual auto-scaling cluster with defined CPU and memory ranges. Jobs are associated with virtual clusters, and virtual clusters are associated with an environment. You can create as many virtual clusters as you need.

To create a virtual cluster, you must have an environment with Cloudera Data Engineering enabled.

In the Cloudera console, click the Data Engineering tile. The Cloudera Data Engineering Home page displays.
Click Administration in the left navigation menu, and then select the environment in which you want to create a virtual cluster.
In the Virtual Clusters column, click the icon at the top right to create a new virtual cluster.
If the environment has no virtual clusters associated with it, the page displays a Create a Virtual Cluster button that launches the same wizard.
Enter a Cluster Name.
Cluster names must meet the following requirements:
- Begin with a letter
- Be between 3 and 30 characters (inclusive)
- Contain only alphanumeric characters and hyphens
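If you script the creation of many virtual clusters, you may want to check names before typing them into the wizard. The following is a minimal Bash sketch of the three rules above. The function name is_valid_vc_name is hypothetical, and the sketch assumes that "letter" and "alphanumeric" mean ASCII characters; the wizard itself remains the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check for a virtual cluster name, mirroring the wizard rules:
#   - begins with a letter
#   - 3 to 30 characters long (inclusive)
#   - only alphanumeric characters and hyphens
# Assumption: "letter" and "alphanumeric" mean ASCII characters.
is_valid_vc_name() {
  local name=$1
  # One leading letter plus 2 to 29 letters, digits, or hyphens = 3 to 30 characters.
  [[ $name =~ ^[A-Za-z][A-Za-z0-9-]{2,29}$ ]]
}

for candidate in etl-prod 1etl ab sales_reporting; do
  if is_valid_vc_name "$candidate"; then
    echo "valid:   $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

With these candidates, only etl-prod is reported as valid: 1etl starts with a digit, ab is too short, and sales_reporting contains an underscore.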
Select the Cloudera Data Engineering Service to create the virtual cluster in.
The service in the environment you selected before launching the wizard is selected by default, but you can use the wizard to create the virtual cluster in a different Cloudera Data Engineering service.
Select one of the following Cloudera Data Engineering cluster types:
- Core (Tier 1): Batch-based transformation and engineering options include:
  - Autoscaling Cluster
  - Cloudera Shared Data Experience/Lakehouse
  - Job Lifecycle
  - Monitoring
  - Workflow Orchestration
- All Purpose (Tier 2): Develop using interactive sessions and deploy both batch and streaming workloads. This option includes all Tier 1 options, plus the following:
  - Shell Sessions (CLI and Web)
  - JDBC/SparkSQL (Coming soon)
  - IDE (Coming soon)
In Capacity, specify the guaranteed and maximum number of CPU cores, GPU cores, and memory (in gigabytes) to configure an elastic quota. The cluster can use resources up to the configured maximum capacity to run submitted Spark applications.
The guaranteed quota and the maximum quota set the minimum and maximum resources (CPU and memory) for the virtual cluster (VC). The guaranteed quota is the minimum amount of resources that is available for allocation to the VC at all times. Resources above the guaranteed quota and within the VC's maximum quota can be used by any VC on demand, if the cluster has capacity available.
Elastic quotas allow a VC to acquire unused capacity in the cluster when its guaranteed quota is exhausted, which keeps cluster resources efficiently used. At the same time, the maximum quota caps the resources a VC can claim in the cluster at any given time.

GPU (Technical Preview): You can set the guaranteed and maximum GPU resource quota that Spark 3 jobs in this virtual cluster can use.
For information about configuring resource pools and capacity, see Managing cluster resources using Quota Management.
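To make the relationship between the guaranteed and maximum quotas concrete, here is a deliberately simplified model in Bash arithmetic for a single resource type such as CPU cores. It is an illustration under stated assumptions, not how the Cloudera Data Engineering scheduler is implemented: the function vc_allocation, its inputs, and the example numbers are all hypothetical.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical model of an elastic quota for one resource type
# (for example, CPU cores). This is NOT the Cloudera scheduler's algorithm;
# it only shows how the guaranteed and maximum quotas bound an allocation.
#   guaranteed: minimum always available to this VC
#   maximum:    cap on what this VC can hold at any time
#   idle:       unused cluster capacity that can currently be borrowed
#   demand:     what the VC's Spark applications request
vc_allocation() {
  local guaranteed=$1 maximum=$2 idle=$3 demand=$4
  # In this model, demand up to the guaranteed quota is always served.
  local base=$(( demand < guaranteed ? demand : guaranteed ))
  # Demand above the guaranteed quota has to be borrowed from idle capacity.
  local extra=$(( demand - base ))
  # Borrowing never goes beyond the maximum quota.
  local headroom=$(( maximum - guaranteed ))
  if (( extra > headroom )); then extra=$headroom; fi
  # Borrowing never exceeds what the cluster currently has free.
  if (( extra > idle )); then extra=$idle; fi
  echo $(( base + extra ))
}

vc_allocation 20 50 100 10   # prints 10: demand fits inside the guaranteed quota
vc_allocation 20 50 100 40   # prints 40: 20 guaranteed + 20 borrowed
vc_allocation 20 50 100 80   # prints 50: capped at the maximum quota
vc_allocation 20 50 5 80     # prints 25: only 5 idle cores to borrow
```

The last two calls show the two limits described above: the maximum quota caps a busy VC even when the cluster is idle, and a VC can only burst past its guaranteed quota as far as free cluster capacity allows.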
Select the Spark Version to use in the virtual cluster.
Optional: Under Retention (Preview), click Enable Job Run and Log Retention Policy to configure the job run and log retention policy. The retention policy specifies how long job runs and logs are retained before they are deleted, which saves storage costs and improves performance. By default, Cloudera Data Engineering has no expiration period, and job runs and logs are retained indefinitely. Provide the following to configure the duration:
1. Enter Value: Enter a whole number greater than zero to set the duration. Ensure there are no decimals or other characters.
2. Select Period: Select Hours, Days, or Weeks from the drop-down list to set the period of time for which job runs and logs are to be retained.
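The input rules for the retention duration can be expressed as a small validation function. This Bash sketch is hypothetical (the name retention_hours and the conversion to hours are illustrative, not part of the product); it rejects anything that is not a whole number greater than zero and accepts only the three listed periods.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the retention rules: the value must be a whole
# number greater than zero, and the period must be Hours, Days, or Weeks.
# On success it prints the equivalent retention in hours.
retention_hours() {
  local value=$1 period=$2
  # Digits only: rejects decimals (1.5), signs (-3), spaces, and units (3d).
  # 10# forces base 10 so that a value such as 08 is not read as octal.
  if [[ ! $value =~ ^[0-9]+$ ]] || (( 10#$value == 0 )); then
    echo "error: value must be a whole number greater than zero" >&2
    return 1
  fi
  case $period in
    Hours) echo $(( 10#$value )) ;;
    Days)  echo $(( 10#$value * 24 )) ;;
    Weeks) echo $(( 10#$value * 24 * 7 )) ;;
    *) echo "error: period must be Hours, Days, or Weeks" >&2; return 1 ;;
  esac
}

retention_hours 3 Days                         # prints 72
retention_hours 2 Weeks                        # prints 336
retention_hours 1.5 Days || echo "rejected"    # decimals are not allowed
```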
  Important:
  When you edit the log retention policy configuration, you must restart the runtime-api-server pod by running the kubectl rollout restart deployment/<deployment-name> -n <namespace> command to apply the changes. The namespace is the VC ID shown on the virtual cluster details page.
  For example:
  kubectl rollout restart deployment/dex-app-fww6lrgm-api -n dex-app-fww6lrgm
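Because the retention change takes effect only after the restart, it can help to restart the deployment and then wait for the rollout to finish. This is a sketch, not a Cloudera-provided tool: the function name restart_vc_api and the DRY_RUN switch are hypothetical, and it assumes the deployment is named <VC ID>-api, as in the example above. kubectl rollout status blocks until the restarted rollout completes or the timeout expires.

```shell
#!/usr/bin/env bash
# Restart a virtual cluster's runtime API server and wait for the rollout.
# Assumption: the deployment is named "<VC ID>-api" and lives in the namespace
# "<VC ID>", as in the documented example (dex-app-fww6lrgm-api in
# dex-app-fww6lrgm). Check the real name with: kubectl get deployments -n <VC ID>
restart_vc_api() {
  local vc_id=$1
  if [[ -z $vc_id ]]; then
    echo "usage: restart_vc_api <VC ID>" >&2
    return 1
  fi
  local -a kubectl=(kubectl)
  # DRY_RUN=1 prints the commands instead of running them.
  if [[ -n ${DRY_RUN:-} ]]; then kubectl=(echo kubectl); fi
  "${kubectl[@]}" rollout restart "deployment/${vc_id}-api" -n "$vc_id" &&
    "${kubectl[@]}" rollout status "deployment/${vc_id}-api" -n "$vc_id" --timeout=300s
}

# Preview the commands without touching the cluster:
DRY_RUN=1 restart_vc_api dex-app-fww6lrgm
# Run them for real (requires kubectl access to the cluster):
# restart_vc_api dex-app-fww6lrgm
```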
Optional: Click Configure Email Alerting if you want to receive notification emails. Provide the following email configuration information:
Note:
To receive email alerts, you must configure email alerting when you create the virtual cluster. You cannot enable or disable this feature later by editing the virtual cluster details.
- Sender email address.
- Your mail server hostname and port.
- The username and password of the email user who will be logged into the mail server as the sender of the alert emails.
- Select a secure connection method to be used when communicating with the SMTP server.
- Click Test SMTP Configs to verify the SMTP settings before you create the virtual cluster.
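If Test SMTP Configs fails, a quick first question is whether the mail server port is reachable at all. The following Bash sketch (the function name smtp_port_open is hypothetical) only attempts a TCP connection using Bash's /dev/tcp redirection and the coreutils timeout command. It runs from your own machine, whose network path may differ from that of the Cloudera Data Engineering service, so the wizard's test remains the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: can this machine open a TCP connection to the mail
# server? Uses bash's built-in /dev/tcp redirection and coreutils timeout.
# Common ports: 587 (STARTTLS), 465 (implicit TLS), 25 (plain SMTP).
smtp_port_open() {
  local host=$1 port=$2
  [[ -n $host && -n $port ]] || return 2
  # Host and port are passed as positional arguments, not spliced into the script text.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (smtp.example.com is a placeholder for your mail server):
# smtp_port_open smtp.example.com 587 && echo "reachable" || echo "unreachable"
#
# To inspect the STARTTLS handshake as well, the OpenSSL client can be used:
# openssl s_client -connect smtp.example.com:587 -starttls smtp </dev/null
```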
Click Create.

On the Cloudera Data Engineering Home page, select the environment to view the virtual cluster initialization status. You can also click the three-dot menu for the virtual cluster to view the logs.

You must initialize each virtual cluster you create and configure users before creating jobs.

Cloudera Data Engineering provides a suite of example Spark and Airflow jobs that cover scenarios such as reading from and writing to object storage, running an Airflow DAG, and extending Python capabilities with custom virtual environments. For information about running the example jobs, see Cloudera Data Engineering example jobs and sample data.
