Creating virtual clusters

In Cloudera Data Engineering, a virtual cluster is an individual auto-scaling cluster with defined CPU and memory ranges. These virtual clusters are linked to a specific Cloudera Data Engineering Service, which is, in turn, associated with an environment. You can create any number of virtual clusters, which allows you to execute jobs and other artifacts.

To create a virtual cluster, you must have an environment with Cloudera Data Engineering enabled.

In the Cloudera console, click the Data Engineering tile. The Cloudera Data Engineering Home page displays.
In the left navigation menu, click Administration, and then select the Cloudera Data Engineering Service for which you want to create a virtual cluster.
In the Virtual Clusters column, click the icon at the top right to create a new virtual cluster.
If the environment has no virtual clusters associated with it, the page displays a Create a Virtual Cluster button that launches the same wizard.
Enter a Cluster Name.
Cluster names must meet the following requirements:
- Begin with a letter
- Be between 3 and 30 characters (inclusive)
- Contain only alphanumeric characters and hyphens
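The naming rules above can be checked before you open the wizard. The following is a minimal sketch, not part of the product: the function name `is_valid_cluster_name` is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits.

```python
import re

# Derived from the three rules above: the first character is a letter,
# followed by 2 to 29 letters, digits, or hyphens (3 to 30 characters total).
# Assumption: "alphanumeric" means ASCII letters and digits only.
CLUSTER_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]{2,29}")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if `name` satisfies the documented naming rules."""
    return CLUSTER_NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_cluster_name("etl-prod-01")` passes, while a name that starts with a digit or contains an underscore does not.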
Select the Cloudera Data Engineering Service to create the virtual cluster in.
The environment you selected before launching the wizard is selected by default, but you can use the wizard to create a virtual cluster in a different Cloudera Data Engineering service.
Select the Spark Version to use in the virtual cluster. You cannot use Spark 2 and Spark 3 in the same virtual cluster, but you can have separate Spark 2 and Spark 3 virtual clusters within the same Cloudera Data Engineering service. While you can have virtual clusters with different Spark 3.x versions, a single virtual cluster can only use one Spark version.
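Because a virtual cluster uses exactly one Spark version, a mixed set of jobs has to be split across virtual clusters by version. The following sketch shows that grouping under stated assumptions: the function name, job names, and version strings are hypothetical and do not indicate which versions the service supports.

```python
from collections import defaultdict


def plan_virtual_clusters(job_spark_versions: dict[str, str]) -> dict[str, list[str]]:
    """Group job names by Spark version.

    Each resulting group needs its own virtual cluster, because one
    virtual cluster can only use one Spark version.
    """
    plan: dict[str, list[str]] = defaultdict(list)
    for job, version in sorted(job_spark_versions.items()):
        plan[version].append(job)
    return dict(plan)


# Hypothetical jobs and versions, for illustration only.
jobs = {"nightly-etl": "3.3", "legacy-report": "2.4", "ml-features": "3.3"}
plan = plan_virtual_clusters(jobs)
```

Here `plan` has two keys, so two virtual clusters would be needed within the same service.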

In the Cloudera Data Engineering UI, you can view the supported Apache Airflow and Spark runtime component versions on the virtual cluster creation and details pages.

The following screenshot illustrates creating a new Virtual Cluster, where you can select the required runtime component versions from a drop-down menu on the UI.

Important:
If you switch from the Red Hat UBI image to the security-hardened image, note that the updated component versions and libraries can impact the Cloudera Data Engineering operation. Cloudera strongly recommends testing the new implementation in a non-production environment first.
For backward compatibility, you can continue using the Red Hat UBI image. The security-hardened image is the recommended image; however, note that it requires testing and validation.
For more information, see Security hardened Spark image migration guide.

For more information, see Compatibility for Cloudera Data Engineering and Cloudera Runtime components.
Select one of the following Cloudera Data Engineering cluster types:
- Core (Tier 1): Batch-based transformation and engineering options include:
  - Autoscaling Cluster
  - Cloudera Shared Data Experience/Lakehouse
  - Job Lifecycle management
  - Monitoring
  - Workflow Orchestration
- All Purpose (Tier 2): Develop using interactive sessions and deploy both batch and streaming workloads. This option includes everything in Tier 1, plus the following:
  - Shell Sessions - CLI and Web
  - JDBC/SparkSQL (Coming soon)
  - Integrated Development Environment (IDE) (Coming soon)
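The relationship between the two tiers (Tier 2 is Tier 1 plus extras) can be written down as a small lookup. This is only an illustrative sketch: the constant and function names are hypothetical, and the options marked "Coming soon" are treated as not yet available.

```python
from typing import Iterable

# Option names copied from the list above.
CORE_OPTIONS = frozenset({
    "Autoscaling Cluster",
    "Cloudera Shared Data Experience/Lakehouse",
    "Job Lifecycle management",
    "Monitoring",
    "Workflow Orchestration",
})
ALL_PURPOSE_EXTRAS = frozenset({"Shell Sessions - CLI and Web"})
# Marked "Coming soon" above, so treated as unavailable in this sketch.
COMING_SOON = frozenset({"JDBC/SparkSQL", "Integrated Development Environment (IDE)"})


def choose_cluster_type(needed: Iterable[str]) -> str:
    """Return the smallest tier that covers every option in `needed`."""
    wanted = set(needed)
    if wanted <= CORE_OPTIONS:
        return "Core (Tier 1)"
    if wanted <= CORE_OPTIONS | ALL_PURPOSE_EXTRAS:
        return "All Purpose (Tier 2)"
    raise ValueError(f"Not available in any tier yet: {sorted(wanted - CORE_OPTIONS - ALL_PURPOSE_EXTRAS)}")
```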
In Resource Capacity, specify the guaranteed and maximum number of CPU cores, GPU cores, and memory in gigabytes to configure the elastic quota. The cluster can use resources up to the maximum capacity to run the submitted Spark applications.
The guaranteed quota and the maximum quota set the minimum and maximum resource capacity (CPU and memory) for the virtual cluster. The guaranteed quota is the minimum amount of resources that is available to a Virtual Cluster at all times. The resources above the guaranteed quota and within the Virtual Cluster’s maximum quota can be used by any Virtual Cluster on demand if the cluster capacity allows for it.
Important:
Currently, the Cloudera Data Engineering UI states that the Cloudera Data Engineering infrastructure requires a minimum of 9 CPU cores and 28 GB of memory to create a Virtual Cluster. However, the actual minimum requirements are 12 CPU cores and 32 GB of memory. Make sure that you set CPU (Cores) > Guaranteed to 12 and Memory (GiB) > Guaranteed to 32.
GPU (Technical Preview): You can set the guaranteed and maximum GPU resource quota for this virtual cluster for Spark 3 jobs to use.
Elastic quotas allow a Virtual Cluster to acquire unused capacity in the cluster when its guaranteed quota is exhausted. This ensures efficient use of resources in the cluster. At the same time, the maximum quota caps the amount of resources a Virtual Cluster can claim in the cluster at any given time.
For information about configuring resource pool and capacity, see Managing cluster resources using Quota Management.
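The quota rules described in this step can be sketched as a small model. This is a simplification for illustration, not the scheduler's actual logic: the names `Quota`, `validate_quotas`, and `usable_now` are hypothetical, and the check that the maximum is not below the guaranteed value is an assumption. The minimums of 12 cores and 32 GB come from the note above.

```python
from dataclasses import dataclass

MIN_GUARANTEED_CPU_CORES = 12   # actual minimum, per the note above
MIN_GUARANTEED_MEMORY_GB = 32   # actual minimum, per the note above


@dataclass(frozen=True)
class Quota:
    guaranteed: int  # always available to the virtual cluster
    maximum: int     # upper bound, reachable only if the cluster has spare capacity


def validate_quotas(cpu: Quota, memory: Quota) -> list[str]:
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    if cpu.guaranteed < MIN_GUARANTEED_CPU_CORES:
        problems.append(f"CPU guaranteed must be at least {MIN_GUARANTEED_CPU_CORES} cores")
    if memory.guaranteed < MIN_GUARANTEED_MEMORY_GB:
        problems.append(f"Memory guaranteed must be at least {MIN_GUARANTEED_MEMORY_GB} GB")
    # Assumption: a maximum below the guaranteed value is not meaningful.
    for label, quota in (("CPU", cpu), ("Memory", memory)):
        if quota.maximum < quota.guaranteed:
            problems.append(f"{label} maximum is below its guaranteed value")
    return problems


def usable_now(quota: Quota, idle_in_cluster: int) -> int:
    """Guaranteed share plus borrowed idle capacity, capped at the maximum quota."""
    return min(quota.maximum, quota.guaranteed + max(idle_in_cluster, 0))
```

With `Quota(guaranteed=12, maximum=20)`, a cluster that has 4 idle cores lets the virtual cluster use 16 cores, and no amount of idle capacity lets it exceed 20.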
Under Privacy Settings, select the Restrict sharing by default check box to enable the privacy settings for artifact sharing in the Virtual Cluster. For more information about privacy settings, see Privacy Settings. For more information about artifact sharing, see Artifact access management.
Click Create Virtual Cluster. This process takes approximately 20 minutes. Monitor the progress by checking the logs and refreshing them every five minutes. To view the logs for the virtual cluster, click the vertical ellipsis (three dots) menu, and then click View Logs.
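If you script around this wait, the "check every five minutes" pattern is a plain polling loop. The sketch below is generic and makes no claim about any Cloudera API: `is_ready` is a hypothetical callable that you would supply yourself, and the 30-minute timeout is an assumed safety margin over the approximate 20-minute creation time.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    interval_s: float = 300.0,   # five minutes between checks, as suggested above
    timeout_s: float = 1800.0,   # assumption: give up after 30 minutes
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `is_ready` until it returns True or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so that the loop can be exercised without real waiting.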

The Cloudera Data Engineering virtual cluster is created and all the pods in the dex-app-XXXX virtual cluster are running.

For more information about the virtual cluster, navigate to Cloudera Data Engineering > Administration. Click the Cloudera Data Engineering Service in which the virtual cluster was created, and then click the icon for the respective Virtual Cluster.

By default, Cloudera Data Engineering automatically sets up the virtual cluster with a self-signed certificate. To replace this with an updated certificate, see Updating the Control Plane certificates in Cloudera Data Engineering virtual clusters.
To allow users or groups to access the virtual cluster, you must assign the required roles. For more information, see User Access Management.
(Optional) To configure email alerts for notifications, complete the following steps:
1. In the Cloudera console, click the Data Engineering tile. The Cloudera Data Engineering Home page displays.
2. Click Administration in the left navigation menu. The Administration page displays.
3. In the Services column, select the environment containing the virtual cluster for which you want to configure the email alerts.
4. In the Virtual Clusters column on the right, click the icon for the virtual cluster for which you want to configure the email alerts.
5. On the Configuration tab, select the Enable Email Alerts check box and provide the following email configuration information:
  - Sender Email Address: Enter the email address from which the alert emails are sent.
  - SMTP Username: The username of the email user who will be logged into the mail server as the sender of the alert emails.
  - SMTP Password: The password of the email user who will be logged into the mail server as the sender of the alert emails.
  - SMTP Host: Your mail server hostname.
  - Email Encryption: Select a secure connection method to be used when communicating with the SMTP server.
  - SMTP Port: Your mail server port number.
6. Click Test SMTP Configs to test the configurations set for SMTP. This allows you to test the SMTP configuration before creating the cluster.
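Before entering the values from step 5, you can sanity-check them independently with Python's standard `smtplib`. This sketch is not what the Test SMTP Configs button does; it only tries to connect and log in with the same settings. The `SmtpSettings` class, the `check_smtp` function, and the encryption labels `"ssl"`, `"starttls"`, and `"none"` are assumptions made for illustration.

```python
import smtplib
import ssl
from dataclasses import dataclass


@dataclass
class SmtpSettings:
    sender: str      # Sender Email Address
    username: str    # SMTP Username
    password: str    # SMTP Password
    host: str        # SMTP Host
    port: int        # SMTP Port
    encryption: str  # assumed labels: "ssl", "starttls", or "none"


def check_smtp(settings: SmtpSettings, timeout: float = 10.0,
               smtp_cls=smtplib.SMTP, smtp_ssl_cls=smtplib.SMTP_SSL) -> bool:
    """Return True if a connection and login succeed with these settings."""
    context = ssl.create_default_context()
    try:
        if settings.encryption == "ssl":
            # Implicit TLS from the first byte of the connection.
            server = smtp_ssl_cls(settings.host, settings.port,
                                  timeout=timeout, context=context)
        else:
            server = smtp_cls(settings.host, settings.port, timeout=timeout)
        with server:
            if settings.encryption == "starttls":
                # Upgrade the plain connection to TLS before logging in.
                server.starttls(context=context)
            server.login(settings.username, settings.password)
        return True
    except (smtplib.SMTPException, OSError):
        return False
```

The `smtp_cls` and `smtp_ssl_cls` parameters default to the real `smtplib` classes and can be replaced with stand-ins when you want to exercise the function without a mail server.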
(Optional) After successfully completing all the steps, you can load and run example jobs to verify the proper functioning of the system. For more information, see Cloudera Data Engineering example jobs and sample data.