Creating virtual clusters

In Cloudera Data Engineering, a virtual cluster is an individual auto-scaling cluster with defined CPU and memory ranges. Jobs are associated with virtual clusters, and virtual clusters are associated with an environment. You can create as many virtual clusters as you need. See Recommendations for scaling Cloudera Data Engineering deployments linked below.

note

Virtual cluster installation fails when the Ozone S3 gateway proxy is enabled. The proxy is enabled when more than one Ozone S3 Gateway is configured in the Cloudera Base on premises cluster.

As a workaround, add the 127.0.0.1 s3proxy-<environment-name>.<private-cloud-control-plane-name>-services.svc.cluster.local entry in the /etc/hosts file of all nodes in the Cloudera Base on premises cluster where the Ozone S3 gateway is installed. For example, if the private cloud environment name is cdp-env-1 and private cloud control plane name is cdp, then add the following entry in the /etc/hosts file:

127.0.0.1 s3proxy-cdp-env-1.cdp-services.svc.cluster.local
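If you need to add this entry on several nodes, a small script can help avoid typos. The following is a minimal Python sketch, not a Cloudera-provided tool. The function names `s3proxy_hosts_entry` and `ensure_entry` are hypothetical. The sketch only builds the line from the pattern described above and works on the hosts file content as a string, so writing the result back to /etc/hosts (which requires root privileges) is left to you.

```python
def s3proxy_hosts_entry(environment_name: str, control_plane_name: str) -> str:
    """Build the /etc/hosts line from the pattern given in the workaround."""
    hostname = (
        f"s3proxy-{environment_name}."
        f"{control_plane_name}-services.svc.cluster.local"
    )
    return f"127.0.0.1 {hostname}"


def ensure_entry(hosts_text: str, entry: str) -> str:
    """Return hosts_text with entry appended, unless the same line already exists.

    Lines are compared field by field, so differences in spacing are ignored.
    Commented-out lines do not count as existing entries.
    """
    existing = [line.split() for line in hosts_text.splitlines()]
    if entry.split() in existing:
        return hosts_text
    # Make sure the new entry starts on its own line.
    separator = "" if hosts_text == "" or hosts_text.endswith("\n") else "\n"
    return f"{hosts_text}{separator}{entry}\n"


if __name__ == "__main__":
    # Environment and control plane names taken from the example above.
    print(s3proxy_hosts_entry("cdp-env-1", "cdp"))
```

Run with the names from the example, the script prints the same entry shown above. Calling `ensure_entry` twice with the same entry leaves the content unchanged the second time, so the script is safe to rerun.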

To create a virtual cluster, you must have an environment with Cloudera Data Engineering enabled.

In the Cloudera console, click the Data Engineering tile. The Cloudera Data Engineering Home page displays.
Click Administration in the left navigation menu, then select the environment in which you want to create a virtual cluster.
In the Virtual Clusters column, click at the top right to create a new virtual cluster.
If the environment has no virtual clusters associated with it, the page displays a Create DE Cluster button that launches the same wizard.
Enter a Cluster Name.
Cluster names must meet the following requirements:
- Begin with a letter
- Be between 3 and 30 characters (inclusive)
- Contain only alphanumeric characters and hyphens
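If you generate cluster names in automation, you can check them against these rules before opening the wizard. The following Python sketch expresses the three rules as a single regular expression. The function name `is_valid_cluster_name` is hypothetical, and the sketch assumes that "letter" and "alphanumeric" refer to ASCII characters; the wizard itself remains the authority on what is accepted.

```python
import re

# A letter first, followed by 2 to 29 letters, digits, or hyphens,
# which gives a total length of 3 to 30 characters.
# Assumption: only ASCII letters and digits are allowed.
_CLUSTER_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]{2,29}")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if name satisfies the three naming rules listed above."""
    # fullmatch requires the whole string to match, not just a prefix.
    return _CLUSTER_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("etl-prod-01", "1cluster", "ab", "my_cluster"):
        print(candidate, is_valid_cluster_name(candidate))
```

Of the sample names, only `etl-prod-01` passes: `1cluster` starts with a digit, `ab` is too short, and `my_cluster` contains an underscore.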
Select the Service to create the virtual cluster in.
The environment you selected before launching the wizard is selected by default, but you can use the wizard to create a virtual cluster in a different Cloudera Data Engineering service.
Select one of the following Cloudera Data Engineering cluster types:
note
These tiers are available in Cloudera Data Engineering 1.19 and above only.
- Core (Tier 1)
- All Purpose (Tier 2)
important
For both Microsoft Azure and Amazon Web Services, to avoid being charged for All Purpose compute nodes even when you have not created an All Purpose virtual cluster:
Navigate to the Service Details page > Configurations tab > Capacity & Costs > Autoscale Range and set the minimum value of All purpose On-demand Instances to 0.
For more information, see Cloudera Data Engineering cluster types.
Select the Spark Version to use in the virtual cluster. You cannot use Spark 2 and Spark 3 in the same virtual cluster, but you can have separate Spark 2 and Spark 3 virtual clusters within the same Cloudera Data Engineering service. While you can have virtual clusters with different Spark 3.x versions, a single virtual cluster can only use one Spark version.

On the Cloudera Data Engineering UI, you can view the supported Apache Airflow runtime component version and Spark runtime component version on the VC creation and details pages.

The following screenshot illustrates creating a new Virtual Cluster, where you can select the required runtime component versions from a drop-down menu on the UI.

important
If you switch from the Red Hat UBI image to the security hardened image, note that the updated component versions and libraries can affect Cloudera Data Engineering operation. Cloudera strongly recommends testing the new implementation in a non-production environment first.
For backward compatibility, you can continue using the Red Hat UBI image. The security hardened image is the recommended image; however, it requires testing and validation.
For more information, see Security hardened Spark image migration guide.

For more information, see Compatibility for Cloudera Data Engineering and Runtime components.
Use the Auto-Scale Max Capacity sliders to set the maximum number of CPU cores and the maximum memory in gigabytes. The cluster will scale up and down as needed to run the submitted Spark applications.
Optional for spot instances enabled at the Cloudera Data Engineering service level: From the Driver and Executors will run on drop-down menu, select whether you want to run drivers and executors on spot instances or on-demand instances. By default, the driver runs on on-demand instances, and the executors run on spot instances. For SLA-bound workloads, select On-demand. For non-SLA workloads, Cloudera recommends leaving the default configuration to take advantage of the cost savings afforded by spot instances. For more information, see Cloudera Data Engineering Spot Instances.
Optional: Select Restrict Access to add access control for the virtual cluster. You can search for users to add by name or email address. You can manage users using the Cloudera Management Console. For more information, see Managing user access and authorization.
Optional: Click Configure Email Alerting (Technical Preview) if you want to receive notification emails. The email configuration options appear.
1. You must provide at least Sender Email Address and SMTP Host information.
2. Click Test SMTP Configs to verify the SMTP settings before you create the cluster.
note
You will have the option to enable email alerts when you create or edit a job after the Virtual Cluster is created, but step 13 must be completed first.
Click Create.

On the Cloudera Data Engineering Home page, select the environment to view the virtual cluster initialization status. You can also click the three-dot menu for the virtual cluster to view the logs.