Enabling Cloudera Data Engineering
Before you can use the Cloudera Data Engineering (CDE) service, you must enable it on each environment where you want to use it.
Make sure that you have a working environment for which you want to enable the CDE service. For more information about environments, see Environments.
- Navigate to the Cloudera Data Engineering Overview page by clicking the Data Engineering tile in the Cloudera Data Platform (CDP) management console.
- In the Environments column, click the plus icon at the top or the Enable new CDE link at the bottom to enable CDE for an environment.
- Start typing the name of the environment that you want to enable CDE for. The displayed list dynamically updates to show environment names matching your input. When you see the correct environment, click on it to select it.
- Select the Workload Type. The workload type corresponds to the instance size that will be deployed to run your submitted Spark jobs. When you select a type, the corresponding cloud provider instance size is displayed in the Summary section to the right.
- If you want to use SSD storage, check the box labeled Use SSD instances. In this configuration, SSD storage is used for the workload filesystem, such as the Spark local directory. If your workload requires more space than is available in the instance storage, select a larger instance type with sufficient local storage or select an instance type without SSD, and then configure the EBS volume size.
- Set the Auto-Scale Range. The range you set here creates an auto scaling group with the specified minimum and maximum number of instances that can be used. The CDE service launches and shuts down instances as needed within this range. The instance size is determined by the Workload Type you selected.
- If you want to use spot instances, check the box labeled Use Spot instances and select a range of spot instances to request. This creates another auto scaling group of spot instances. Spot instances are requested with CPU and memory profiles similar to those of the instances selected for the Workload Type. For more information, see Cloudera Data Engineering Spot Instances.
- If you want to create a load balancing endpoint in a public subnet, check the box labeled Enable Public Endpoint. If you leave this unchecked, the load balancing endpoint will be created in a private subnet, and you will need to configure access manually in your cloud account.
- Check the box labeled Enable Workload Analytics to automatically send diagnostic information from job execution to Cloudera Workload Manager.
- Specify Whitelist IPs. You may specify a comma-separated list of CIDRs that can access the Kubernetes master API server. Make sure that the provided IP addresses do not overlap with the following ranges:
- 0.0.0.0 - 0.255.255.255
- 10.0.0.0 - 10.255.255.255
- 100.64.0.0 - 100.127.255.255
- 127.0.0.0 - 127.255.255.255
- 169.254.0.0 - 169.254.255.255
- 172.16.0.0 - 172.31.255.255
- 192.0.0.0 - 192.0.0.255
- 192.0.2.0 - 192.0.2.255
- 192.88.99.0 - 192.88.99.255
- 192.168.0.0 - 192.168.255.255
- 198.18.0.0 - 198.19.255.255
- 198.51.100.0 - 198.51.100.255
- 203.0.113.0 - 203.0.113.255
- 224.0.0.0 - 239.255.255.255
- 240.0.0.0 - 255.255.255.254
- Specify which subnet(s) to use for the Kubernetes worker nodes. Select from available Subnets in the drop-down list.
- Optionally add Tags as needed. Tags are applied to the cloud provider resources associated with the CDE service (including virtual clusters created in that service). For more information about tags, see the cloud provider documentation:
- Amazon AWS: Tagging AWS resources
- Click Create.
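Before you enter values in the Whitelist IPs field, you may want to check your CIDR list against the restricted ranges offline. The following is a minimal sketch using Python's standard `ipaddress` module. It is not part of CDE: the function names `restricted_networks` and `find_overlaps` are hypothetical, and the range table is an assumed transcription of the list in the Whitelist IPs step (following the IANA special-purpose address blocks), so verify it against the list shown for your CDE version.

```python
import ipaddress

# Assumption: (first address, last address) pairs transcribed from the
# restricted ranges listed in the Whitelist IPs step.
RESTRICTED_RANGES = [
    ("0.0.0.0", "0.255.255.255"),
    ("10.0.0.0", "10.255.255.255"),
    ("100.64.0.0", "100.127.255.255"),
    ("127.0.0.0", "127.255.255.255"),
    ("169.254.0.0", "169.254.255.255"),
    ("172.16.0.0", "172.31.255.255"),
    ("192.0.0.0", "192.0.0.255"),
    ("192.0.2.0", "192.0.2.255"),
    ("192.88.99.0", "192.88.99.255"),
    ("192.168.0.0", "192.168.255.255"),
    ("198.18.0.0", "198.19.255.255"),
    ("198.51.100.0", "198.51.100.255"),
    ("203.0.113.0", "203.0.113.255"),
    ("224.0.0.0", "239.255.255.255"),
    ("240.0.0.0", "255.255.255.254"),
]


def restricted_networks():
    """Map each 'first - last' label to the networks that exactly cover it."""
    result = {}
    for first, last in RESTRICTED_RANGES:
        nets = ipaddress.summarize_address_range(
            ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
        )
        result[f"{first} - {last}"] = list(nets)
    return result


def find_overlaps(cidr_list):
    """Return (cidr, restricted_range) pairs for every overlapping entry.

    cidr_list is a comma-separated string, as entered in the UI field.
    Raises ValueError if an entry is not a valid IPv4 CIDR.
    """
    restricted = restricted_networks()
    overlaps = []
    for entry in cidr_list.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # strict=False tolerates host bits being set, e.g. 10.1.2.3/16.
        candidate = ipaddress.IPv4Network(entry, strict=False)
        for label, nets in restricted.items():
            if any(candidate.overlaps(net) for net in nets):
                overlaps.append((entry, label))
    return overlaps


if __name__ == "__main__":
    for cidr, label in find_overlaps("10.1.0.0/16, 52.10.0.0/16"):
        print(f"{cidr} overlaps restricted range {label}")
```

An empty result from `find_overlaps` means none of the entries intersects the ranges in the table. Each range is stored as a first and last address rather than as a single CIDR because the final range ends at 255.255.255.254 and cannot be written as one CIDR block.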