Enabling Cloudera Data Engineering

Before you can use the Cloudera Data Engineering (CDE) service, you must enable it in each environment where you want to use it.

Make sure that you have a working environment for which you want to enable the CDE service. For more information about environments, see Environments.

  1. Navigate to the Cloudera Data Engineering Overview page by clicking the Data Engineering tile in the Cloudera Data Platform (CDP) management console.
  2. In the Environments column, click the plus icon at the top or the Enable new CDE link at the bottom to enable CDE for an environment.
  3. Start typing the name of the environment that you want to enable CDE for. The displayed list dynamically updates to show environment names matching your input. When you see the correct environment, click on it to select it.
  4. Select the Workload Type.
    The workload type corresponds to the instance size that will be deployed to run your submitted Spark jobs. When you select a type, the corresponding cloud provider instance size is displayed in the Summary section to the right.
    1. If you want to use SSD storage, check the box labeled Use SSD instances. In this configuration, SSD storage is used for the workload filesystem, such as the Spark local directory. If your workload requires more space than is available in the instance storage, select a larger instance type with sufficient local storage or select an instance type without SSD, and then configure the EBS volume size.
  5. Set the Auto-Scale Range.
    The range you set here creates an auto scaling group with the specified minimum and maximum number of instances that can be used. The CDE service launches and shuts down instances as needed within this range. The instance size is determined by the Workload Type you selected.
  6. If you want to use spot instances, check the box labeled Use Spot instances and select a range of spot instances to request. This creates another auto scaling group of spot instances. Spot instances are requested with CPU and memory profiles similar to those of the instances selected for the Workload Type. For more information, see Cloudera Data Engineering Spot Instances.
  7. On Microsoft Azure, you can create a fully private AKS cluster by selecting Enable Private Endpoint.
  8. If you create the service in an AWS environment that uses a non-transparent proxy, a Proxy CIDR Source Ranges field is displayed. You need to enter the CIDR ranges of the proxy instances only if you registered your proxies using host names, because the Cloudera Control Plane cannot resolve host names to IP addresses. If your proxy instances were registered using IP addresses, you can leave this field blank.
    For example, if you have a load-balanced proxy with static IPs running on 10.80.199.105 and 10.80.200.45, add "10.80.199.105/32" and "10.80.200.45/32". If your proxy instances are dynamic (behind a load balancer or virtual IP), enter a wider range, for example "10.80.200.0/24".
  9. If you want to create a load balancing endpoint in a public subnet, check the box labeled Enable Public Loadbalancer. If you leave this unchecked, the load balancing endpoint will be created in a private subnet, and you will need to configure access manually in your cloud account.
  10. Specify a comma-separated list of CIDRs in API server Authorized IP Ranges that can access the Kubernetes master API server.
    Make sure that the provided IP addresses do not overlap with the following ranges:
    • 0.0.0.0 - 0.255.255.255
    • 10.0.0.0 - 10.255.255.255
    • 100.64.0.0 - 100.127.255.255
    • 127.0.0.0 - 127.255.255.255
    • 169.254.0.0 - 169.254.255.255
    • 172.16.0.0 - 172.31.255.255
    • 192.0.0.0 - 192.0.0.255
    • 192.0.2.0 - 192.0.2.255
    • 192.88.99.0 - 192.88.99.255
    • 192.168.0.0 - 192.168.255.255
    • 198.18.0.0 - 198.19.255.255
    • 198.51.100.0 - 198.51.100.255
    • 203.0.113.0 - 203.0.113.255
    • 224.0.0.0 - 239.255.255.255
    • 240.0.0.0 - 255.255.255.254
    • 255.255.255.255
  11. Specify a comma-separated list of client IP ranges in Load Balancer Source Ranges that should be allowed to access the load balancer.
  12. Specify which subnets to use for the Kubernetes worker nodes. Select from available Subnets in the drop-down list.
  13. Check the box labeled Enable Workload Analytics to automatically send diagnostic information from job execution to Cloudera Workload Manager.
  14. Optionally, add Tags as needed. Tags are applied to the cloud provider resources associated with the CDE service (including virtual clusters created in that service). For more information about tags, see the cloud provider documentation.
  15. Click Enable.
The CDE Overview page displays the status of the environment initialization. To view logs for the environment, click the environment's vertical ellipsis menu, and then click View Logs.
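
Before entering values in the Proxy CIDR Source Ranges and API server Authorized IP Ranges fields, it can help to check them locally. The following is a minimal sketch in Python using only the standard ipaddress module. It is not part of CDE or any Cloudera tooling; the function names host_cidrs and find_reserved_overlaps are hypothetical and chosen for illustration. The reserved ranges are copied from the list in the API server step above.

```python
import ipaddress

# Ranges that API server Authorized IP Ranges must not overlap,
# copied from the list in the procedure above.
RESERVED_RANGES = [
    ("0.0.0.0", "0.255.255.255"),
    ("10.0.0.0", "10.255.255.255"),
    ("100.64.0.0", "100.127.255.255"),
    ("127.0.0.0", "127.255.255.255"),
    ("169.254.0.0", "169.254.255.255"),
    ("172.16.0.0", "172.31.255.255"),
    ("192.0.0.0", "192.0.0.255"),
    ("192.0.2.0", "192.0.2.255"),
    ("192.88.99.0", "192.88.99.255"),
    ("192.168.0.0", "192.168.255.255"),
    ("198.18.0.0", "198.19.255.255"),
    ("198.51.100.0", "198.51.100.255"),
    ("203.0.113.0", "203.0.113.255"),
    ("224.0.0.0", "239.255.255.255"),
    ("240.0.0.0", "255.255.255.254"),
    ("255.255.255.255", "255.255.255.255"),
]


def host_cidrs(static_ips):
    """Turn static proxy IP addresses into single-host /32 CIDR strings."""
    return [str(ipaddress.IPv4Network(ip)) for ip in static_ips]


def find_reserved_overlaps(cidr_list):
    """Return (cidr, reserved_range) pairs for each overlap in a comma-separated CIDR list.

    Raises ValueError if an entry is not a valid IPv4 CIDR.
    """
    overlaps = []
    for item in cidr_list.split(","):
        candidate = ipaddress.IPv4Network(item.strip())
        for first, last in RESERVED_RANGES:
            low = ipaddress.IPv4Address(first)
            high = ipaddress.IPv4Address(last)
            # Two address intervals overlap when each one starts
            # no later than the other one ends.
            if candidate.network_address <= high and candidate.broadcast_address >= low:
                overlaps.append((str(candidate), f"{first} - {last}"))
    return overlaps


if __name__ == "__main__":
    print(host_cidrs(["10.80.199.105", "10.80.200.45"]))
    print(find_reserved_overlaps("8.8.8.0/24, 192.168.1.0/24"))
```

An empty list from find_reserved_overlaps means none of the entries intersect the listed ranges. The check covers only the overlap rule stated in this procedure and does not validate anything else the service may require.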