Before you can use the Cloudera Data Engineering (CDE) service, you
must enable it on each environment where you want to use it.
Make sure that you have a working environment for which you want to
enable the CDE service. For more information about environments, see
Environments.
- Navigate to the Cloudera Data Engineering Overview page by clicking the
Data Engineering tile in the Cloudera Data Platform (CDP) management
console.
- In the Environments column, click the
plus icon at the top or the Enable new CDE link
at the bottom to enable CDE for an environment.
- Start typing the name of the environment that you want to
enable CDE for. The displayed list dynamically updates to show
environment names matching your input. When you see the correct
environment, click on it to select it.
- Select the Workload Type. The workload type corresponds to the instance
size that will be deployed to run your submitted Spark jobs. When you
select a type, the corresponding cloud provider instance size is
displayed in the Summary section to the right.
- If you want to use SSD storage, check the box labeled
Use SSD instances. In this configuration,
SSD storage is used for the workload filesystem, such as the Spark
local directory. If your workload requires more space than is
available in the instance storage, select a larger instance type
with sufficient local storage or select an instance type without
SSD, and then configure the EBS volume size.
- Set the Auto-Scale Range. The range you set here creates an auto scaling
group with the specified minimum and maximum number of instances that
can be used. The CDE service launches and shuts down instances as needed
within this range. The instance size is determined by the Workload Type
you selected.
- If you want to use spot instances, check the box labeled
Use Spot instances and select a range of spot
instances to request. This creates another auto scaling group of spot
instances. Spot instances are requested with CPU and memory profiles
similar to those of the instances selected for the Workload
Type. For more information, see Cloudera Data Engineering Spot
Instances.
- If you want to create a load balancing endpoint in a public
subnet, check the box labeled Enable Public
Endpoint. If you leave this unchecked, the load
balancing endpoint will be created in a private subnet, and you will
need to configure access manually in your cloud account.
- Check the box labeled Enable Workload
Analytics to automatically send diagnostic information
from job execution to Cloudera Workload
Manager.
- Optionally add Tags and
Whitelist IPs as needed. Tags are applied to
the cloud provider resources associated with the CDE service
(including virtual clusters created in that service). For more
information about tags, see your cloud provider's documentation.
- Click Create.
The CDE Overview page displays the status of
the environment initialization. To view logs for the environment, click
the vertical ellipsis menu for the environment, and then click
View Logs.
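
The choices made in the steps above can be written down and sanity-checked before you fill in the form. The following is a minimal sketch under stated assumptions, not part of the CDE product or its API: the class name, field names, and placeholder values are hypothetical, and the range and IP checks reflect assumed rules (0 <= minimum <= maximum, and whitelist entries given as IP addresses or CIDR blocks) that the service itself may define differently.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CdeEnableSettings:
    """Hypothetical record of the choices made in the enable-CDE form."""

    environment: str
    workload_type: str
    min_instances: int
    max_instances: int
    use_ssd: bool = False
    use_spot: bool = False
    min_spot_instances: int = 0
    max_spot_instances: int = 0
    public_endpoint: bool = False
    workload_analytics: bool = False
    tags: Dict[str, str] = field(default_factory=dict)
    whitelist_ips: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means none found."""
        found = []
        if not self.environment.strip():
            found.append("environment name is empty")
        if not self.workload_type.strip():
            found.append("workload type is empty")
        # Assumption: a range is valid when 0 <= minimum <= maximum.
        if not 0 <= self.min_instances <= self.max_instances:
            found.append("auto-scale range must satisfy 0 <= min <= max")
        # The spot range only matters when spot instances are requested.
        if self.use_spot and not (
            0 <= self.min_spot_instances <= self.max_spot_instances
        ):
            found.append("spot range must satisfy 0 <= min <= max")
        # Assumption: each whitelist entry is an IP address or CIDR block.
        for entry in self.whitelist_ips:
            try:
                ipaddress.ip_network(entry, strict=False)
            except ValueError:
                found.append(f"not an IP address or CIDR block: {entry}")
        return found


settings = CdeEnableSettings(
    environment="my-environment",  # placeholder environment name
    workload_type="general",       # placeholder, not an actual CDE option name
    min_instances=1,
    max_instances=10,
    use_spot=True,
    min_spot_instances=0,
    max_spot_instances=5,
    tags={"team": "data-eng"},
    whitelist_ips=["203.0.113.0/24"],
)
for problem in settings.problems():
    print(problem)
```

An empty result only means that the sketch found nothing to flag. It does not indicate whether the CDE service would accept the values.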