Cloudera Data Engineering Spot Instances

Cloudera Data Engineering (CDE) supports spot instances to facilitate cloud cost savings for workloads that are not SLA-bound. If you have enabled spot instances for your CDE service, when you create a virtual cluster, you can specify whether drivers and executors run on spot instances or on-demand instances. The default setting uses on-demand instances for drivers, and spot instances for executors.

Spot instances are discounted instances that can be reclaimed at any time by the cloud provider. For fault-tolerant workloads such as Apache Spark, utilizing spot instances can provide significant cost savings. Because spot instances can disappear, impacting job performance, Cloudera recommends only using them for workloads without strict SLA requirements.

For development, testing, and other workloads that have more flexible requirements, Cloudera recommends using on-demand instances for drivers, and spot instances for executors. This reduces the impact of losing a spot instance, because executor failure recovery is more efficient than recovery from a driver failure.

To use spot instances, you must enable them on the CDE service. For instructions, see Enabling Cloudera Data Engineering. You cannot enable spot instances on an existing CDE service. To use spot instances, create a new service.

After you have enabled spot instances for the CDE service, you can configure individual virtual clusters to use on-demand instances only, spot instances only, or the default hybrid model (on-demand for drivers, spot for executors).

For more information on spot instances, see the cloud provider documentation:

Amazon AWS
Amazon EC2 Spot Instances