Migrating Spark CDP to Cloudera Data Engineering

Cloudera Data Engineering (CDE) is a fully managed service for Spark. Among many other features, CDE improves Spark job monitoring with an enhanced Job Analysis page that builds on the Spark UI, and it provides Apache Airflow for orchestrating Spark pipelines.

The CDE service currently supports Spark batch jobs only. Support for Spark Streaming and Spark Structured Streaming is experimental and is not recommended for production. For guidelines and limitations, see Experimental support for Spark Streaming and Spark Structured Streaming.

CDE does not change Spark. It lets you deploy managed Spark clusters in the cloud, running either Spark 2.4.0 or higher or Spark 3.0 or higher, so if your code already uses Spark, you can migrate it to CDE as is.

The deployment mode changes from YARN to Kubernetes. CDE automatically sets the required Kubernetes properties when a job is created, so you do not need to set them yourself. However, you may have to convert some YARN-related properties. These properties are discussed in the Convert Spark Submits to CDE CLI Spark Submits section.
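As a rough illustration of the review step described above, the following Python sketch separates the arguments of an existing spark-submit command into those that are likely YARN-specific and those that are not. The helper name split_yarn_args and the choice of what to flag (the --master, --deploy-mode, and --queue options, plus any --conf key starting with spark.yarn.) are assumptions made for this example, not a list defined by CDE. Treat the flagged arguments as candidates to check against the Convert Spark Submits to CDE CLI Spark Submits section, not as a definitive conversion.

```python
from typing import List, Tuple

# Assumption for this sketch: these spark-submit options select or configure
# the YARN deployment, and each one is followed by exactly one value.
YARN_OPTIONS = {"--master", "--deploy-mode", "--queue"}

# Assumption for this sketch: Spark properties under this prefix are YARN-specific.
YARN_CONF_PREFIX = "spark.yarn."


def split_yarn_args(args: List[str]) -> Tuple[List[str], List[str]]:
    """Split spark-submit arguments into (kept, flagged).

    'flagged' holds options that look YARN-specific and should be reviewed
    before the job is moved to CDE; 'kept' holds everything else, in order.
    """
    kept: List[str] = []
    flagged: List[str] = []
    i = 0
    while i < len(args):
        arg = args[i]
        has_value = i + 1 < len(args)
        if arg in YARN_OPTIONS and has_value:
            # Option plus its value, for example: --master yarn
            flagged.extend([arg, args[i + 1]])
            i += 2
        elif arg == "--conf" and has_value:
            # Route the whole --conf pair based on the property key prefix.
            target = flagged if args[i + 1].startswith(YARN_CONF_PREFIX) else kept
            target.extend([arg, args[i + 1]])
            i += 2
        else:
            kept.append(arg)
            i += 1
    return kept, flagged


if __name__ == "__main__":
    # Hypothetical spark-submit arguments, used only to exercise the helper.
    example = [
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--conf", "spark.yarn.maxAppAttempts=1",
        "--conf", "spark.executor.memory=4g",
        "--class", "com.example.App",
        "app.jar",
    ]
    kept, flagged = split_yarn_args(example)
    print("Review before migrating:", flagged)
    print("Carry over unchanged:", kept)
```

The sketch only sorts arguments; it does not rewrite them or call the CDE CLI. Keeping the flagged arguments visible rather than silently dropping them makes it easier to decide, property by property, whether a conversion is needed.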