May 18, 2023

This release (1.19) of the Cloudera Data Engineering (CDE) service on CDP Public Cloud introduces the following changes.

[Technical Preview] Cloudera Data Engineering Sessions

CDE 1.19 introduces Sessions, an interactive, short-lived development environment for running Spark commands that helps you iterate on and build your Spark workloads.
  • The Sessions feature is available through the CDE user interface and the CDE CLI.
  • Python, Scala, and Java are supported session types.
For more information, see Creating Sessions in Cloudera Data Engineering [Technical Preview] and Managing Sessions in Cloudera Data Engineering using the CLI.
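A typical Sessions workflow from the CLI looks like the following sketch. The session name and type are placeholders, and the exact subcommands and flags should be verified with cde session --help.

# Create an interactive session, attach to it, then clean up afterwards.
# The session name and type below are illustrative placeholders.
cde session create --name demo-session --type pyspark
cde session interact --name demo-session   # opens an interactive Spark shell
cde session list                           # confirm the session is running
cde session kill --name demo-session       # end the session when finished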

[Technical Preview] Lock and unlock

CDE now supports locking a CDE Service, which freezes the configuration of the Service and its corresponding Virtual Clusters. While a Service is locked, you cannot edit, add, or delete the Service or its Virtual Clusters. This helps ensure that no changes are made to the Service during planned upgrades. For more information, see Locking and unlocking a CDE Service.

[Technical Preview] Airflow file-based resources using the CDE CLI

CDE now supports Airflow file-based resources through the CDE CLI. When you create a pipeline in CDE using the CLI, you can add custom files that are made available to its tasks, as sketched below. For more information, see Creating an Airflow pipeline with custom files using CDE CLI [technical preview].
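As a rough sketch, the CLI flow looks like the following. The resource, file, and job names are placeholders, and the file-mount flags are assumptions about this release's options; confirm them with cde job create --help.

# Create file resources and upload the DAG plus the custom files its tasks need.
cde resource create --name dag-res --type files
cde resource upload --name dag-res --local-path my_dag.py
cde resource create --name task-files --type files
cde resource upload --name task-files --local-path params.conf

# Create the Airflow job; both mount flag names below are assumptions.
cde job create --name my-pipeline --type airflow \
    --dag-file my_dag.py --mount-1-resource dag-res \
    --airflow-file-mount-1-resource task-files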

Support for new AWS regions

CDE now supports the Hong Kong and Jakarta regions for AWS.

Support for multiple Spark 3 runtimes

CDE now supports Spark 3.3 in addition to earlier Spark 3 runtimes. Note that Spark 3.3 requires Data Lake 7.2.16. For more information, see Cloudera Data Engineering and Data Lake compatibility.

Creating and using multiple profiles using CDE CLI

You can now group CDE CLI configuration settings into named profiles in the config.yaml file and select a profile when running commands. Settings can be defined at the profile level or at the global level, as in the sketch below. For more information, see Creating and using multiple profiles using CDE CLI.
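A minimal sketch of a profiles-based config.yaml follows. The endpoints and user names are placeholders, and the profile-selection flag is an assumption; the linked documentation covers the exact syntax.

# Write a config.yaml with global defaults plus a named 'dev' profile
# (all values below are placeholders).
cat > ~/.cde/config.yaml <<'EOF'
# Global settings, used when no profile is selected.
user: cdpuser1
vcluster-endpoint: https://service.example.cloudera.site/dex/api/v1

# Profile-level settings override the global ones.
dev:
  user: cdpuser2
  vcluster-endpoint: https://dev.example.cloudera.site/dex/api/v1
EOF

# Run a command against the 'dev' profile (flag name is an assumption).
cde job list --config-profile dev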

Using spark-submit drop-in migration tool for migrating Spark workloads to CDE

CDE provides a command line tool, cde-env, to help you migrate CDP Spark workloads running on CDP Private Cloud Base (“spark-on-YARN”) to CDE without completely rewriting your existing spark-submit command lines. For more information, see Using spark-submit drop-in migration tool for migrating Spark workloads to CDE.
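The intended flow is sketched below, assuming the tool is installed as cde-env.sh with activate and deactivate subcommands; verify the details against the linked documentation.

# Point spark-submit at a CDE virtual cluster ('vc-1' is a placeholder profile).
cde-env.sh activate -p vc-1

# Existing spark-on-YARN command lines now run on CDE unchanged.
spark-submit --deploy-mode cluster --class com.example.App app.jar

# Restore the original spark-on-YARN behavior.
cde-env.sh deactivate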

New Virtual Cluster types

CDE now provides a choice between two tiers during Virtual Cluster creation. Administrators see the following two options:
  • Core (Tier 1) - Batch-based transformation and engineering workloads.
  • All-Purpose (Tier 2) - Interactive development using sessions, plus deployment of both batch and streaming workloads.
For more information, see Creating virtual clusters.

Default setting for external.table.purge property for migrating Hive tables

When you migrate a Hive table to Iceberg in Spark 3, the external.table.purge property is now set to FALSE by default, so dropping the migrated table does not delete the underlying data. For more information, see Importing and migrating Iceberg table in Spark 3.
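If a migrated table should purge its underlying data on drop, you can set the property back explicitly. A minimal sketch, using a placeholder table name and the spark-sql shell as the entry point:

# Re-enable purge-on-drop for a migrated table (table name is a placeholder).
spark-sql -e "ALTER TABLE db.migrated_table SET TBLPROPERTIES ('external.table.purge'='true')"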

New exit codes for the CDE CLI

The CDE CLI now returns defined exit codes that help you identify the cause of an error, as in the sketch below. For more information, see Cloudera Data Engineering CLI exit codes.
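For example, a script can branch on the exit code of any cde command; the mapping of codes to errors is in the linked reference. A generic sketch with a placeholder job name:

# Run a job and act on the CLI exit code.
cde job run --name my-job
rc=$?
if [ "$rc" -ne 0 ]; then
    echo "cde job run failed with exit code $rc" >&2
fi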

Support for Data Lake

CDE 1.19 supports Data Lake 7.2.14 through 7.2.16. For more information, see Cloudera Data Engineering and Data Lake compatibility.