May 18, 2023
This release (1.19) of the Cloudera Data Engineering (CDE) service on CDP Public Cloud introduces the following changes.
[Technical Preview] Cloudera Data Engineering Sessions
- The Sessions feature is available via the CDE user interface and the CDE CLI; a CLI sketch follows this list.
- Python, Scala, and Java are supported session types.
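As a quick illustration, you can create a session and attach to it from the CDE CLI. This is a minimal sketch only: the --type value shown is an assumption, so verify the accepted session-type values with cde session create --help.

    # Create an interactive Spark session (the --type value is an assumption;
    # check `cde session create --help` for the exact accepted values).
    cde session create --name my-session --type pyspark

    # Attach an interactive shell to the running session.
    cde session interact --name my-session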
[Technical Preview] Lock and unlock
CDE supports locking a CDE Service, which freezes the configuration of the Service and its corresponding Virtual Clusters. When a Service is locked, you cannot edit, add, or delete the Service or its Virtual Clusters. This helps ensure that no changes are made to a Service during a planned upgrade. For more information, see Locking and unlocking a CDE Service.
[Technical Preview] Airflow file-based resources using the CDE CLI
CDE supports Airflow file-based resources using the CDE CLI. When you create a pipeline in CDE using the CLI, you can add custom files that are available to tasks; a sketch of the flow follows. For more information, see Creating an Airflow pipeline with custom files using CDE CLI [technical preview].
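A minimal sketch of that flow, using the documented resource and job commands; the --airflow-file-mount-1-resource flag name is an assumption based on the linked page, so confirm it with cde job create --help.

    # Create a resource holding the Airflow DAG and upload the DAG file.
    cde resource create --name my_pipeline_resource
    cde resource upload --name my_pipeline_resource --local-path my_dag.py

    # Create a second resource for custom files that tasks will read.
    cde resource create --name my_file_resource
    cde resource upload --name my_file_resource --local-path my_settings.conf

    # Create the Airflow job, mounting the file resource for tasks.
    # (--airflow-file-mount-1-resource is an assumption; verify with --help.)
    cde job create --name my_pipeline --type airflow \
      --dag-file my_dag.py \
      --mount-1-resource my_pipeline_resource \
      --airflow-file-mount-1-resource my_file_resource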
Support for new AWS regions
CDE now supports the Hong Kong and Jakarta regions for AWS.
Support for multiple Spark 3 runtimes
CDE now supports Spark 3.3 alongside the existing Spark 3 runtimes. Note that Spark 3.3 requires Data Lake 7.2.16. For more information, see Cloudera Data Engineering and Data Lake compatibility.
Creating and using multiple profiles using CDE CLI
You can now add a collection of CDE CLI configurations, grouped together as profiles, to the config.yaml file. You can use these profiles while running commands, and you can set configurations either at a profile level or at a global level. For more information, see Creating and using multiple profiles using CDE CLI.
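As an illustration, a config.yaml might hold a global setting plus a "dev" profile selected at run time. The file layout, the endpoint URLs, and the --config-profile flag name and placement below are all assumptions based on the linked page, so verify them there before use.

    # Write a config.yaml with global settings plus a "dev" profile.
    # (Layout is an assumption; back up any existing file first.)
    cat > ~/.cde/config.yaml <<'EOF'
    user: cdpuser1
    vcluster-endpoint: https://prod.cde.example.com/dex/api/v1
    dev:
      user: cdpuser2
      vcluster-endpoint: https://dev.cde.example.com/dex/api/v1
    EOF

    # Run a command against the "dev" profile.
    # (Flag name and placement are assumptions; check `cde --help`.)
    cde --config-profile dev job list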
Using spark-submit drop-in migration tool for migrating Spark workloads to CDE
CDE provides a command line tool, cde-env, to help migrate your CDP Spark workloads running on CDP Private Cloud Base (“spark-on-YARN”) to CDE without having to completely rewrite your existing spark-submit command lines. For more information, see Using spark-submit drop-in migration tool for migrating Spark workloads to CDE.
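A sketch of the intended workflow, assuming the activation subcommand described on the linked page; the activate subcommand, the -p flag, the profile name, and the application class and jar below are placeholders or assumptions, so confirm the details with cde-env --help.

    # Point spark-submit at a CDE virtual cluster profile.
    # (Subcommand and flag are assumptions; verify with `cde-env --help`.)
    cde-env activate -p vc-1

    # An existing spark-on-YARN command line then runs on CDE unchanged
    # (class and jar names here are placeholders).
    spark-submit --deploy-mode cluster --class com.example.App app.jar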
New Virtual Cluster types
- Core (Tier 1) - Run batch-based transformation and engineering workloads.
- All-Purpose (Tier 2) - Develop using interactive sessions and deploy both batch and streaming workloads.
Default setting for external.table.purge property for migrating Hive tables
When migrating Hive tables to Iceberg tables in Spark 3, the external.table.purge property is now set to FALSE by default. For more information, see Importing and migrating Iceberg table in Spark 3.
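For illustration, an in-place migration using the Iceberg migrate procedure; the catalog and table names are placeholders, and whether you should set external.table.purge back to TRUE depends on your data-retention needs, so consult the linked page first.

    # In-place migration of a Hive table to Iceberg via the Iceberg
    # `migrate` procedure (catalog and table names are placeholders).
    spark-sql -e "CALL spark_catalog.system.migrate('default.sales')"

    # external.table.purge now defaults to FALSE on the migrated table;
    # set it explicitly only if you want drops to purge the underlying
    # data (consult the linked page before changing it).
    spark-sql -e "ALTER TABLE default.sales SET TBLPROPERTIES ('external.table.purge'='TRUE')"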
New exit codes for the CDE CLI
CDE now provides exit codes for the CDE CLI. The exit codes help users identify errors more precisely. For more information, see Cloudera Data Engineering CLI exit codes.
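For example, a script can branch on the CLI's exit status; the command and job name below are just examples, and the mapping from specific codes to error classes is on the linked reference page.

    # Run a job and capture the CLI exit code.
    cde job run --name my_job
    status=$?

    # A non-zero code identifies the error; see the linked exit-code
    # reference for the meaning of each value.
    if [ "$status" -ne 0 ]; then
      echo "cde job run failed with exit code $status" >&2
    fi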
Support for Data Lake
CDE 1.19 supports Data Lake 7.2.14 through 7.2.16. For more information, see Cloudera Data Engineering and Data Lake compatibility.