Release Notes

February 27, 2024

This release (1.20.3) of the Cloudera Data Engineering (CDE) service on CDP Public Cloud introduces the following changes.

CDE Sessions is now GA as a default feature. Sessions is an interactive, short-lived development environment for running Spark commands to help you iterate on and build your Spark workloads. The Interaction tab was added so that you can run Scala and PySpark code in blocks to develop applications. Cloudera currently supports Sessions in the CDE CLI and UI. The Spark UI tab was also added to view active sessions. For more information, see Creating and Managing CDE Sessions and Managing Sessions in CDE using the CLI.

CDE now has a revamped landing page whose new design focuses on a simplified workflow: Develop, Deploy, and Monitor.

CDE supports in-place upgrades from two baseline versions: CDE 1.19.2 and above for AWS, and CDE 1.19.4 and above for Azure. Users will need to manually pause, back up, and restore each Virtual Cluster to account for upgrade failures. A procedure for handling upgrade failures has also been added. The in-place upgrade also includes the following:
  • Upgrades of CDE core components: EKS, AKS Services, and Application Services
  • Upgrades of dependencies: Helm, K8s versions, YuniKorn
For more information, see Upgrading CDE and Handling upgrade failures in CDE.
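
The version rule above can be expressed as a small pre-upgrade check. The following is a minimal sketch, not part of CDE or its CLI: the names MIN_UPGRADE_SOURCE, parse_version, and can_upgrade_in_place are hypothetical, and the sketch assumes version strings are plain dotted integers such as 1.19.4. It only checks the minimum source version stated in this release note.

```python
from typing import Tuple

# Minimum source versions per cloud provider, as stated in this release note.
# The dictionary name and keys are illustrative, not a CDE API.
MIN_UPGRADE_SOURCE = {"aws": (1, 19, 2), "azure": (1, 19, 4)}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.19.4' into a comparable tuple."""
    return tuple(int(part) for part in version.strip().split("."))


def can_upgrade_in_place(provider: str, current_version: str) -> bool:
    """Return True if current_version meets the minimum for the given provider."""
    minimum = MIN_UPGRADE_SOURCE.get(provider.lower())
    if minimum is None:
        raise ValueError(f"unknown provider: {provider!r}")
    # Tuples compare element by element, so (1, 19, 4) >= (1, 19, 2) holds.
    return parse_version(current_version) >= minimum
```

For example, can_upgrade_in_place("azure", "1.19.2") evaluates to False because Azure requires 1.19.4 or later, while the same version on AWS evaluates to True.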

You can now use Git repositories to collaborate, manage project artifacts, and promote applications from lower to higher environments. Cloudera currently supports Git providers such as GitHub, GitLab, and Bitbucket. Repository files can be accessed when you create a Spark or Airflow job. You can then deploy the job and use CDE's centralized monitoring and troubleshooting capabilities to tune and adjust your workloads. For more information, see Creating a Git repository in CDE (Technical Preview).

CDE supports third-party Python-based plugins and libraries to build custom Airflow pipelines using the CDE UI and API. For more information, see Using custom operators and libraries for Apache Airflow and Using custom operators and libraries for Apache Airflow using API.

New parameters were added for Airflow. For more information, see CDE CLI Airflow flag reference and Submitting an Airflow job using the CLI.

CDE supports Spark Structured Streaming for both Spark 2 and Spark 3. For more information, see Support for Spark Structured Streaming in Cloudera Data Engineering (Technical Preview).

You can now restrict or grant access to a virtual cluster for groups that you specify. For more information, see Applying user and group access for virtual clusters.

New sliders for editing all-purpose nodes on AWS and Azure have been added so that you can control the size of your auto-scaling group. For more information, see Enabling a Cloudera Data Engineering service.

CDE now supports K8s 1.27. For more information, see Compatibility for Cloudera Data Engineering and Runtime components.

For more information, see Support lifecycle policy.

CDE now supports Airflow 2.6. For more information, see Compatibility for Cloudera Data Engineering and Runtime components.

Impala Virtual Warehouses (VWs) are supported, and the CDWOperator is no longer needed for executing queries. For more information, see Apache Airflow in Cloudera Data Engineering.

When you upgrade to CDE 1.20.3, ensure that you also upgrade to Iceberg 1.3. For more information, see Compatibility for Cloudera Data Engineering and Runtime components.

CDE now supports setting subnets for the Load Balancer during service creation. For more information, see Enabling a Cloudera Data Engineering service.

You can select Enable Observability Analytics if you want diagnostic information about jobs and query execution sent to Cloudera Observability. This helps optimize troubleshooting. For more information, see Enabling a Cloudera Data Engineering service.