CDP Public Cloud: July 2022 Release Summary

Data Engineering

This release (1.16) of the Cloudera Data Engineering (CDE) data service on CDP Public Cloud introduces the following new features and improvements:

Airflow pipeline UI editor

The Airflow pipeline UI editor is now generally available (GA) and is enabled by default in new Virtual Clusters. It supports all major browsers (Firefox, Chrome, and Safari).

Upgrade to Airflow 2.2.5

CDE 1.16 runs Airflow 2.2.5, which bundles several performance and stability fixes. New Virtual Clusters automatically use the new Airflow version. This version deprecates use of the timezone package, so DAGs must be updated to use the pendulum package instead. If your Airflow DAGs need to be timezone-aware, define their start and end dates with the pendulum timezone library; otherwise, the backup and restore process will not be able to restore these DAGs. For more information, see CDE known issues.

Spark 3 support for raw Scala code

Support for running raw Scala code was previously limited to Spark 2. It now extends to Spark 3 based virtual clusters. You can run raw Scala directly through the API and CLI in batch mode without compiling it first, similar to what spark-shell supports.

Support for Azure private storage

CDE now supports Azure private storage. Both private ABFS and ADLS Gen2 containers are now supported.

Editing VC configurations post creation

You can now dynamically modify virtual cluster settings, such as quotas (CPU/memory), after the cluster is created.

Loading example jobs and sample data using new VCs

CDE provides an option to add in-product examples of data and jobs in new virtual clusters to facilitate smoother onboarding and learning for new customers.

Kubernetes update

CDE now supports K8s 1.22. The CSP EOS for K8s 1.21 is as follows:

Support for creation of a default virtual cluster

CDE now supports default virtual clusters, so you can start creating jobs right away without first waiting for a CDE virtual cluster to be created. You can turn this option off if you do not want a default virtual cluster. For more information, see Enabling Cloudera Data Engineering service.

In-place upgrades (Preview)

CDE supports upgrades from CDE 1.14 on both AWS and Azure. An admin can trigger the upgrade from the CDE user interface. Users need to manually pause, back up, and restore each Virtual Cluster to account for upgrade failures.

  • Upgrades of CDE core components include: EKS, AKS Services, and Application Services
  • Upgrades of dependencies include: Helm, K8s versions, YuniKorn

Machine Learning

This release (2.0.32) of the Cloudera Machine Learning (CML) data service on CDP Public Cloud introduces the following new features and improvements:

Garbage collection for deleted projects

This feature allows you to trigger cleanup of deleted projects. A separate feature allows older orphaned projects to be marked for cleanup. For more information, see Project Garbage Collection.

Disable Runtimes

It is now possible to disable and enable runtimes. For more information, see Disabling Runtimes.

Monitoring for applications

This feature allows you to monitor the technical health of deployed applications, including statistics and visualizations of CPU and memory usage. For more information, see Monitoring applications.

Custom polling endpoints for applications

This feature allows the application creator to define which application endpoint the servers poll to detect whether the application is running. This avoids the problems that some applications have when the root endpoint is polled. For more information, see Application polling endpoint.
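To illustrate the idea, the following is a minimal sketch of an application that answers polls on a dedicated endpoint instead of the root. It is not CML code: the /health path, the handler, and the make_server helper are hypothetical, and reading the port from the CDSW_APP_PORT environment variable is an assumption about the application environment (the sketch falls back to 8080).

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical path; use whatever you configure as the polling endpoint.
HEALTH_PATH = "/health"


class AppHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == HEALTH_PATH:
            # Cheap liveness reply that does no application work.
            self._reply(200, b"ok")
        elif self.path == "/":
            # Stand-in for the real application page, which may be slow
            # or require a login in an actual application.
            self._reply(200, b"application home")
        else:
            self._reply(404, b"not found")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def make_server(port=None):
    # Assumption: the platform passes the port in CDSW_APP_PORT.
    if port is None:
        port = int(os.environ.get("CDSW_APP_PORT", "8080"))
    return HTTPServer(("127.0.0.1", port), AppHandler)
```

Calling make_server().serve_forever() starts the application. With /health set as the polling endpoint, a poll succeeds even when the root page is slow or redirects.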

PBJ Workbench Runtimes (Preview) now work with Sessions, Experiments, Jobs and Applications

This feature enables the classic workbench UI backed by the open-source Jupyter protocol. This architectural change improves consistency, stability, and ease of customization while eliminating the dependency on proprietary CML code. For more information, see “PBJ Workbench” in Preview Features.

Kubernetes 1.22

Kubernetes 1.22 is now supported for CML on both AWS and Azure.

Management Console

This release of the Cloudera Management Console service on CDP Public Cloud introduces the following new features and improvements:

New documentation for CDP Public Cloud upgrade

The CDP Public Cloud upgrade advisor, which gives an overview and FAQ of the upgrade process, is now available. See CDP Public Cloud upgrade advisor.

FreeIPA scaling

You can resize your existing FreeIPA cluster using the CDP CLI. Upscaling FreeIPA is recommended after you scale the Data Lake. For more information, see Resize FreeIPA.

Changed permissions for managing proxies in CDP

You no longer need to be a PowerUser to register and manage a proxy in CDP. The new minimal roles are as follows:

  • EnvironmentCreator can register a proxy in CDP.
  • Owner or SharedResourceUser can view details of a proxy.
  • Owner can delete a proxy registration from CDP.

This change applies to new proxy registrations only; proxies registered before this change continue to be managed by a PowerUser. See the updated Setting up a non-transparent proxy in CDP.

Support for Machine Learning in ap-1 and eu-1 regional Control Planes

Cloudera Machine Learning is now supported in the ap-1 (Australia) and eu-1 (Germany) regional Control Planes. For the list of all supported services for all supported Control Plane regions, see CDP Control Plane regions.