CDP Public Cloud: July 2022 Release Summary
Data Engineering
This release (1.16) of the Cloudera Data Engineering (CDE) data service on CDP Public Cloud introduces the following new features and improvements:
Airflow pipeline UI editor
Airflow Pipeline UI editor is now GA as a default feature in new Virtual Clusters with support for all major browsers (Firefox, Chrome, and Safari).
Upgrade to Airflow 2.2.5
CDE 1.16 now runs with Airflow 2.2.5. Several fixes to improve performance and stability have been bundled with the upgrade. New Virtual Clusters will automatically use the new Airflow version. This version deprecated the timezone package usage. The DAGs need to be updated to use the pendulum package instead. If your airflow DAGs need to be timezone aware then they should rely on the pendulum timezone library for start and end dates as described here. Otherwise, the backup and restore process will not be able to restore these DAGs. For more information, see CDE known issues.
Spark 3 support for raw scala code
Spark 3 support for raw scala code. Previously this feature was limited to Spark 2, but it has now been extended to Spark 3 based virtual clusters. This allows you to directly run raw scala via API & CLI in batch-mode without having to compile, similar to what spark-shell supports.
Support for Azure private storage
CDE now supports Azure private storage. Both private ABFS and ADLS Gen2 containers are now supported.
Editing VC configurations post creation
You can now modify the virtual settings such as cluster quotas (CPU/memory) dynamically.
Loading example jobs and sample data using new VCs
CDE provides an option to add in-product examples of data and jobs in new virtual clusters to facilitate smoother onboarding and learning for new customers.
Kubernetes update
CDE now supports K8s 1.22. The CSP EOS for K8s 1.21 is as follows:
- For Azure: July 2022
- For AWS: February 2023
Check for removals as per this upgrade: Kubernetes API and Feature Removals in 1.22 and Removed APIs by release.
Support for creation of a default virtual cluster
CDE now provides support for default virtual clusters. This will help you get a jump start to create your jobs easily, without having to wait to create a CDE virtual cluster. You have the option to turn this selection off if you do not wish to use a default virtual cluster. For more information, see Enabling Cloudera Data Engineering service.
In-place upgrades (Preview)
CDE supports upgrades from CDE 1.14 on both AWS and Azure. The upgrades can be triggered by an admin from CDE user interface. Users will need to manually pause/backup/restore each Virtual Cluster to account for upgrade failures.
- Upgrades of CDE core components include: EKS, AKS Services, and Application Services
- Upgrades of dependencies include: Helm, K8s versions, YuniKorn
Machine Learning
This release (2.0.32) of the Cloudera Machine Learning (CML) data service on CDP Public Cloud introduces the following new features and improvements:
Garbage collection for deleted projects
This feature allows you to trigger cleanup of deleted projects. A separate feature allows older orphaned projects to be marked for cleanup. For more information, see Project Garbage Collection.
Disable Runtimes
It is now possible to disable and enable runtimes. For more information, see Disabling Runtimes.
Monitoring for applications
This feature allows you to monitor the technical health of deployed applications, including statistics and visualizations of CPU and memory usage. For more information, see Monitoring applications.
Custom polling endpoints for applications
This feature allows the application creator to define what application endpoint servers poll to detect if the application is running, that avoids problems some applications have with polling the root endpoint. For more information, see Application polling endpoint.
PBJ Workbench Runtimes (Preview) now work with Sessions, Experiments, Jobs and Applications
This feature enables the classic workbench UI backed by the open-source Jupyter protocol. This architectural change improves consistency, stability, and ease of customization while eliminating the dependency on proprietary CML code. For more information, see “PBJ Workbench” in Preview Features.
Kubernetes 1.22
Kubernetes 1.22 is now supported for CDE for both AWS and Azure.
Management Console
This release of the Cloudera Management Console service on CDP Public Cloud introduces the following new features and improvements:
New documentation for CDP Public Cloud upgrade
The CDP Public Cloud upgrade advisor, which gives an overview and FAQ of the upgrade process, is now available. See CDP Public Cloud upgrade advisor.
FreeIPA scaling
You can resize your existing FreeIPA cluster via CDP CLI. Upscaling FreeIPA is recommended after performing Data Lake scaling. For more information, see Resize FreeIPA.
Changed permissions for managing proxies in CDP
You no longer need to be a PowerUser to register and manage a proxy in CDP. The new minimal roles are as follows:
- EnvironmentCreator can register a proxy in CDP.
- Owner or SharedResourceUser can view details of a proxy.
- Owner can delete a proxy registration from CDP.
This change has been introduced for new proxy registrations only; That is, proxies registered prior to this change continue to be managed by a PowerUser. See updated Setting up a non-transparent proxy in CDP.
Support for Machine Learning in ap-1 and eu-1 regional Control Planes
Cloudera Machine Learning is now supported in the ap-1 (Australia) and eu-1 (Germany) regional Control Planes. For the list of all supported services for all supported Control Plane regions, see CDP Control Plane regions.