CDP Public Cloud: July 2023 Release Summary

Data Hub

This release of the Data Hub service introduces the following changes:

Java 11 support for Data Hub and Data Lake clusters

You can launch a new Data Hub or Data Lake cluster with Java 11 as the default JDK. This capability is currently only available through the CDP CLI. See Java 11 for more information.
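
As a rough illustration of what a CLI-based launch could look like, the sketch below assembles (but does not run) a CDP CLI command line. The `--java-version` option name is an assumption that should be confirmed against `cdp datahub create-aws-cluster help`. The cluster and environment names are hypothetical placeholders, and other options a real cluster needs (for example, a cluster definition) are omitted.

```python
import shlex


def java11_cluster_command(cluster_name, environment_name, java_version=11):
    """Build (but do not run) a CDP CLI command line for a new Data Hub cluster."""
    # Assumption: the JDK is selected with a --java-version option.
    # Verify with `cdp datahub create-aws-cluster help` before relying on it.
    return [
        "cdp", "datahub", "create-aws-cluster",
        "--cluster-name", cluster_name,          # hypothetical placeholder value
        "--environment-name", environment_name,  # hypothetical placeholder value
        "--java-version", str(java_version),
    ]


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(java11_cluster_command("my-datahub", "my-env")))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without the CDP CLI installed.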

DataFlow

This release (2.5.0-h1-b6) of Cloudera DataFlow (CDF) on CDP Public Cloud resolves a critical bug in JVM heap allocation for newly created NiFi deployments and includes other minor bug fixes.

Machine Learning

Version 2.0.38-H3 introduces bug fixes only.

Version 2.0.40 introduces the following new features and improvements:

  • CML Home - A new landing experience that helps you to jump to your most recent Projects, walks you through the key capabilities of the platform and keeps you up-to-date with the latest developments.
  • 3rd-party editor support - The PBJ architecture adds support for third-party editors, so you can build custom ML Runtimes from scratch with JupyterLab, RStudio, and other editors of your choice.
  • Models - Models can now be deployed from the model registry to CML workspaces using API v2.
  • Kubernetes - Kubernetes 1.26 support is available for Azure.
  • Model Registry (Preview) - Model Registries can now be deployed on an Azure private cluster.
  • CML Scalability - CML Control Plane flows have been verified for 100-node clusters, and high-volume workloads are now enabled by default.
  • Project Migration - Support for project export and import from CDSW to CML public or private cloud environments, or for migration between development and production environments.
  • Retry Install Workspace - Workflow-based support for retrying CML workspace creation if workspace installation fails.
  • Preflight Checks for Instance Groups - Preflight checks are run when an instance group is being modified to ensure the requested configuration is valid.
  • Preflight Checks for Update Workspace - Preflight checks are run when a workspace update is requested to ensure the requested change is valid.
  • SDX - SDX 7.2.17 has been integrated and verified with CML.
  • Runtimes - On the New Project page, CML code now defaults to the Python 3.9 ML Runtime Edition.
  • HadoopCLI - The HadoopCLI Runtime-Addon versions for DL 7.2.8, 7.2.10, and 7.2.11 on Public Cloud have reached End of Support and have been removed.
  • Job notifications - All email-related controls are hidden in job creation and job settings if the SMTP host is not configured. If email recipients were previously added to a job but the SMTP host is not configured, the Job Notifications section displays a warning message informing the user of the problem.
  • Project - Site administrators can now restrict project creation for users and/or teams.
  • Environment variables - Users can now hide the values of sensitive environment variables at the Account, Project, Workload, and Workspace levels.

Management Console

This release of the Management Console service introduces the following changes:

Enterprise Data Lakes

A new Data Lake shape called the enterprise Data Lake is now available for new deployments of Cloudera Runtime version 7.2.17. The enterprise Data Lake is a redefined version of the medium duty Data Lake that still offers failure resilience, but utilizes resources and allocates memory more efficiently than a medium duty Data Lake at the same cost. For more information, see Data Lake scale.

Medium duty Data Lakes deprecated

Medium duty Data Lakes are deprecated as of Runtime 7.2.17. You can upgrade a medium duty Data Lake from 7.2.16 to 7.2.17, but you will not be able to upgrade it further. You can create a new 7.2.17 medium duty Data Lake through the CDP CLI, but Cloudera recommends using the enterprise Data Lake for new deployments. The ability to create medium duty Data Lakes will be removed from both the UI and the CLI starting with 7.2.18.
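
The upgrade rule above can be expressed as a small check. This is only a sketch of the stated policy, not anything CDP itself exposes: the function names are hypothetical, and only the first three numeric components of a version string are compared.

```python
def parse_version(text):
    """Turn a dotted Runtime version such as '7.2.17' into a comparable tuple."""
    # Only the first three numeric components are considered in this sketch.
    return tuple(int(part) for part in text.split(".")[:3])


# Last Runtime version a medium duty Data Lake can be upgraded to, per the note above.
MEDIUM_DUTY_LAST_VERSION = parse_version("7.2.17")


def medium_duty_upgrade_allowed(target_version):
    """Return True if a medium duty Data Lake may be upgraded to target_version."""
    return parse_version(target_version) <= MEDIUM_DUTY_LAST_VERSION


if __name__ == "__main__":
    for version in ("7.2.16", "7.2.17", "7.2.18"):
        print(version, medium_duty_upgrade_allowed(version))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "7.2.9" as newer than "7.2.17".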

CCM enabled by default in the UI

When you register a new AWS, Azure, or GCP environment in CDP via the web interface, Cluster Connectivity Manager (CCM) is enabled by default and the option to disable it has been removed from the web interface. Note: CCM is currently disabled by default when registering an environment via the CDP CLI. If you would like to register an environment with CCM enabled via the CDP CLI, you need to explicitly enable CCM.
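
To make the difference between the two defaults concrete, the following sketch builds (without running) a CLI registration command that opts in to CCM only when asked. The `--enable-tunnel` option name is an assumption to verify with `cdp environments create-aws-environment help`. The environment name is a hypothetical placeholder, and other options a real registration requires are omitted.

```python
def register_environment_args(environment_name, enable_ccm=False):
    """Build (but do not run) a CDP CLI command for registering an AWS environment."""
    args = [
        "cdp", "environments", "create-aws-environment",
        "--environment-name", environment_name,  # hypothetical placeholder value
    ]
    # enable_ccm defaults to False to mirror the CLI default described above.
    if enable_ccm:
        # Assumption: CCM is switched on with --enable-tunnel; confirm in the CLI help.
        args.append("--enable-tunnel")
    return args


if __name__ == "__main__":
    print(" ".join(register_environment_args("my-env", enable_ccm=True)))
```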

CCMv2 upgrade

CCMv2 upgrade is available for all customers. If your existing CDP environment was created with CCMv1, you will see a notification in your environment details to upgrade to CCMv2. See Upgrading from CCMv1 to CCMv2.