CDP Public Cloud: April 2023 Release Summary

Data Hub

This release of the Data Hub service introduces new features and behaviors.

GP3 support for attached storage disks

Data Hubs now support GP3 (SSD) volume types for attached storage. GP3 volumes allow you to increase performance (independently provisioning IOPS and throughput) without increasing storage size. GP3 volumes will deliver similar performance as GP2 volumes at a lower cost. GP3 is now the default attached storage type for Data Hub instances that previously used GP2 storage. You can configure storage options when you provision a Data Hub cluster under the Hardware and Storage section of the Advanced Options.

DataFlow

This release (2.4.1-b22) of Cloudera DataFlow (CDF) on CDP Public Cloud supports new NiFi and Kubernetes versions, includes usability improvements for Flow Designer, resolves an issue with deployment ZooKeeper persistence and introduces various bug fixes across the platform.

  • CDF now creates Kubernetes clusters with version 1.24 in AWS EKS and 1.25 in Azure AKS. The PutIceberg processor is now considered GA (requires selecting NiFi version 1.20.0.2.3.8.2-2).
  • Improves event age off logic for environments and deployments that produce a large number of events.
  • In the Flow Designer, when deleting a parameter, referencing components are populated in the affected components dialog.
  • In the Flow Designer, the Inbound Connection Details button visibility was fixed when a Test Session successfully starts.
  • After upgrading CDF, aligned the CFM version of any existing Test Session configurations with the minimum CFM version supported by the Flow Designer.

Machine Learning

Version 2.0.38 introduces the following new features and improvements:

Custom Data Connections

Site administrators can now configure access to external data sources with the new Custom Data Connections feature. Data Scientists can access the external data via the cml.data library and its 2-liner abstractions.

Add Data

Data Scientists can now upload files to Hive and Impala Virtual Warehouse tables from the Data tab of any CML Project.

Model Replicas

Site Administrators can now configure the maximum number of model replicas that users can select for their models via the Maximum Model Replicas field on the Administration → Settings → Model Deployment Settings page.

Models

Users can now deploy large ML models. Model size is not limited to 50 MB anymore.

Model Registry

Administrators can now find the Machine User Workload User Name that is needed for configuring their model registries to access their S3 or ADLS Gen2 bucket on the Workspace Details page.

Install Workspaces (Technical Preview)

Auto-retriable workflow for Install Workspace is available for Azure Private Cluster. See the Feature Preview documentation for more information.

Usage Monitoring

Usage data for Spark Executors is now recorded in the CSV file that Site Administrators can download from the Administration > Usage tab.

API Keys

Improved security by storing Legacy API keys as hashes in the database. Existing Legacy API keys are automatically rotated as part of the upgrade process to ensure that previous keys cannot be used. API v1 keys will not be usable after the upgrade. To manually rotate a Legacy API key, do the following:

  • In User Settings > API Keys, click Rotate to generate a new Legacy ApiKey and ApiKeyHash pair.
  • Copy the Api Key that is shown after rotation and use it in future requests.
  • Note - The API key will not be visible on the UI once you refresh the page. Make sure to copy it before leaving the page.

Modify Instance Group Type

Administrators can easily change the CPU or GPU instance types of node groups for a CML workspace, without having to re-provision the workspace.

Management Console

This release of the Management Console service introduces the following changes:

FreeIPA and Data Lake instance type selection

You can now select the FreeIPA and Data Lake instance types when you register an environment. For more information see Register an AWS environment from CDP UI, Register an Azure environment from CDP UI, and Register a GCP environment from CDP UI.

AWS GP3 support for attached storage disks

AWS Data Lake and FreeIPA instances now support GP3 (SSD) volume types for attached storage. GP3 volumes allow you to increase performance (independently provisioning IOPS and throughput) without increasing storage size. GP3 volumes will deliver similar performance as GP2 volumes at a lower cost. GP3 is now the default attached storage type for instances that previously used GP2 storage.