Cloudera Public Cloud: September 2024 Release Summary
This Release Summary highlights the major features introduced in the Management Console, Data Hub, and the data services of Cloudera Public Cloud.
Cloudera Public Cloud
Cloudera Private Link Network for AWS
Cloudera Private Link Network enables you to connect privately and securely to the CDP Control Plane without traversing the internet. You can use Cloudera Private Link Network for end-to-end encryption of your workloads between CDP Control Plane and AWS VPC endpoints.
For more information, see the Cloudera Private Link Network for AWS documentation.
Deprecated AWS and Azure quickstart guides
The AWS and Azure quickstart guides are deprecated in the Cloudera Public Cloud documentation and are no longer maintained. To quickly set up Cloudera Public Cloud on AWS or Azure, Cloudera recommends that you use Terraform. For more information, see the Deploy CDP using Terraform documentation.
Machine Learning
The September 2024 release (2.0.46-b200) of Cloudera Machine Learning introduces the following features and fixes:
New Features / Improvements
- Model Hub: Model Hub is a catalog of top-performing LLM and generative AI models. You can now easily import the models listed in Model Hub into the Model Registry and then deploy them using the Cloudera AI Inference service. This streamlines the workflow of developers working on AI use cases by simplifying the process of discovering, deploying, and testing models. For more information, see Using Model Hub.
- Registered Models: Registered Models offers a single view for models stored in Model Registries across CDP Environments and facilitates easy deployment to the Cloudera AI Inference service. When you import models from Model Hub, they are listed under Registered Models. This page lists all imported models and their associated metadata, such as the model's environment, visibility, owner name, and created date. You can click any model to view its details and versions, and deploy a specific version of the model to the Cloudera AI Inference service. For more information, see Using Registered Models.
- Cloudera AI Inference Service (Technical Preview): The Cloudera AI Inference service is a production-grade serving environment for traditional, generative AI, and LLM models. It is designed to handle the challenges of production deployments, such as high availability, fault tolerance, and scalability. The service is now available for users to carry out inference on the following three categories of models:
  - TRT-LLMs: LLMs that are optimized into TensorRT (TRT) engines and available in the NVIDIA GPU Cloud catalog, also known as the NGC catalog.
  - LLMs available through the Hugging Face Hub.
  - Traditional machine learning models, such as classification and regression models. Models must be imported into the Model Registry to be served using the Cloudera AI Inference service.
- Model Registry Standalone API: The Model Registry Standalone API is now fully supported. This new API is available from the Model Registry service to import, get, update, and delete models without relying on the CML Workspace service. For more information, see Model Registry Standalone API.
- New Amazon S3 Data Connection: A new Amazon S3 object store connection is automatically created for CML Workspaces to make it easier to connect to the data stored within the same environment. Other data connections to other S3 locations can be configured manually. For more information, see Setting up Amazon S3 data connection.
- Enhancements to Synced Teams: Team administrators and Site administrators can now add multiple groups to a synced team, view the members of a group, delete a group within a team, update roles for a group within a team, and update a custom role for a member within a group. For more information, see Managing a Synced Team.
- Auto synchronization of the Model Registry with a CML Workspace: If you deploy a Model Registry in an environment that contains one or more CML Workspaces, the Model Registry is now auto-discovered and periodically synchronized by the Cloudera AI Inference service and the CML Workspaces; no manual synchronization is required. A CML Workspace is auto-synchronized every five minutes, and the Cloudera AI Inference service is auto-synchronized every 30 seconds. For more information, see Synchronizing the Model Registry with a CML Workspace.
- Environment: Support for Environment V2 is added for CML Workspaces.
- Kubernetes: Support for AKS 1.29 and EKS 1.29 is added.
- Metering: Support for Metering V2 is added for new CML Workspaces.
Fixed Issues
- DSE-35779: Fixed a race condition in the workload pod between the kinit container writing the JWT file and the engine container reading it.
- DSE-37065: Previously, API V2 did not allow collaborators to be added as admin. This issue is now resolved.
- DSE-33647: Previously, workspace instances reset to default when upgraded. This issue is now resolved.
Management Console
This release of the Management Console service introduces the following changes:
Receiving announcements
You can subscribe to receive announcements and notifications in Cloudera Public Cloud about various events, from product updates to data service specific alerts. Announcements cover Cloudera product updates such as End of Life (EOL), End of Support (EOS), Technical Service Bulletins (TSBs), and maintenance updates.
For more information, see the Receiving notifications documentation.
Compute Cluster enabled environments
Compute Clusters enable you to deploy a containerized platform on Kubernetes for Data Services and shared services. The Compute Cluster architecture offers simplified management, enhanced efficiency, and centralized control, which leads to faster deployments, reduced configuration errors, and improved system reliability. As multiple Data Services can optionally share the same Compute Cluster, it also lowers the cost of ownership.
For more information, see the Using Compute Cluster for AWS or Using Compute Cluster for Azure documentation.
Private Link for Azure Flexible Server
Azure Database for PostgreSQL Flexible Server allows a highly available database to be deployed for Data Lake and Data Hub clusters. Because Private Link for Azure Flexible Server is now supported in CDP, delegated subnets are no longer required to use Flexible Server with private access.
For more information, see Using Azure Database for PostgreSQL Flexible Server and Private setup for Azure Flexible Server.
Operational Database
The Cloudera Operational Database 1.45 release introduces updates to the HDFS instance types and enhancements to the Operational Database UI.
Enhancements to the create-database command
- The CDP CLI adds a new option --custom-instance-types to the create-database command. Using this option, you can define custom instance types; however, the instance types must be included in the allowlist maintained by the Operational Database. The allowlist is not currently accessible; however, the following are the new custom instance types supported by the Operational Database.
AWS, HEAVY, HDFS:
Worker: m6i.4xlarge, m7i.4xlarge
Master: m6i.8xlarge, m7i.8xlarge
Compute/edge/leader: m6i.2xlarge, m7i.2xlarge
Gateway: r6i.8xlarge, r7i.8xlarge
AWS, LIGHT, HDFS:
Worker: m6i.4xlarge, m7i.4xlarge
Compute/edge/leader/gateway: m6i.2xlarge, m7i.2xlarge
Master: m6i.4xlarge, m7i.4xlarge
The following sample create-database command highlights the usage of the --custom-instance-types option.
cdp opdb create-database --environment-name cod_env --database-name cod_db --custom-instance-types masterType=m7i.4xlarge,workerType=m7i.4xlarge,leaderType=m7i.2xlarge,gatewayType=m7i.2xlarge --storage-type=HDFS --scale-type LIGHT
- The --storage-type option is now optional. If you do not define the --storage-type option, the Operational Database uses the default storage type. The default storage type is blob storage: if ephemeral storage is enabled, the Operational Database considers the storage type to be CLOUD_WITH_EPHEMERAL; otherwise, CLOUD is considered.
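As an illustrative sketch (not an official example), the same create-database call can simply omit --storage-type and let the service select the default; the environment and database names below are placeholders reused from the sample above:

```shell
# Sketch: create an operational database without --storage-type.
# The service then applies the default storage type (CLOUD, or
# CLOUD_WITH_EPHEMERAL when ephemeral storage is enabled).
# cod_env and cod_db are placeholder names.
cdp opdb create-database \
  --environment-name cod_env \
  --database-name cod_db \
  --scale-type LIGHT
```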
For more information, see CDP CLI documentation.
Updates to the HDFS clusters on AWS environments to add support for m6i and m7i instance types
When you create an operational database with the HDFS storage type in an AWS environment, the Operational Database on HDFS clusters now also supports m6i and m7i instances for the applicable nodes. Operational Database clusters with the HDFS storage type are upgraded to these instance types to enhance performance and usability.
The following are the new custom instance types supported by the Operational Database.
AWS, HEAVY, HDFS:
Worker: m6i.4xlarge, m7i.4xlarge
Master: m6i.8xlarge, m7i.8xlarge
Compute/edge/leader: m6i.2xlarge, m7i.2xlarge
Gateway: r6i.8xlarge, r7i.8xlarge
AWS, LIGHT, HDFS:
Worker: m6i.4xlarge, m7i.4xlarge
Compute/edge/leader/gateway: m6i.2xlarge, m7i.2xlarge
Master: m6i.4xlarge, m7i.4xlarge
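As a hypothetical sketch combining the HEAVY instance types above with the --custom-instance-types syntax shown earlier in these notes (the environment and database names are placeholders):

```shell
# Sketch: create a HEAVY-scale HDFS database using the m7i/r7i
# instance types from the HEAVY allowlist above.
# cod_env and cod_db are placeholder names.
cdp opdb create-database \
  --environment-name cod_env \
  --database-name cod_db \
  --custom-instance-types masterType=m7i.8xlarge,workerType=m7i.4xlarge,leaderType=m7i.2xlarge,gatewayType=r7i.8xlarge \
  --storage-type=HDFS \
  --scale-type HEAVY
```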
Enhancements to the Cloudera Operational Database UI
The Cloudera Operational Database UI is updated for better usability and performance. The following are the key enhancements.
- The cloud form factor on which the database is deployed is shown as a logo under the Environment column.
- The Version column is renamed to the Runtime Version in the Databases screen.
- A new column Node Count is added to the Databases screen.
- The Date Created column name has been changed to Uptime on the Databases screen.
- The SQL Editor has been renamed to HUE and is displayed as a link under the new Apps column.
- A new action menu item, Collect diagnostic bundle, is added under Databases > database_name > Actions.
Observability
Cloudera Observability Essential for Cloudera AI
Cloudera Observability Essential is now Generally Available for Cloudera AI (formerly known as Cloudera Machine Learning, or CML) for cloud customers on version 2.0.46 and higher. Cloudera Observability Essential introduces new governance, auditing, and monitoring capabilities for Cloudera workloads. Key features include:
- Cloudera AI service level analysis for all workbench deployments
- Cloudera AI workbench analysis across Users, Teams, Projects and Infrastructure at each workbench level
- User workload list views and analysis
- Resource consumption analysis across Cloudera AI Service (formerly CML Level) and individual workbenches (formerly workspaces)
For more information, see Monitor Cloudera Machine Learning (CML) workspace and workload performance using Cloudera Observability.
The list of known issues in this release of Observability can be found here.