Data Services Release Notes

CDP Private Cloud Data Services includes the Management Console, Cloudera Data Warehouse (CDW), Cloudera Machine Learning (CML), and Cloudera Data Engineering (CDE). Learn about the new features and improvements in each of these services.

New features in Platform

Certifications

CDP Private Cloud Base (7.1.9 CHF 6, 7.1.7 SP3, 7.1.8 CHF22)
Cloudera Manager 7.11.3 CHF 6
Iceberg v2 GA on CDW, CDE, & CML with Ozone
OEL (RHCK Kernel only) 8.7, 8.8, 8.9, 9.1, 9.2, 9.3
RHEL 8.7, 8.8, 8.9, 9.1, 9.2, 9.3
K8s 1.27 and OCP 4.14

Stability and Resiliency: New prerequisite check in ECS Install Wizard

The ECS Install Wizard has a new step called Check Prerequisites. For fresh installations, this step verifies that the ECS hosts meet a list of minimum requirements before installation begins, which improves the overall installation experience for administrators. For more information on this prerequisite check, see Installing CDP Private Cloud Data Services using ECS.

DRS automatic backups

Starting with CDP Private Cloud Data Services 1.5.4, DRS automatic backups for Control Plane, CDW, and CDE are enabled by default on ECS clusters, both for new installations and after a cluster upgrade to version 1.5.4 or higher. You can disable this option, if required. You can also configure external storage in Longhorn for ECS, and then direct DRS automatic backups to it. On OCP clusters, DRS automatic backups are disabled by default. For more information, see DRS automatic backups.

Authentication for ingress TLS/SSL

A new property (ssl_private_key_password) has been added to Cloudera Manager to specify the password for the private key in the Ingress Controller TLS/SSL Server Certificate and Private Key file.

Improved Diagnostics

The tez-site.xml file is now included in the Management Console diagnostic bundle download.

New features in Cloudera Data Engineering

Support for password-protected private keys to initialize the virtual cluster: You can now use password-protected private keys to initialize the virtual cluster. Currently, password-protected private keys are supported with the RSA and EC algorithms only. For more information, see Initializing virtual clusters.
Support for Spark Connect sessions (Preview): CDE supports Spark Connect sessions, a type of CDE session that exposes the Spark Connect interface and allows you to run Spark commands from any remote Python environment.
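
As a rough illustration of the password-protected private key support described above, the following Python sketch inspects PEM text and reports whether a private key appears to be encrypted. It relies only on general PEM conventions (the PKCS#8 ENCRYPTED PRIVATE KEY label and the legacy OpenSSL "Proc-Type: 4,ENCRYPTED" header). The function name describe_private_key and the sample key body are hypothetical placeholders, and this is not how CDE itself validates keys.

```python
import re

# Matches one PEM block and captures its label and body.
_PEM_BLOCK = re.compile(
    r"-----BEGIN (?P<label>[A-Z0-9 ]+)-----\r?\n(?P<body>.*?)-----END (?P=label)-----",
    re.DOTALL,
)


def describe_private_key(pem_text: str) -> dict:
    """Return the PEM label of the first private key and whether it looks encrypted.

    Hypothetical helper for illustration only.
    """
    for match in _PEM_BLOCK.finditer(pem_text):
        label = match.group("label")
        if not label.endswith("PRIVATE KEY"):
            continue  # skip certificates and other non-key blocks
        body = match.group("body")
        # PKCS#8 encrypted keys use a dedicated label; legacy OpenSSL RSA/EC
        # keys keep their label and add a Proc-Type header instead.
        encrypted = (
            label == "ENCRYPTED PRIVATE KEY" or "Proc-Type: 4,ENCRYPTED" in body
        )
        return {"label": label, "encrypted": encrypted}
    raise ValueError("no PEM private key block found")


# Placeholder key material, not a real key.
sample = (
    "-----BEGIN ENCRYPTED PRIVATE KEY-----\n"
    "PLACEHOLDER\n"
    "-----END ENCRYPTED PRIVATE KEY-----\n"
)
print(describe_private_key(sample))
```

A check like this only reads the PEM framing. It does not confirm that the key is a valid RSA or EC key or that a given password decrypts it.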

New features in Cloudera Data Warehouse

Hue supports natural language query processing (Preview)

Hue uses Large Language Models (LLMs) to help you generate SQL queries from natural language prompts. It also provides options to optimize, explain, and fix queries, promoting efficient and accurate practices for accessing and manipulating data. You can use several AI services and models, such as OpenAI’s GPT service, Amazon Bedrock, and Azure’s OpenAI service, to run the Hue SQL AI assistant.

To learn more about the supported models and services, limitations, and what data is shared with the LLMs, see About the SQL AI Assistant in CDW.
To set up and enable the SQL AI Assistant, see About setting up the SQL AI Assistant in CDW.
To see how to generate, edit, explain, optimize, and fix queries, see Starting the SQL AI Assistant in Hue.

Improvements to custom pod configuration (now known as Resource Templates)

Several improvements and changes have been made to the custom pod configuration functionality starting with CDP Private Cloud Data Services 1.5.4. The custom pod configuration has been renamed to Resource Templates.

A new menu option Resource Templates has been added to the left navigation pane on the CDW web interface.
You can now configure the allocation of Kubernetes resources to the pods for Hive, Data Visualization, and Database Catalog in addition to Impala.
The Impala pod configuration feature is moved from the Environment Details page to the Resource Templates page.

For more information, see Resource templates for CDW Private Cloud pods.

Flexibility to enable and disable quota management for CDW entities

Earlier, you were required to enable the quota management option before activating an environment to use quota-managed resource pools for environments, Data Catalogs, Virtual Warehouses, and Data Visualization instances in CDW. Starting with CDP Private Cloud Data Services 1.5.4, you can enable or disable the quota management option at any time during the lifecycle of the CDW entities. To learn more about the behavioral aspects, see Quota Management in CDW Private Cloud.

Added support for authentication between CDW and HMS database using mTLS

CDW and the Hive MetaStore (HMS) database on the base cluster can mutually authenticate each other during the SSL/TLS handshake using mTLS for all supported backend databases (Oracle, MySQL, MariaDB, and Postgres). To set up mTLS, you must upload the database client certificate and the private key files in PEM format while activating an environment in CDW. See Enabling mTLS between the HMS database and CDW on Private Cloud.
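
To make the idea of mutual authentication concrete, here is a minimal, generic sketch of a TLS client context in Python that both verifies the server and presents a client certificate. It uses only the standard library ssl module. The function name and all file paths are hypothetical, and it does not reflect how CDW connects to the HMS database internally.

```python
import ssl
from typing import Optional


def build_mtls_client_context(
    ca_file: Optional[str] = None,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
    key_password: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a client-side TLS context suitable for mutual TLS.

    Hypothetical helper for illustration; all paths are supplied by the caller.
    """
    # Server authentication: the client verifies the server certificate
    # against ca_file (or the system trust store when ca_file is None).
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)

    # Client authentication: present a PEM certificate and private key so the
    # server can verify the client in the same handshake.
    if client_cert:
        ctx.load_cert_chain(
            certfile=client_cert, keyfile=client_key, password=key_password
        )
    return ctx


# Example call with hypothetical paths (commented out because the files
# do not exist here):
# ctx = build_mtls_client_context("ca.pem", "client-cert.pem", "client-key.pem")
```

The two halves of the function mirror the two directions of mTLS: the trust store lets the client authenticate the server, and the certificate chain lets the server authenticate the client.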

Ability to enable active-passive configuration for HiveServer2 pods

CDW provides an option to enable an active-passive configuration for HiveServer2 (HS2) pods in Private Cloud. When this feature is enabled, two HS2 pods run simultaneously: one active and the other inactive. When the active pod terminates, most likely because of a node failure, the inactive pod becomes active, providing High Availability. See HiveServer2 High Availability in CDW Private Cloud.
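
The failover behavior can be pictured with a small toy model. The Python sketch below is only a simulation of the active-passive idea; the class name, pod names, and the assumption that a terminated pod comes back as the new standby are illustrative and are not taken from the CDW implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActivePassivePair:
    """Toy model of two pods where only one serves requests at a time."""

    active: str
    standby: str
    history: List[str] = field(default_factory=list)

    def fail(self, pod: str) -> str:
        """Simulate termination of a pod and return the pod now serving."""
        if pod == self.active:
            # Promote the standby. Assumption: the terminated pod is
            # restarted and takes over the standby role.
            self.active, self.standby = self.standby, self.active
            self.history.append(f"{pod} terminated, {self.active} promoted")
        elif pod == self.standby:
            # Losing the standby does not interrupt service.
            self.history.append(f"{pod} terminated, {self.active} keeps serving")
        else:
            raise ValueError(f"unknown pod: {pod}")
        return self.active


pair = ActivePassivePair(active="hs2-0", standby="hs2-1")
print(pair.fail("hs2-0"))  # the standby takes over
print(pair.history)
```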

CDW no longer has a dependency on YARN

Environment activation in CDW no longer depends on the YARN service, and no longer fails if YARN is not installed on the CDP Base cluster.

Improvements to backup and restore CDW

There are two ways to create backups of the Data Warehouse service:

Using Data Recovery Service (DRS)
Using the CDP CLI cluster management commands for CDW

By default, CDW uses the Data Recovery Service (DRS) to back up namespace-related data before starting the upgrade process. A new option called Back up Virtual Warehouse namespaces before an upgrade has been added to the Advanced Configuration page in the CDW web interface. Use this option to disable the automatic backup process.

New features in Cloudera Machine Learning

CML Service Accounts are available in CML Private Cloud: In CML, the Kerberos principal for the Service Account may not be the same as your login information. Therefore, ensure you provide the Kerberos identity when you sign in to the Service Account. For more information, see Authenticating Hadoop for CML Service Accounts.
Model Registry is available in CDP Private Cloud: Model Registry is now generally available (GA) in CDP Private Cloud. Model Registry in CDP Private Cloud uses Apache Ozone to store model artifacts. For creating a Model Registry you need the Ozone S3 gateway endpoint, the Ozone access key, and the Ozone secret key. If you deploy Model Registry in an environment that contains one or more CML workspaces, you must synchronize the Model Registry with the workspaces. For more information, see Prerequisites for creating Model Registry and Synchronizing the model registry with a workspace.
Heterogeneous GPU usage: When using heterogeneous GPU clusters to run sessions and jobs, the available GPU accelerator labels need to be selected during workload creation. For more information, see Heterogeneous GPU clusters.
Data connections without auto discovery: Cloudera Machine Learning is a flexible, open platform, supporting connections to many data sources. The provided code samples demonstrate how to access local data for CML workloads. For more information, see Connecting to CDW.
Spark Log4j Configuration: Cloudera Machine Learning allows you to update Spark’s internal logging configuration on a per-project basis. Spark logging properties can be customized for every session and job, with a default file path found at the root of your project. You can also specify a custom location with a custom environment variable. For more information, see Spark Log4j Configuration.
ML Metrics Collector service: The Metrics Collector service gathers data about how users and groups use their resource quota, such as how much CPU, memory, and GPU capacity (if any) is allocated and how much of that allocation the users or groups actually use. The Metrics Collector service runs by default, but to collect data about resource quota metrics, you need to enable the Quota Management feature. For more information, see ML Metrics Collector Service overview.
Quota Management for group level (Preview): The Quota Management Technical Preview (TP) release enables you to control how resources are allocated within your CML workspace at the user level and at the group level. Yunikorn Gang Scheduling is also available, and is the default scheduling mechanism in Cloudera Machine Learning. For more information, see Quota Management overview and Yunikorn Gang Scheduling.
Restarting a failed AMP setup: You can now retry failed AMP deployment steps and continue the AMP setup to handle intermittent and configuration issues. For more information, see Restarting a failed AMP setup.
New Hadoop CLI Runtime Addon versions are available: The HadoopCLI 7.1.8.3-601 Runtime Addon is released for the Private Cloud.
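
For readers unfamiliar with gang scheduling, mentioned in the Quota Management item above, the general idea is that a group of pods is placed all together or not at all. The Python sketch below is a toy model of that all-or-nothing rule using a simple first-fit strategy on CPU only. The function name, node names, and numbers are hypothetical, and it makes no claim about how Yunikorn actually places workloads.

```python
from typing import Dict, List, Optional


def gang_schedule(
    free_cpu_by_node: Dict[str, int], pod_cpu_requests: List[int]
) -> Optional[Dict[int, str]]:
    """Place every pod of a gang, or none of them.

    Returns a mapping of pod index to node name, or None when at least one
    pod cannot be placed. Toy model for illustration only.
    """
    remaining = dict(free_cpu_by_node)  # work on a copy; commit only on success
    placement: Dict[int, str] = {}
    # Place the largest requests first (first-fit decreasing).
    for idx, cpu in sorted(enumerate(pod_cpu_requests), key=lambda p: -p[1]):
        node = next((n for n, free in remaining.items() if free >= cpu), None)
        if node is None:
            return None  # one pod does not fit, so the whole gang is rejected
        remaining[node] -= cpu
        placement[idx] = node
    return placement


print(gang_schedule({"node-a": 4, "node-b": 2}, [2, 2, 2]))
print(gang_schedule({"node-a": 4, "node-b": 2}, [3, 3]))
```

In the second call, one 3-CPU pod would fit on node-a by itself, but the gang as a whole does not fit, so nothing is placed.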
