Data Services Release Notes
Cloudera Data Services on premises includes the Cloudera Management Console, Cloudera Data Warehouse, Cloudera AI, Cloudera Data Catalog, Cloudera Replication Manager, and Cloudera Data Engineering.
Cloudera Data Services on premises 1.5.4 supports Cloudera Base on premises versions up to 7.1.9 SP1.
Certifications
- Cloudera Base on premises - 7.1.9 CHF6, 7.1.8 CHF22, and 7.1.7 SP3
- Cloudera Manager - 7.11.3 CHF6
- Iceberg v2 GA on Cloudera Data Warehouse, Cloudera Data Engineering, and Cloudera AI with Ozone
- Oracle Enterprise Linux (RHCK Kernel Only) - 8.7, 8.8, 8.9, 9.1, 9.2, 9.3
- RHEL - 8.7, 8.8, 8.9, 9.1, 9.2, 9.3
- K8s - 1.27 and OCP 4.14
- Longhorn version - 1.5.4
New features in Platform
- Stability and Resiliency: New prerequisite check in ECS Install Wizard
- A new step called Check Prerequisites has been added to the ECS Install Wizard. Before installation begins, this step checks whether the ECS hosts meet a list of minimum requirements, which makes fresh installations smoother and improves the overall installation experience for administrators. For more information on this prerequisite check, see Installing Cloudera Data Services on premises using ECS.
- DRS automatic backups
- Starting with Cloudera Data Services on premises 1.5.4, DRS automatic backups for Cloudera Control Plane, Cloudera Data Warehouse, and Cloudera Data Engineering are enabled by default on ECS clusters, both for new installations and after a cluster is upgraded to version 1.5.4 or higher. You can disable this option if required. You can also configure external storage in Longhorn for ECS and then initiate DRS automatic backups to it. On OCP clusters, DRS automatic backups are disabled by default. For more information, see DRS automatic backups.
- Authentication for ingress TLS/SSL
- A new property, ssl_private_key_password, has been added to Cloudera Manager to specify the password for the private key in the Ingress Controller TLS/SSL Server Certificate and Private Key file.
- Improved Diagnostics
- The tez-site.xml file is now included in the Management Console diagnostic bundle download.
New features in Cloudera Data Engineering
- Support for password protected private key to initialize the virtual cluster
- You can now use password-protected private keys to initialize a virtual cluster. Currently, password-protected private keys are supported only for the RSA and EC algorithms. For more information, see Initializing virtual clusters.
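Before using a key file, it can help to confirm whether it is actually password protected. The PEM armor usually reveals this. The following Python sketch is a hypothetical pre-check and is not part of Cloudera Data Engineering. The function name `describe_pem_key` is illustrative. Because the sketch reads only the PEM header, it can report the RSA or EC algorithm for legacy PKCS#1/SEC1 files. It cannot see the algorithm inside an encrypted PKCS#8 (ENCRYPTED PRIVATE KEY) block without decrypting it.

```python
import re

# Hypothetical pre-check, not a CDE tool. Maps PEM armor labels to the key
# algorithm they reveal. PKCS#8 blocks hide the algorithm inside the DER body,
# so the algorithm is reported as None for them.
_LABELS = {
    "RSA PRIVATE KEY": "RSA",        # legacy PKCS#1; may carry "Proc-Type: 4,ENCRYPTED"
    "EC PRIVATE KEY": "EC",          # legacy SEC1; may carry "Proc-Type: 4,ENCRYPTED"
    "ENCRYPTED PRIVATE KEY": None,   # PKCS#8, always password protected
    "PRIVATE KEY": None,             # PKCS#8, not encrypted
}

_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def describe_pem_key(pem_text: str) -> dict:
    """Return the PEM label, whether the key is encrypted, and the algorithm
    when the armor reveals it. Raises ValueError for non-key PEM blocks."""
    match = _BEGIN.search(pem_text)
    if match is None:
        raise ValueError("no PEM BEGIN line found")
    label = match.group(1)
    if label not in _LABELS:
        raise ValueError(f"not a private key PEM block: {label!r}")
    # Legacy OpenSSL encryption is signalled by a Proc-Type header line.
    encrypted = label == "ENCRYPTED PRIVATE KEY" or "Proc-Type: 4,ENCRYPTED" in pem_text
    return {"label": label, "encrypted": encrypted, "algorithm": _LABELS[label]}
```

An encrypted PKCS#8 key passes the "encrypted" check, but you still need to know which algorithm generated it, because only RSA and EC keys are supported.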
- Support for Spark Connect session (Preview)
- Cloudera Data Engineering supports Spark Connect sessions, a type of CDE session that exposes the Spark Connect interface and allows you to run Spark commands from any remote Python environment.
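In open-source Apache Spark, a remote client reaches a Spark Connect server through a connection string of the form `sc://host:port/;param=value;...` and opens a session with `SparkSession.builder.remote(...)`. The Python sketch below shows that generic pattern. The host, port, and token are hypothetical placeholders, and the helper names `spark_connect_url` and `open_session` are illustrative. CDE may provide its own client helper or authentication flow, so use the CDE documentation to obtain the real endpoint and credentials.

```python
from typing import Optional


def spark_connect_url(host: str, port: int = 15002, token: Optional[str] = None,
                      use_ssl: bool = True) -> str:
    """Build a generic Apache Spark Connect connection string.
    15002 is the open-source Spark Connect default port; a CDE endpoint may differ."""
    params = []
    if use_ssl:
        params.append("use_ssl=true")
    if token:
        params.append("token=" + token)  # inserted verbatim; placeholder value
    suffix = "/;" + ";".join(params) if params else ""
    return f"sc://{host}:{port}{suffix}"


def open_session(url: str):
    """Open a remote Spark session (needs pyspark>=3.4 with Spark Connect support)."""
    from pyspark.sql import SparkSession  # lazy import so this module loads without pyspark
    return SparkSession.builder.remote(url).getOrCreate()
```

For example, `open_session(spark_connect_url("cde-endpoint.example.com", token="..."))` would return a `SparkSession`. Its DataFrame operations run on the remote cluster, not in the local Python process.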
New features in Cloudera Data Warehouse
- Hue supports natural language query processing (Preview)
- Hue uses Large Language Models (LLMs) to help you generate SQL queries from natural language prompts. It can also optimize, explain, and fix queries, which helps you access and manipulate data efficiently and accurately. You can run the Hue SQL AI Assistant with several AI services and models, such as OpenAI’s GPT service, Amazon Bedrock, and Azure OpenAI Service.
- To learn more about the supported models and services, limitations, and what data is shared with the LLMs, see About the SQL AI Assistant in Cloudera Data Warehouse.
- To set up and enable the SQL AI Assistant, see About setting up the SQL AI Assistant in Cloudera Data Warehouse.
- To see how to generate, edit, explain, optimize, and fix queries, see Starting the SQL AI Assistant in Hue.
- Improvements to custom pod configuration (now known as Resource Templates)
- Several improvements and changes have been made to the custom pod configuration functionality starting with Cloudera Data Services on premises 1.5.4:
- The custom pod configuration has been renamed to Resource Templates.
- A new menu option, Resource Templates, has been added to the left navigation pane of the CDW web interface.
- You can now configure the allocation of Kubernetes resources to the pods for Hive, Data Visualization, and Database Catalog in addition to Impala.
- The Impala pod configuration feature is moved from the Environment Details page to the Resource Templates page.
- Flexibility to enable and disable quota management for Cloudera Data Warehouse entities
- Earlier, you were required to enable the quota management option before activating an environment to use quota-managed resource pools for environments, Data Catalogs, Virtual Warehouses, and Data Visualization instances in Cloudera Data Warehouse. Starting with Cloudera Data Services on premises 1.5.4, you can enable or disable the quota management option at any time during the lifecycle of the Cloudera Data Warehouse entities. To learn more about the behavioral aspects, see Quota Management in Cloudera Data Warehouse Private Cloud.
- Added support for authentication between Cloudera Data Warehouse and HMS database using mTLS
- Cloudera Data Warehouse and the Hive MetaStore (HMS) database on the base cluster can authenticate each other during the SSL/TLS handshake by using mTLS. This is available for all supported backend databases: Oracle, MySQL, MariaDB, and PostgreSQL. To set up mTLS, upload the database client certificate and private key files in PEM format when you activate an environment in Cloudera Data Warehouse. See Enabling mTLS between the HMS database and Cloudera Data Warehouse on premises.
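In mTLS, the client does two things. It verifies the server against a trusted CA, and it presents its own certificate and private key so that the server can verify the client in turn. The Python `ssl` sketch below shows the client side of that handshake setup. It is an illustration only, not how Cloudera Data Warehouse configures its HMS connection. The file paths and the helper name `mtls_client_context` are hypothetical.

```python
import ssl
from typing import Optional


def mtls_client_context(ca_file: str, client_cert: str, client_key: str,
                        key_password: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context for mutual authentication (illustrative)."""
    # Server authentication: verify the database server's certificate
    # against the CA bundle (hostname checking is on by default).
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Client authentication: present our PEM certificate and private key
    # so the server can verify us during the same handshake.
    ctx.load_cert_chain(certfile=client_cert, keyfile=client_key, password=key_password)
    return ctx
```

The resulting context would then be passed to whatever database driver or socket wrapper opens the connection.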
- Ability to enable active-passive configuration for HiveServer2 pods
- Cloudera Data Warehouse provides an option to enable active-passive configuration for HiveServer2 (HS2) pods in Private Cloud. When this feature is enabled, two HS2 pods run simultaneously: one active and one passive. If the active pod terminates, most likely because of a node failure, the passive pod becomes active, providing High Availability. See HiveServer2 High Availability in Cloudera Data Warehouse Private Cloud.
- Cloudera Data Warehouse no longer has a dependency on YARN
- Environment activation in Cloudera Data Warehouse no longer depends on the YARN service and no longer fails if YARN is not installed on the Cloudera Base on premises cluster.
- Improvements to backing up and restoring Cloudera Data Warehouse
- There are two ways to create backups of the Cloudera Data Warehouse service:
- Using Data Recovery Service (DRS)
- Using the Cloudera Data Warehouse’s CDP CLI cluster management commands
By default, Cloudera Data Warehouse uses the Data Recovery Service (DRS) to back up namespace-related data before starting the upgrade process. To disable this automatic backup, use the new Back up Virtual Warehouse namespaces before an upgrade option on the Advanced Configuration page in the CDW web interface.
New features in Cloudera AI
- Cloudera AI Service Accounts are available in Cloudera AI Private Cloud
- In Cloudera AI, the Kerberos principal for a Service Account might differ from your login credentials. Therefore, make sure that you provide the Service Account's Kerberos identity when you sign in to the Service Account. For more information, see Authenticating Hadoop for Cloudera AI Service Accounts.
- Model Registry is available in Cloudera Data Services on premises
- Model Registry is now generally available (GA) in Cloudera Data Services on premises, where it uses Apache Ozone to store model artifacts. To create a Model Registry, you need the Ozone S3 gateway endpoint, the Ozone access key, and the Ozone secret key. If you deploy Model Registry in an environment that contains one or more Cloudera AI workspaces, you must synchronize the Model Registry with those workspaces. For more information, see Prerequisites for creating Model Registry and Synchronizing the model registry with a workspace.
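The Ozone S3 gateway speaks the S3 protocol, so the three prerequisites correspond directly to the settings of any S3-compatible client. Before creating a Model Registry, you could use this to sanity-check the values you plan to use. The Python sketch below is illustrative only and is not part of the Model Registry setup. The class and function names are hypothetical, the endpoint is a placeholder, and the optional check uses the third-party `boto3` package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OzoneS3Credentials:
    endpoint: str     # Ozone S3 gateway endpoint (placeholder value in examples)
    access_key: str   # Ozone access key
    secret_key: str   # Ozone secret key

    def boto3_kwargs(self) -> dict:
        """Keyword arguments for boto3.client("s3", **kwargs)."""
        if not self.endpoint.startswith(("http://", "https://")):
            raise ValueError("endpoint must include the http:// or https:// scheme")
        return {
            "endpoint_url": self.endpoint,
            "aws_access_key_id": self.access_key,
            "aws_secret_access_key": self.secret_key,
        }


def list_buckets(creds: OzoneS3Credentials) -> list:
    """Connectivity check: list bucket names through the S3 gateway."""
    import boto3  # lazy import; requires the boto3 package
    client = boto3.client("s3", **creds.boto3_kwargs())
    return [bucket["Name"] for bucket in client.list_buckets()["Buckets"]]
```

If `list_buckets` succeeds with the same endpoint and keys, the values are at least reachable and accepted by the gateway.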
- Heterogeneous GPU usage
- When you use heterogeneous GPU clusters to run sessions and jobs, you must select one of the available GPU accelerator labels when you create the workload. For more information, see Heterogeneous GPU clusters.
- Data connections without auto discovery
- Cloudera AI is a flexible, open platform, supporting connections to many data sources. The provided code samples demonstrate how to access local data for Cloudera AI workloads. For more information, see Connecting to Cloudera Data Warehouse.
- Spark Log4j Configuration
- Cloudera AI allows you to update Spark’s internal logging configuration on a per-project basis. You can customize Spark logging properties for every session and job by using a file at a default path in the root of your project. You can also point to a custom location by setting an environment variable. For more information, see Spark Log4j Configuration.
- ML Metrics Collector service
- The Metrics Collector service gathers data about how users and groups use their resource quota. This includes how much CPU, memory, and GPU capacity (if any) is allocated and how much of that allocation the users or groups actually use. The Metrics Collector service runs by default, but you must enable the Quota Management feature to collect resource quota metrics. For more information, see ML Metrics Collector Service overview.
- Quota Management for group level (Preview)
- The Quota Management Technical Preview (TP) release enables you to control how resources are allocated within your Cloudera AI workspace at the user and group levels. YuniKorn Gang Scheduling, the default scheduling mechanism in Cloudera AI, is also available. For more information, see Quota Management overview and Yunikorn Gang Scheduling.
- Restarting a failed AMP setup
- You can now retry failed AMP deployment steps and continue the AMP setup to handle intermittent and configuration issues. For more information, see Restarting a failed AMP setup.
- New Hadoop CLI Runtime Addon versions are available
- The HadoopCLI 7.1.8.3-601 Runtime Addon is released for Cloudera Data Services on premises.
