Cloudera Public Cloud: October 2024 Release Summary

The Release Summary of Cloudera Public Cloud summarizes major features introduced in Cloudera Management Console, Cloudera Data Hub, and data services.

Data Warehouse

Review the fixed issues and changed behaviors in this hotfix release of Cloudera Data Warehouse on Public Cloud and learn how to apply this hotfix. For more information, see the Data Warehouse release notes.

Fixed issues

This hotfix release of the Cloudera Data Warehouse service on Cloudera Public Cloud introduces a Cloudera Runtime fix.

DWX-18477: (Addendum) Merge query failures with reserved keywords as column names
If a reserved keyword is used as a column name in a query, you must escape the keyword by enclosing it within backticks (`). However, if you have merge queries involving Iceberg target tables that use reserved keywords as column names, the backticks are not retained when the merge queries are rewritten to a join query, and the query fails with a “SemanticException” error.

This issue is now fixed as part of an additional fix that was provided in HIVE-28282.
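As an illustration of the escaping rule described above, the following minimal sketch builds a merge statement in which every column name is wrapped in backticks. The table names, the column names, and both helper functions are hypothetical, and doubling an embedded backtick is an assumption about how the target SQL dialect escapes that character.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in backticks, doubling any embedded backtick (assumed rule)."""
    return "`" + name.replace("`", "``") + "`"


def build_merge(target: str, source: str, key: str, columns: list[str]) -> str:
    """Assemble a MERGE statement with every column name escaped."""
    k = quote_identifier(key)
    assignments = ", ".join(
        f"{quote_identifier(c)} = s.{quote_identifier(c)}" for c in columns
    )
    return (
        f"MERGE INTO {target} AS t USING {source} AS s "
        f"ON t.{k} = s.{k} "
        f"WHEN MATCHED THEN UPDATE SET {assignments}"
    )


# Hypothetical tables; `order` and `date` stand in for reserved keywords used as columns.
print(build_merge("sales_iceberg", "sales_staging", "order", ["date"]))
```

Generating the statement through a helper like this keeps the escaping in one place, so a reserved keyword cannot slip into a query unquoted.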

Known issues

Review the known issues in this release of the Cloudera Data Warehouse (CDW) service on Public Cloud.

DWX-19451: Cloudera Data Visualization restore job can fail with ignorable errors
After Cloudera Data Visualization is restored successfully, the restore job can still be reported in a failed state, with the log displaying ignorable errors such as the following:

pg_restore: error: could not execute query: ERROR:  sequence "jobs_joblog_id_seq" does not exist
Command was: DROP SEQUENCE public.jobs_joblog_id_seq;
pg_restore: error: could not execute query: ERROR:  table "jobs_joblog" does not exist
Command was: DROP TABLE public.jobs_joblog;
pg_restore: error: could not execute query: ERROR:  sequence "jobs_jobcontent_id_seq" does not exist
Command was: DROP SEQUENCE public.jobs_jobcontent_id_seq;
.......
.......

This issue occurs because the restore job issues commands to DROP all the objects that will be restored, and if any of these objects do not exist in the destination database, such ignorable errors are reported.

This has no functional impact on the restored Data Visualization application. All backed-up queries, datasets, connections, and dashboards are restored successfully, and Data Visualization is available for new queries.
Workaround: None.
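When reviewing a restore log for DWX-19451, it can help to separate the ignorable "does not exist" errors raised by DROP commands from any other pg_restore errors. The following is a minimal sketch of that triage, assuming the log keeps the two-line shape shown above (an error line followed by a "Command was:" line). The function name and the regular expressions are illustrative and are not part of any Cloudera tooling.

```python
import re

# An error about an object that is missing in the destination database.
MISSING = re.compile(r'^pg_restore: error: .*ERROR:\s+\w+ ".*" does not exist$')
# The command that triggered the error was a DROP.
DROP = re.compile(r"^Command was: DROP\b")


def split_errors(log_text: str) -> tuple[list[str], list[str]]:
    """Return (ignorable, needs_review) pg_restore error lines."""
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    ignorable, needs_review = [], []
    for i, line in enumerate(lines):
        if not line.startswith("pg_restore: error:"):
            continue
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if MISSING.match(line) and DROP.match(following):
            ignorable.append(line)
        else:
            needs_review.append(line)
    return ignorable, needs_review


sample = """
pg_restore: error: could not execute query: ERROR:  sequence "jobs_joblog_id_seq" does not exist
Command was: DROP SEQUENCE public.jobs_joblog_id_seq;
pg_restore: error: could not execute query: ERROR:  table "jobs_joblog" does not exist
Command was: DROP TABLE public.jobs_joblog;
"""
ignorable, needs_review = split_errors(sample)
print(len(ignorable), "ignorable;", len(needs_review), "need review")
```

Any line that lands in the second list does not match the pattern described in this known issue and should be investigated separately.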

DWX-19454: Default Database Catalog does not start in an Azure transparent proxy, private link, and private AKS setup
If you are running a transparent proxy with a private link and private AKS setup in your Azure environment, the Data Warehouse server is unable to connect to the private AKS endpoint, and the default Database Catalog fails to start.
Workaround: None.

DataFlow

This release (2.9.0-b383) of Cloudera DataFlow increases developer productivity through the introduction of Parameter Groups, which can be shared between flow drafts. Developers can now also create NiFi 2.0 flows in the Designer, leveraging new Cloudera-exclusive processors for building RAG data pipelines. Deployments can now be configured with a Prometheus endpoint that allows scraping Apache NiFi metrics. Cloudera DataFlow’s service and deployment events and alerts now support in-app and email notifications.

Latest NiFi version
Flow Deployments and Test Sessions now support the latest Apache NiFi 1.27 release.

Build NiFi 2.0 flows in Flow Designer [Technical Preview]
You can now select NiFi 2.0 when creating drafts and starting test sessions. You can also configure a test session to use your custom Python-based processors.

New Cloudera-exclusive AI processors for NiFi 2 [Technical Preview]
You can now implement RAG pipelines by using new processors to parse, chunk, and vectorize data, bringing context to your LLMs. The following processors are now available with NiFi 2:

PartitionPdf, PartitionHtml, PartitionText, PartitionDocx, PartitionCsv
ChunkData
EmbedData
InsertToMilvus, LexicalQueryMilvus, VectorQueryMilvus
PutChroma, QueryChroma
PutOpenSearchVector, QueryOpenSearchVector
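To make the chunking step of a RAG pipeline concrete, here is a minimal, generic sketch of splitting text into overlapping word-based chunks before embedding. It is not the implementation of the ChunkData processor; the function name, the parameters, and the word-based strategy are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks that share `overlap` words with their neighbor."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


for chunk in chunk_text("a b c d e f g h i j", chunk_size=4, overlap=1):
    print(chunk)
```

The overlap keeps some shared context between neighboring chunks, which is a common choice when the chunks are later embedded and retrieved independently.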

Bedrock Parameter Groups
You can now centrally define and manage parameter groups in a workspace and re-use them for multiple drafts, eliminating tedious copy-and-pasting of parameters and their values.

New Resources page
You can now view and manage all your workspace resources, such as deployments, drafts, parameter groups, inbound connections, and custom NAR/Python configurations, in a single place.

Notifications via App and Email for Cloudera DataFlow service and deployment events
You can now receive real-time notifications for all events related to a Cloudera DataFlow Service and its deployments through the Cloudera Management Console, under the Notifications tab, and through email. For more information, see Setting up service and deployment notifications.

NiFi metrics can now be exposed via a Prometheus endpoint
You can now configure deployments to expose NiFi metrics through a Prometheus endpoint. Once set up, you can configure your Prometheus instances to scrape these endpoints, consume relevant metrics and build custom dashboards. For more information, see Configuring access for NiFi metrics scraping.
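A Prometheus endpoint serves metrics in the Prometheus text exposition format. The following minimal sketch parses that format from a string, which can be handy for a quick sanity check of a scraped payload. The metric name and label in the sample are invented for illustration; the actual metric names and the endpoint URL depend on your deployment and are described in the linked documentation.

```python
import re

# name, optional {labels}, value, optional integer timestamp
SAMPLE_LINE = re.compile(
    r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+\d+)?$"
)


def parse_metrics(text: str) -> list[tuple[str, str, float]]:
    """Return (name, raw_labels, value) for each sample line, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and # HELP / # TYPE comments
        match = SAMPLE_LINE.match(line)
        if match:
            name, labels, value = match.groups()
            samples.append((name, labels or "", float(value)))
    return samples


# Hypothetical payload; real NiFi metric names will differ.
payload = """
# HELP example_bytes_received Invented metric for this sketch.
# TYPE example_bytes_received counter
example_bytes_received{component="demo"} 1024
example_up 1
"""
for name, labels, value in parse_metrics(payload):
    print(name, labels, value)
```

For production use, a Prometheus server or an established client library is the better choice; this sketch only covers the common line shapes.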

New ReadyFlows

ADLS to Pinecone
S3 to Pinecone
ADLS to Milvus
S3 to Milvus
RAG Query Milvus

New Kubernetes version support
Cloudera DataFlow now supports EKS/AKS 1.29.

Changes and improvements

As part of the upgrade process to Cloudera DataFlow 2.9.0, the Azure Postgres database is migrated from a single server to a flexible server deployment.
Improved asset handling for deployments makes deployment creation more robust in cases where many deployments are created at the same time.
Kubernetes scale-up events could result in the Cloudera DataFlow application container being rescheduled, causing Cloudera DataFlow to become unavailable. Additional restrictions for rescheduling the Cloudera DataFlow application were added to avoid downtime.
Dependencies have been updated to Java 21, Spring 6, and Spring Boot 3.

Fixed issues

NiFi cluster failed to auto scale with a UDP inbound connection configured
NiFi node failed to start up due to custom Kubernetes cluster domain name
MiNiFi logging failed to clean up a full content volume
Vault failed to start up due to insufficient wait in its postStart script
Auto scaling driven by flow metrics did not kick in

Cloudera AI

Cloudera AI introduces the following features and fixes in October:

October 10, 2024

Release notes and fixed issues for version 2.0.46-b210.

New Features / Improvements

Model Hub is generally available (GA): Model Hub is a catalog of top-performing LLM and generative AI models. You can now easily import the models listed in the Model Hub into the Model Registry and then deploy them using the Cloudera AI Inference service. For more information, see Using Model Hub.
Cloudera AI Inference Enhancements:
- Added support for NVIDIA’s NIM profiles requiring the L40S GPU models.
- Made the auto-scale configuration that is rendered in the UI during model endpoint creation more user-friendly. (DSE-38845)
- Optimized the AI Inference UI service to be more responsive.
- User-actionable error messages are now rendered in the Cloudera AI Inference service UI.
  For more information, see Using Cloudera AI Inference service.

Fixed Issues

Addressed scaling issues with web services to support high active user concurrency (DSE-39597).
CVE fixes - This release includes numerous security fixes for critical and high Common Vulnerabilities and Exposures (CVEs).
Fixed CORS issue to ensure that DELETE/PATCH V1 API can be used from within a workspace. (DSE-39357)
Made the NGC service key used to download NVIDIA’s optimized models more restrictive. (DSE-39475)
Previously, users were unable to copy the model-id from the AI Inference UI. This issue is now resolved. (DSE-38889)
Authorization issues related to the listing of AI Inference applications have been addressed. (DSE-39386)
Fixed an issue to ensure that instance type validation is correctly carried out during the creation of a new model endpoint. (DSE-39634)
Added required validation rules for the creation of a new model endpoint. (DSE-38412)
Addressed an issue around empty model list during navigation from registry models to deployment of models. (DSE-39634)

October 8, 2024

Release notes and fixed issues for Cloudera AI Inference service version 1.2.0-b73.

New Features / Improvements

Cloudera AI Inference: Cloudera AI Inference is now a fully supported data service. Cloudera AI Inference service is a production-grade serving environment for traditional, generative AI, and Large Language Models. It is designed to handle the challenges of production deployments, such as high availability, fault tolerance, and scalability. The service is now available to carry out inference on the following categories of models:
- Optimized open-source Large Language Models.
- Traditional machine learning models, such as classification and regression models. Models need to be imported to the Cloudera Machine Learning Model Registry to be served using the Cloudera AI Inference service.
  For more information, see Using Cloudera AI Inference service.

Machine Learning Runtime 2024.05.2

Behavioral Changes

JupyterLab Real-Time Collaboration (RTC) plug-in is not installed by default.

Improvements

CVE fixes - This maintenance release includes numerous improvements related to Common Vulnerabilities and Exposures (CVEs).

Machine Learning Runtime 2024.10.1

New Feature

Cloudera Copilot is generally available (GA): Cloudera Copilot is a highly configurable AI coding assistant integrated with the JupyterLab editor. The Copilot improves developer productivity by debugging code, answering questions, and generating notebooks.

Cloudera Copilot is available in the Runtimes with JupyterLab Editor.

For more information, see Using Cloudera Copilot.

New Runtimes

Python 3.12 PBJ Workbench
Python 3.12 PBJ NVIDIA GPU Workbench
Python 3.12 JupyterLab
Python 3.12 JupyterLab NVIDIA GPU

Note
Python 3.12 Runtimes are compatible only with Spark version 4.0.

Behavioral Changes

JupyterLab Real-Time Collaboration (RTC) plug-in is not installed by default.
On Workbench and JupyterLab Runtimes, the await_workers function in the CDSW R and Python libraries no longer times out when timeout_seconds is set to 0. Instead, the call blocks until the workers are ready. Previously, if timeout_seconds was set to 0, the function call timed out immediately. The new implementation matches the documented behavior of this function.
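The "0 means no deadline" convention described for await_workers can be sketched with a generic polling helper. This is not the CDSW library code; wait_until, is_ready, and poll_interval are hypothetical names used only to show the semantics.

```python
import time


def wait_until(is_ready, timeout_seconds=0, poll_interval=0.01):
    """Poll is_ready(); timeout_seconds == 0 means wait without a deadline."""
    deadline = None if timeout_seconds == 0 else time.monotonic() + timeout_seconds
    while not is_ready():
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("workers were not ready in time")
        time.sleep(poll_interval)
    return True


# Stand-in for a readiness check that succeeds on the third poll.
calls = {"count": 0}


def ready_after_three_polls():
    calls["count"] += 1
    return calls["count"] >= 3


print(wait_until(ready_after_three_polls, timeout_seconds=0))
```

With the older behavior, the equivalent of the deadline check would have fired on the first unsuccessful poll when the timeout was 0; treating 0 as "no deadline" makes the call block until readiness instead.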

Improvements

Python maintenance versions - Python maintenance versions have been upgraded to 3.7.17, 3.8.19, 3.9.19, 3.10.14, and 3.11.9.
CVE fixes - This maintenance release includes numerous improvements related to Common Vulnerabilities and Exposures (CVEs).
Updated setuptools in the Conda runtime to resolve CVE-2024-6345.

Cloudera Management Console

This release of the Cloudera Management Console service introduces the following changes:

Azure Single Server to Azure Flexible Server upgrade

Azure Single Server databases used by Data Lakes and Data Hubs can now be upgraded to Azure Flexible Server. During the upgrade from PostgreSQL version 11 to PostgreSQL 14, Azure Single Server is upgraded to Azure Flexible Server.
For more information, see Upgrading Azure Single Server to Flexible Server.

Compute Cluster enabled environments

The following changes have been introduced to compute cluster enabled environments:

You can provide the Worker Node Subnets for compute cluster enabled environments on AWS and Azure
You can provide the AKS Private DNS Zone ID for compute cluster enabled environments on Azure

For more information, see Using compute cluster enabled environments on AWS and Using compute cluster enabled environments on Azure.

New configuration property for non-transparent proxy

Inbound Proxy CIDR has been introduced for configuring non-transparent proxy in Public Cloud environments to allow communication with the Kubernetes server when defining the proxy with FQDNs.
For more information, see Using a non-transparent proxy.

Deprecated properties

The following properties have been deprecated:

AwsParameters (domain):
- s3guardTableName
- s3guardTableCreation
AwsParametersDto:
- dynamoDbTableName
- dynamoDbTableCreation
AwsParameterValidator:
- determineAwsParameters()

Receiving resource notifications

Notifications include automatically generated service- and resource-related alerts, such as cluster state changes and events, upgrade alerts, and resource exhaustion and consumption notifications. Notifications can be received by users of a tenant who have subscribed to the resource events of a Cloudera service.
For more information, see the Receiving notifications documentation.

Cloudera Observability

This release of Cloudera Observability includes the following improvement:

Query Cost Analysis
Query Cost Analysis is now enabled for Impala queries in CDP Private Cloud Base and CDP Data Hub environments. You can use query cost analysis to gain insights into the costs associated with the different resources used by a query.

For more information, see Query and job resource optimization using resource efficiency analysis.