Cloudera on Cloud: April 2025 Release Summary

This Release Summary covers the major features introduced in the Cloudera Management Console, Data Hub, and the data services on Cloudera Public Cloud.

Cloudera AI

Cloudera AI 2.0.50-b52 introduces the following changes:

New Features / Improvements

Cloudera AI Workbench

  • Previously, when a job was already running and another job run was triggered by a cron job or an API call, the new run would be skipped and displayed as Failed in the UI. This update introduces a Skipped status, and any skipped job runs will now appear with the Skipped status in the UI.
  • The Shared Memory Limit set under Project Settings now applies to both applications and sessions. Previously, it was applied only for sessions.
  • Custom Spark settings can now be configured at Cloudera AI Workbench level. When set, the custom Spark configuration provided by the administrator will be merged with the default Spark configuration used in Cloudera AI sessions. These settings will automatically apply to all newly launched Spark sessions within the workbench. The configuration option is available under Site Administration > Runtimes.
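Conceptually, the merge behaves like a dictionary overlay. The following Python sketch assumes the administrator-provided workbench settings take precedence over the session defaults; the key names and the precedence rule are illustrative, not Cloudera's actual defaults:

```python
# Sketch: merging workbench-level Spark settings over session defaults.
# Key names are illustrative examples, not actual Cloudera defaults.
def merge_spark_conf(defaults: dict, custom: dict) -> dict:
    """Return the effective Spark configuration for a new session."""
    merged = dict(defaults)   # start from the default configuration
    merged.update(custom)     # assume admin-provided settings take precedence
    return merged

defaults = {"spark.executor.memory": "4g", "spark.executor.cores": "2"}
custom = {"spark.executor.memory": "8g", "spark.sql.shuffle.partitions": "64"}
effective = merge_spark_conf(defaults, custom)
```

Settings not overridden by the administrator (such as `spark.executor.cores` above) keep their default values in the merged result.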

Cloudera AI Platform

  • Cloudera AI Workbench now utilizes EBS SSD gp3 volumes for newly created or restored Cloudera AI Workbench instances, replacing the previously used EBS SSD gp2 volumes.
  • Added support for Sweden Central on Azure.
  • Added support for manually modified PVC sizes.
  • Added support for EKS 1.31.
  • Added support for AKS 1.31.

Cloudera AI Registry

  • The Cloudera AI UI now displays clear error messages for failed Model Imports, enabling quicker troubleshooting.
  • Users without the appropriate roles now see actionable error messages in the Model Hub popup.
  • A Load Balancer Subnet option is now available during AI Registry creation.
  • The Cloudera AI UI now supports the force deletion of AI Registry.

    Important
    You must upgrade AI Registry after installing the latest release to ensure compatibility with the most recent models available in our Model Hub. For information about the AI Registry upgrade, see Upgrading Cloudera AI Registry.

Cloudera AI Inference service

  • Informative tooltips have been added to the Create Model Endpoints page to improve the user experience.
  • Cloudera AI Inference service can now be created without the need for a node group.

ML Runtimes

  • Project template files have been updated to fully support ML Runtimes. Project template files no longer work with Legacy Engines.

For more information about the Known issues, Fixed issues and Behavioral changes, see the Cloudera AI Release Notes.

Cloudera Data Catalog

Cloudera Data Catalog 3.1.0 introduces the following changes:

Improved services for profilers

  • Thanks to the improved Cluster Setup API, the configuration of profilers is simplified:
    • Executor-related settings now specify only the maximum number of workers; an internal service manages autoscaling within this range.
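At its core, autoscaling within a configured range means clamping the desired worker count between the bounds. This Python sketch illustrates the idea; the function name and the minimum of 1 are assumptions, not the actual service logic:

```python
# Sketch: clamping a desired worker count into a configured range.
# The minimum of 1 worker is an illustrative assumption.
def scale_workers(desired: int, max_workers: int, min_workers: int = 1) -> int:
    """Clamp the desired worker count into the allowed autoscaling range."""
    return max(min_workers, min(desired, max_workers))
```

The administrator only sets `max_workers`; the service picks `desired` based on load and keeps it within bounds.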

Redesigned profiler setup

  • Settings for instance sizing and autoscaling are introduced

Improved profiler UI
The improved profilers present a more user-friendly UI and several extended capabilities for Compute Cluster enabled environments.

  • New names for profilers in Compute Cluster enabled environments:
    • The Cluster Sensitivity Profiler is now called Data Compliance profiler.
    • The Hive Column Profiler is now called Statistics Collector profiler.
    • The Ranger Audit Profiler is now called Activity Profiler.
  • Redesigned Profilers menu for easier access to jobs, configurations and their history, asset filtering and tag rules:
    • The individual profilers show new metrics
      • Number of profiled assets of the last job
      • Job duration of the last job
      • The profilers menu also shows the next job's start time and the number of completions
    • The CRON expression based scheduler is supplemented with a natural language based scheduler
    • Asset Filtering Rules is expanded with the list of assets affected by your rule set
    • You can now access the Configuration History of a profiler, where you can check your changes in a sequential order
    • The Job Summary page introduces new metrics:
      • Worker details:
        • Worker memory limit
        • Threads per worker
        • Number of workers
      • Last run check details
    • The Job Summary page provides the list of profiled assets.

Redesigned and expanded Tag Rules for Compute Cluster enabled environments

  • Profiling of table names is now available in addition to column values and column names.
  • Atlas classifications (Cloudera Data Catalog tags) can be used in a more granular way thanks to the distinction between parent and child tags.
  • Tag rules are data lake specific in Compute Cluster enabled environments compared to being valid for all data lakes in VM-based environments.
  • The new Tag Rules tab offers filters to allow for faster searching and displays:
    • List of applied parent and child tags
    • Tag rule status (Can be used to filter for tag rules not yet validated by Dry Run)
    • Rule types
    • You can filter for tag rules that apply child tags
  • The initial loading time of rules has been decreased.
  • You can upload regex patterns in CSV files for easier handling.
  • Now you can specify weightage for column value based matching (which was fixed at 85% before). The column weightage and column name weightage add up to 100%.
  • When profiling column values, you can upload a sample set of column values instead of defining a regex pattern.
  • You can review your configuration before finalizing your tag rule.
  • Dry Run: Before deploying your tag rules, you have to test them with actual table data.
  • New API calls are available.
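The weightage split described above (column value vs. column name adding up to 100%) can be sketched as a simple weighted average. The function name and score inputs below are illustrative, not the actual Data Catalog implementation:

```python
# Sketch: combining a column-value match score and a column-name match score
# using the configurable weightage split (previously fixed at 85%).
def combined_match_score(value_score: float, name_score: float,
                         value_weightage: float = 85.0) -> float:
    """Weighted match score; value and name weightages add up to 100%."""
    name_weightage = 100.0 - value_weightage
    return (value_score * value_weightage + name_score * name_weightage) / 100.0
```

With the previous fixed setting, a perfect column-value match and no name match would score 0.85; lowering the value weightage shifts influence toward the column name.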

New file formats for Compute Cluster based profilers
Compute Cluster based profilers also support the ORC and Avro file formats.

For more information about the Fixed issues, Known issues and Behavioral changes, see the Cloudera Data Catalog Release Notes.

Cloudera Data Engineering

Cloudera Data Engineering 1.23.1-H2 does not introduce new features, but includes fixes. For more information, see the Cloudera Data Engineering Release Notes.

Cloudera Data Hub

The latest version of Cloudera Data Hub introduces the following changes:

Cloudera Data Engineering support on ARM - Technical Preview
With the Cloudera Runtime 7.3.1.200 Service Pack 1 (SP1) release, Cloudera Data Engineering Data Hub is optimized to run on ARM-based instances, enabling users to run large-scale data processing and analytics workloads. From the AWS Graviton family, Cloudera Data Engineering Data Hub is supported on AWS Graviton2, Graviton3, and Graviton4.

AWS Graviton is a general purpose, ARM-based processor family. AWS Graviton offers enhanced performance and cost-efficiency compared to traditional Intel x86 processors. With AWS Graviton, you can optimize costs and achieve better performance for cloud workloads running in AWS Elastic Compute Cloud (EC2). For more information, see the AWS Instance Type page.

For more information about how to create Cloudera Data Engineering Data Hub clusters on ARM processors, see the Creating a cluster from a definition on AWS documentation.

The following limitations apply when using Cloudera Data Engineering Data Hub on ARM-based architecture:

  • You need to have the following entitlement to be able to create Cloudera Data Engineering Data Hub clusters on ARM:
    • CDP_AWS_ARM_DATAHUB
      For more information about how to obtain the entitlement, contact Cloudera Customer Support.
  • You can create ARM-based Cloudera Data Engineering Data Hub clusters only in Cloudera environments on AWS.
  • Ensure that the EC2 instances with ARM processors are supported in your region. For more information, see the AWS Graviton Processors and Amazon EC2 instance types by Region documentation.
  • From the available Cloudera Data Hub templates, only the Data Engineering templates are supported on the ARM-based architecture.
  • To create Cloudera Data Engineering Data Hub clusters that use ARM processors, you need to install the beta CDP CLI. For more information, see the Installing beta CDP CLI documentation.

Secret rotation (Preview)
To add an extra measure of security, you can rotate secrets, such as database passwords or the FreeIPA admin password, using CLI commands. These secrets are created and managed by the Cloudera Control Plane. Rotating them regularly helps you achieve more secure deployments.

For more information, see Secret rotation.

Cloudera Data Flow

Release 2.10.0-b443 of Cloudera Data Flow makes NiFi 2 generally available for flow development and flow deployments, and provides a semi-automatic tool to migrate NiFi 1.x flows to NiFi 2.x. It also provides security enhancements in the form of role-based access control to flow definitions in the Catalog and specifying trusted IP addresses for inbound connections. Flow deployment is further streamlined by the availability of shared parameters and made more cost-efficient by customizable storage sizing.

New features

Latest NiFi version
Flow Deployments and Test Sessions now support the latest Apache NiFi 1.28 and NiFi 2.3 releases. You can now develop flows in the Flow Designer as NiFi 2.3 flows by default. NiFi 2.3 is also available as the NiFi 2 runtime for Flow Deployments. This marks the general availability of NiFi 2 in Cloudera Data Flow.

NiFi 2 migrations
Self-service migration is powered by Cloudera Data Flow Catalog and Flow Designer. You can organize NiFi 1 flows in the Catalog, start migrations with one click, and make any required changes in Flow Designer. A comprehensive visual migration report clearly highlights items that require manual updates, while enabling users to effectively keep track of their progress.

For more information, see Migrating to NiFi 2.x.

Shared Parameter Groups in flow deployments
You can accelerate deployment processes by importing and referencing Shared Parameter Groups during flow deployment. This streamlined workflow significantly reduces development and deployment complexity and accelerates time to value by eliminating the manual copying and pasting of parameter values.

For more information, see Shared parameters.

Custom storage sizing
You can now specify tailored storage capacity, IOPS, and throughput sizes for your NiFi repositories, making smaller deployments more cost-efficient. For more information, see Configuring sizing and scaling.

Access control for flow definitions
Collections enhance security by enabling precise role based access control for cataloged flows. You can organize cataloged flows into Collections and tightly manage user access to each Collection.

Note
Available May 9, 2025.

For more information, see Collections.

Secured inbound connections
You can now specify trusted IP addresses for flows with inbound connections. This limits traffic to only the specified IP addresses.

For more information, see Create an Inbound Connection Endpoint.
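Restricting an inbound connection to trusted IP addresses amounts to an allow-list check against one or more address ranges. This Python sketch uses the standard `ipaddress` module; the example ranges and function name are hypothetical:

```python
# Sketch: allow-list check for an inbound connection endpoint.
# The trusted ranges below are documentation-reserved example addresses.
import ipaddress

TRUSTED = [ipaddress.ip_network(cidr)
           for cidr in ("203.0.113.0/24", "198.51.100.7/32")]

def is_trusted(client_ip: str) -> bool:
    """Return True if client_ip falls within any trusted range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in TRUSTED)
```

Traffic from any address outside the configured ranges would be rejected by such a check.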

Better notifications
You can now stay better informed with enhanced, customizable notifications.

For more information, see Service notifications and Deployment notifications.

New ReadyFlows

  • ADLS to Chroma DB
  • S3 to Chroma DB
  • Slack to Chroma DB
  • Slack to Milvus
  • ADLS to Qdrant
  • S3 to Qdrant
  • Slack to Qdrant
  • ADLS to OpenSearch
  • S3 to OpenSearch
  • Slack to OpenSearch
  • RAG Query Pinecone

For more information, see Available ReadyFlows.

Platform updates

New Kubernetes version support
Cloudera Data Flow now supports EKS/AKS 1.31.

Adopted Chainguard base image for all in-house components
All Cloudera Data Flow components are now built on Chainguard base images to minimize CVEs.

Migrated third party container images to Chainguard
90% of third party container images used in Cloudera Data Flow are now sourced from Chainguard to minimize CVEs.

Changes and improvements

  • Storage size reserved for NiFi 2.x deployments has been reduced.
  • Redis has been replaced by Valkey.
  • In Flow Designer, local parameter group assets are now deleted, if applicable, when the parameter group is updated.
  • In Flow Designer, when importing shared parameter groups to flow drafts, there is now an option to automatically remove overlapping local parameters.
  • More accurate test session status in Flow Designer.
  • Messaging and marking of required fields on the UI were improved.
  • Various UI accessibility improvements.

For more information about the Known issues, Fixed issues and Behavioral changes, see the Cloudera Data Flow Release Notes.

Cloudera Data Warehouse

Cloudera Data Warehouse 1.10.1-b703 introduces the following changes:

Important
This release contains changes that required upgrading associated components. For example, the Hive Metastore (HMS) API versions are updated and require you to upgrade the Database Catalog that is associated with your Virtual Warehouses. You may notice that the latest Virtual Warehouse versions do not show up until you upgrade the Database Catalog.

What’s new in Cloudera Data Warehouse on cloud

Improvements to Impala Autoscaler Dashboard - view historical data
You can now view historical autoscaler metrics data for a specified period of time by choosing the Historic Data option and specifying the start and end timestamps for which you want to view the data. Note that this feature is currently available only for AWS environments.

For more information, see About Impala Autoscaling Dashboard.

Publishing Cloudera Data Warehouse telemetry data in Cloudera Observability
When you activate an AWS or Azure environment in Cloudera Data Warehouse, the global option set in Cloudera Management Console under Environments > Summary > Telemetry > Cloudera Observability - Workload Analytics determines whether diagnostic information about job and query execution is sent to Workload Manager.

If the Cloudera Observability - Workload Analytics option is enabled, Cloudera Data Warehouse publishes Hive or Impala query data to Cloudera Observability. If the option is disabled, users do not see any diagnostic data related to their queries.

Note
This change only affects newly activated environments. Existing cluster instances continue to publish diagnostic data until the environment is reactivated. Any change to this option is only applied by Cloudera Data Warehouse when the environment is reactivated.

Removal of docker custom registry type
Starting from this release, the “docker” custom image registry type is no longer supported in Cloudera Data Warehouse and the option to choose the “docker” registry type during environment activation is removed. Cloudera Data Warehouse only supports the ACR and ECR image registries.

Security improvement: use of Chainguard images
To enhance security, Cloudera Data Warehouse now uses Chainguard hardened images for its base images, Hue, and third-party images. The Kubernetes Dashboard is excluded from this change.

These changes help us address CVEs and offer improved security and stability. For more information, see Chainguard container images.

What’s new in Hive on Cloudera Data Warehouse on cloud

OpenTelemetry integration for Hive
Hive now integrates with OpenTelemetry (OTel) to enhance query monitoring by collecting and exporting telemetry data, including infrastructure and workload metrics. An OTel agent in Cloudera Data Warehouse helps monitor query performance and troubleshoot failures. For more information, see OpenTelemetry support for Hive.

Apache Jira: HIVE-28504

Common table expression detection and rewrites using cost-based optimizer
Hive’s existing shared work optimizer detects and optimizes common table expressions heuristically, but it lacks cost-based analysis and has limited customization. New APIs and configuration options have been introduced to support common table expression optimizations at the cost-based optimizer level. The feature is experimental and disabled by default.

Apache Jira: HIVE-28259

Upgraded Avro to version 1.11.3

What’s new in Impala on Cloudera Data Warehouse on cloud

Improved Cardinality Estimation for Aggregation Queries
Impala now provides more accurate cardinality estimates for aggregation queries by considering data distribution, predicates, and tuple tracing. Enhancements include:

  • Pre-aggregation Cardinality Adjustments: A new estimation model accounts for duplicate keys across nodes, reducing underestimation errors.
  • Predicate-Aware Cardinality Calculation: The planner now considers filtering conditions on group-by columns to refine cardinality estimates.
  • Tuple Tracing for Better Accuracy: Improved tuple analysis allows deeper tracking across views and intermediate aggregation nodes.
  • Consistent Aggregation Node Stats Computation: The planning process now ensures consistent and efficient recomputation of aggregation node statistics.
  • Tuple-Based Cardinality Analysis: Analyzing grouping expressions from the same tuple ensures their combined number of distinct values does not exceed the output cardinality of the source PlanNode, reducing overestimation.
  • Refined NDV Calculation for CPU Costing: The new approach applies a probabilistic formula to a single global NDV (number of distinct values) estimate, improving accuracy and reducing overestimation in processing cost calculations.

These improvements lead to better memory estimates, optimized query execution, and more efficient resource utilization.

Apache Jira: IMPALA-2945, IMPALA-13086, IMPALA-13465, IMPALA-13526, IMPALA-13405, IMPALA-13644

Cleanup of host-level remote scratch dir on startup and exit
Impala now removes leftover scratch files from remote storage during startup and shutdown, ensuring efficient storage management. The cleanup targets files in the host-specific directory within the configured remote scratch location.

A new flag, remote_scratch_cleanup_on_start_stop, controls this behavior. By default, cleanup is enabled, but you can disable it if multiple Impala daemons on a host or multiple clusters share the same remote scratch directory to prevent unintended deletions.

Apache Jira: IMPALA-13677, IMPALA-13798
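The cleanup described above boils down to removing the host-specific directory under the remote scratch root, gated on the flag. This Python sketch models that behavior; the directory layout and function name are illustrative, not Impala's actual implementation:

```python
# Sketch of the remote_scratch_cleanup_on_start_stop behavior.
# The scratch directory layout here is an illustrative assumption.
import shutil
from pathlib import Path

def cleanup_remote_scratch(scratch_root: str, hostname: str,
                           enabled: bool = True) -> bool:
    """Remove this host's leftover scratch files if cleanup is enabled.

    Returns True if a host-specific directory was found and removed.
    """
    if not enabled:
        return False                         # cleanup disabled by flag
    host_dir = Path(scratch_root) / hostname  # host-specific directory
    if host_dir.exists():
        shutil.rmtree(host_dir)
        return True
    return False
```

Disabling the flag mirrors the documented advice for hosts or clusters that share a remote scratch directory, where an unconditional cleanup could delete another daemon's files.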

Graceful shutdown with query cancellation
Impala now attempts to cancel running queries before reaching the graceful shutdown deadline, ensuring resources are released properly. The new shutdown_query_cancel_period_s flag controls this behavior. The default value is 60 seconds. If set to a value greater than 0, Impala will try to cancel running queries within this period before forcing shutdown. If the value exceeds 20% of the total shutdown deadline, it is automatically capped to prevent excessive delays. This approach helps prevent unfinished queries and unreleased resources during shutdown.

For more information, see Setting Impala Query Cancellation on Shut down.
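The capping rule above can be expressed as a small calculation: the cancellation window is the configured period, but never more than 20% of the total shutdown deadline. This sketch models that rule; the integer arithmetic is an assumption about rounding, not Impala's exact implementation:

```python
# Sketch of the shutdown_query_cancel_period_s capping rule.
# Integer division for the 20% cap is an illustrative assumption.
def effective_cancel_period(cancel_period_s: int,
                            shutdown_deadline_s: int) -> int:
    """Return the query-cancellation window used during graceful shutdown."""
    if cancel_period_s <= 0:
        return 0                        # cancellation phase disabled
    cap = shutdown_deadline_s // 5      # 20% of the total deadline
    return min(cancel_period_s, cap)
```

For example, with the default 60-second period and a 100-second shutdown deadline, the window is capped at 20 seconds.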

Programmatic query termination
Impala now supports the KILL QUERY statement, enabling you to forcibly terminate queries for better workload management. The KILL QUERY statement cancels and unregisters queries on any coordinator.

For more information, see KILL QUERY statement.

Ability to log and manage Impala workloads is now GA
Cloudera Data Warehouse provides the option to enable logging Impala queries on an existing Virtual Warehouse or while creating a new Impala Virtual Warehouse. The information for all completed Impala queries is stored in the sys.impala_query_log system table. Information about all actively running and recently completed Impala queries is stored in the sys.impala_query_live system table. Users with appropriate permissions can query these tables using SQL to monitor and optimize the Impala engine.

For more information, see Impala workload management.

AI Functions in Impala is now GA
Cloudera Data Warehouse introduces Impala’s built-in ai_generate_text function that integrates Large Language Models (LLMs) into SQL for tasks such as sentiment analysis and translation. It simplifies workflows, requires no ML expertise, and supports default or custom UDF configurations.

Secure API key storage is supported through a JCEKS keystore. A lightweight tool included in the UDF SDK helps create or update keystores on Amazon S3 or Azure ABFS without a local Hadoop setup.

For more information, see Advantages and use cases of Impala AI functions.

What’s new in Iceberg on Cloudera Data Warehouse on cloud

Cloudera support for Apache Iceberg version 1.5.2
The Apache Iceberg component has been upgraded from 1.4.3 to 1.5.2.

Reading Iceberg Puffin statistics
Impala supports reading Puffin statistics from current and older snapshots. When there are Puffin statistics for multiple snapshots, Impala chooses the most recent statistics for each column. This means that statistics for different columns may come from different snapshots. If both Hive Metastore (HMS) and Puffin statistics exist for a column, the most recent statistics are used. For HMS statistics, the impala.lastComputeStatsTime property is used; for Puffin statistics, the snapshot timestamp is used to determine which of the two is more recent.

For more information, see Iceberg Puffin statistics.

Note
Reading Puffin statistics is disabled by default. Set the --enable_reading_puffin_stats startup flag to "true" to enable it.
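The per-column selection rule above is simply "newest timestamp wins" across the available sources. This Python sketch models the rule; the tuple layout and function name are illustrative, not Impala's internal representation:

```python
# Sketch: choosing the most recent statistics for one column.
# Each candidate is (source, timestamp, stats); the layout is illustrative.
def pick_latest_stats(candidates):
    """Return the candidate with the newest timestamp.

    For HMS candidates the timestamp would come from
    impala.lastComputeStatsTime; for Puffin candidates, from the
    snapshot timestamp.
    """
    return max(candidates, key=lambda c: c[1])
```

Applied per column, this is why statistics for different columns of the same table may end up coming from different snapshots.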

Enhancements to Iceberg data compaction
The OPTIMIZE TABLE statement is enhanced with the following improvements:

  • Supports partition evolution
    The Hive and Impala OPTIMIZE TABLE statement, which is used to compact Iceberg tables and optimize them for read operations, is enhanced to support compaction of Iceberg tables with partition evolution.

  • Supports data compaction based on file size threshold
    The Impala OPTIMIZE TABLE statement has been enhanced to include a FILE_SIZE_THRESHOLD_MB option that enables you to specify the maximum size of files (in MB) that should be considered for compaction.

For more information, see Iceberg data compaction.
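Conceptually, the FILE_SIZE_THRESHOLD_MB option selects only small files for compaction. This sketch models that selection; whether the boundary is inclusive is an assumption here, not documented Impala behavior:

```python
# Sketch: selecting candidate files under a size threshold for compaction.
# Inclusive boundary (<=) is an illustrative assumption.
def files_to_compact(file_sizes_mb, threshold_mb):
    """Return the file sizes at or below the threshold, in MB."""
    return [size for size in file_sizes_mb if size <= threshold_mb]
```

Files above the threshold are left alone, so a compaction pass spends its effort merging the many small files that hurt read performance.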

Impala supports the MERGE INTO statement for Iceberg tables
You can use Impala to run a MERGE INTO statement on an Iceberg table based on the results of a join between a target and source Iceberg table.

For more information, see the Iceberg Merge feature.

What’s new in Hue on Cloudera Data Warehouse on cloud

General availability of deploying a shared Hue service
Cloudera Data Warehouse now supports the deployment of a shared Hue service, enabling cost-efficient management by ensuring that only the necessary Virtual Warehouses remain active. Organizations can enhance team isolation by running multiple shared Hue instances, providing flexibility and control. The shared Hue service remains available as long as the environment is active.

For more information, see About deploying the shared Hue service.

Hue SQL AI: Multi database querying now supported
The Hue SQL AI Assistant now supports multi-database querying, allowing you to retrieve data from multiple databases simultaneously. This enhancement simplifies managing large datasets across different systems and enables seamless cross-database queries.

  • Support for cross-database queries.
  • Ability to retrieve and combine data from multiple sources in a single query.

For more information, see Multi database support for SQL query.

User Input Validation for Hue SQL AI
Hue SQL AI now supports secure and optimized integration with large language models (LLMs). You can now configure user input validation, such as prompt length limits, regex restrictions, and HTML tag handling, to enhance both security and system performance.

For more information, see User Input Validation for Hue SQL AI.
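A validation pipeline of this kind typically applies each configured check before forwarding the prompt to the LLM. This Python sketch is illustrative only; the limits, the regex, and the function name are assumptions, not Hue's actual configuration:

```python
# Sketch: validating user input before it reaches the LLM.
# The length limit and HTML-tag regex are hypothetical policy values.
import re

MAX_PROMPT_LEN = 500
FORBIDDEN = re.compile(r"<[^>]+>")   # reject anything resembling an HTML tag

def validate_prompt(prompt: str):
    """Apply length, regex, and HTML-tag checks; return (ok, reason)."""
    if len(prompt) > MAX_PROMPT_LEN:
        return False, "prompt exceeds length limit"
    if FORBIDDEN.search(prompt):
        return False, "prompt contains HTML tags"
    return True, "ok"
```

Rejecting oversized or tag-laden prompts up front both reduces load on the model and closes off a class of injection attempts.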

For more information about the Fixed issues, Known issues and Behavioral changes, see the Cloudera Data Warehouse Release Notes.

Cloudera Management Console

The latest version of Cloudera Management Console introduces the following changes:

Creating new network for AWS environments is removed
The option to create a new network when registering a Cloudera environment on AWS is no longer available. To register a Cloudera environment on AWS, you can use your existing VPC and subnets already available in AWS.

This change impacts the following documentation:

For quickly deploying Cloudera environments on AWS, see the Deploy Cloudera using Terraform documentation.

Secret rotation (Preview)
To add an extra measure of security, you can rotate secrets, such as database passwords or the FreeIPA admin password, using CLI commands. These secrets are created and managed by the Cloudera Control Plane. Rotating them regularly helps you achieve more secure deployments.

For more information, see Secret rotation.

Cloudera Operational Database

Cloudera Operational Database 1.50 introduces AWS Graviton-based cluster creation and enhancements to the Cloudera Operational Database UI.

AWS Graviton support is generally available
Cloudera Operational Database cluster deployments on AWS Graviton environments are now generally available. AWS Graviton is a family of general-purpose, ARM-based processors designed for cloud workloads. Cloudera Operational Database supports the following Graviton versions:

  • Graviton2: Used in Cloudera Operational Database deployments on I4G instances
  • Graviton4: Used in Cloudera Operational Database deployments on I8G instances

AWS Graviton processors deliver exceptional price performance for workloads running on AWS EC2. With Graviton4, you can further optimize costs while achieving superior performance. For additional details, see the AWS press release.

To help you choose the most suitable ARM processor for your performance requirements, consult the AWS Graviton Processors documentation, which provides a detailed comparison between Graviton2 and Graviton4 instances.

For more information on AWS Graviton support, see AWS Graviton instances in Cloudera Operational Database.

Enhancements to the Cloudera Operational Database UI
Cloudera Operational Database UI is updated for better usability and performance. The following are the key enhancements.

  • In the Cloudera Operational Database UI, you can view the snapshots created for a database in the Databases > DATABASE_NAME > Snapshots tab.
  • The Collect Diagnostic Bundle option is moved from the Actions menu item to the Diagnostic Bundles tab on the database details page.

For more information about the Fixed issues, Known issues, and Behavioral changes, see the Cloudera Operational Database Release Notes.