Cloudera on Cloud: October 2025 Release Summary
This release summary describes the major features introduced in October 2025 across the Management Console, Data Hub, and the data services of Cloudera on cloud.
Cloudera AI
Cloudera AI 2.0.53-b243 introduces the following changes:
New Features / Improvements
Model Hub
- Improved model information display by surfacing attributes like MaxTokens, Parameters, and Dimensions for Embedding Models, enabling better decision-making before importing models.
Cloudera AI Inference service
- Added the ability for users to start and stop deployed model endpoints, providing greater control over resource management and cost optimization. This feature allows you to pause inactive endpoints to save resources while maintaining the ability to quickly restart them when needed.
- Improved user experience by enabling customers to open model endpoints in different browser tabs, allowing for better multitasking and simultaneous monitoring of multiple endpoints.
- Enhanced API accessibility by providing customers access to the Swagger UI for invoking AI Inference APIs, enabling interactive API testing and documentation exploration.
- Added visibility to the underlying compute cluster through a direct link in the UI for AI Inference instances, providing seamless navigation to cluster details and infrastructure monitoring.
- Implemented validation for root volume size when adding new nodes to AI Inference deployments, preventing configuration errors and ensuring adequate storage capacity.
- Enhanced node group management by displaying GPU types and enabling filtering based on GPU type when searching through available nodes, improving resource selection efficiency.
- Upgraded KServe from version 0.12 to 0.15, enhancing the underlying model serving infrastructure.
Cloudera AI Platform
- Enhanced support for self-signed certificates in public cloud deployments, resolving installation and update pain points. For more information, see Managing Certificate Authority certificates.
- Re-enabled cadence workflow for workbench upgrades, fixing version compatibility issues.
Cloudera AI Workbench
- Optimized the loading speed of the Site Administration overview and the Projects pages by improving cluster-wide resource usage data collection, ensuring quick loading even in environments with 1,000+ user namespaces.
- The Job Retry feature introduces automated recovery for failed, timed out, or skipped jobs. Administrators can now set concurrent retry limits and customize the retry behavior to enhance job resilience and eliminate the need for manual failure intervention. For more information, see Configuring Job Retry settings.
For more information about the Known issues, Fixed issues, and Behavioral changes, see the Cloudera AI Release Notes.
Cloudera Data Catalog
Cloudera Data Catalog 3.1.3 introduces the following changes:
- The Cluster Sensitivity Profiler and the Statistics Collector Profiler support incremental profiling to reduce required time and compute resources during repeated profiling jobs in Compute Cluster enabled environments.
- The new Asset Filtering Rules tab in Job Summary shows the relevant Allow and Deny list rules for each Data Compliance and Statistics Collector Profiler job.
- Bug fixes and improvements.
For more information about the Known issues and Fixed issues, see the Cloudera Data Catalog Release Notes.
Cloudera Data Hub
The October release of Cloudera Data Hub introduces the following change:
OS upgrades as part of Cloudera Data Hub upgrades
Previously, the Cloudera Data Hub upgrade process consisted of upgrading Cloudera Runtime, which was followed by an Operating System (OS) upgrade.
From October 31, 2025, this default behavior has changed: the OS upgrade is now automatically triggered after the Cloudera Runtime upgrade as part of major/minor version and service pack upgrades.
For more information, see the Upgrading Cloudera Data Hub clusters documentation.
Cloudera Data Warehouse
Cloudera Data Warehouse 1.11.1-b927 introduces the following changes:
What’s new in Cloudera Data Warehouse on cloud
Azure AKS 1.33 upgrade
Cloudera supports Azure Kubernetes Service (AKS) version 1.33. In 1.11.1 (released October 22, 2025), when you activate an Environment, Cloudera Data Warehouse automatically provisions AKS 1.33. To upgrade to AKS 1.33 from a lower version of Cloudera Data Warehouse, you must back up and restore Cloudera Data Warehouse.
Note
Using the Azure CLI or Azure portal to upgrade the AKS cluster is not supported and might result in cluster instability or downtime. For more information about upgrading, see [Upgrading an Azure Kubernetes Service (AKS) cluster](https://docs.cloudera.com/data-warehouse/cloud/azure-environments/topics/dw-azure-update-aks-kubernetes-version.html).
Integration of Hive ARM architecture support
Cloudera Data Warehouse now includes full operational support for Hive Virtual Warehouses on ARM architecture instance types. Hive workloads run natively on both the AWS and Azure ARM instance types currently supported by the platform, offering immediate access to the associated cost and performance efficiencies. For more information, see Compute instance types.
What’s new in Hive on Cloudera Data Warehouse on cloud
Upgrading Calcite
Hive has been upgraded to Calcite version 1.33. This upgrade introduces various query optimizations that can improve query performance.
Hive on ARM Architecture
Hive is now fully supported on ARM architecture instances, including AWS Graviton and Azure ARM. This enables you to run your Hive workloads on more cost-effective and energy-efficient hardware.
What’s new in Impala on Cloudera Data Warehouse on cloud
Enable global admission controller
The global admission controller is now a standalone service that maintains a consistent view of cluster resource usage and can make admission decisions without risking over-admission. This feature is enabled by default for new Impala Virtual Warehouses running in High Availability (HA) Active-Active mode. If needed, you can disable it through the Cloudera web interface, but this action is permanent. For more information, see Impala admissiond and Configuring admission control.
Impala AES encryption and decryption support
Impala now supports AES (Advanced Encryption Standard) encryption and decryption to work better with other systems. AES-GCM is the default mode for strong security, but you can also use other modes like CTR, CFB, and ECB for different needs. This feature works with both 128-bit and 256-bit keys and includes checks to keep your data safe and confidential. For more information, see AES encryption and decryption support.
Apache Jira: IMPALA-13039
Query cancellation supported during analysis and planning
This new feature allows you to cancel Impala queries even while they are in the Frontend stage, which includes analysis and planning. Previously, you could not cancel a query while it was waiting for operations like loading metadata from the Catalog Server. With this update, Impala now registers the planning process and can interrupt it to cancel the query.
Apache Jira: IMPALA-915
Improved memory estimation and control for large queries
Impala now uses a more realistic approach to memory estimation for large operations like SORT, AGGREGATION, and HASH JOIN.
Previously, these operations could severely overestimate their memory needs (sometimes requesting terabytes) when row counts were misestimated. This often caused the Admission Controller to reject the query outright, even though the operation could easily handle the data by writing (spilling) to disk.
The system now considers the operator’s ability to spill data to disk and caps the memory estimate based on your cluster’s actual memory limits (like MEM_LIMIT). This change makes more large queries admittable and allows them to run successfully.
New Control Option
A new query option, MEM_ESTIMATE_SCALE_FOR_SPILLING_OPERATOR (a scale from 0.0 (exclusive) to 1.0 (inclusive)), is introduced, giving you control over this behavior:
- Higher values (closer to 1.0): Tells Impala to reserve more memory, increasing the chance of faster, in-memory execution.
- Lower values (closer to 0.0, but NOT 0.0): Tells Impala to request less memory, increasing the chance your query is admitted, but potentially leading to more spilling to disk (slower execution).
- The default value is 0.0, which disables this feature and reverts the Impala planner to the old behavior.
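As a sketch, the option can be set per session before running a large query; the value 0.5 and the table name below are illustrative, not from the release notes:

```sql
-- Illustrative: request roughly half of the planner's raw memory estimate
-- for spillable operators, trading in-memory speed for admittability.
SET MEM_ESTIMATE_SCALE_FOR_SPILLING_OPERATOR=0.5;
SELECT c1, COUNT(*) FROM big_tbl GROUP BY c1;
```

A lower value makes the query more likely to be admitted under memory pressure, at the cost of more spilling to disk.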
Apache Jira: IMPALA-13333
Expose query cancellation status to UDF interface
Impala now exposes the query cancellation status to the User-Defined Function (UDF) interface. This new feature allows complex or time-consuming UDFs to periodically check if the query has been cancelled by the user. If cancellation is detected, the UDF can stop its work and fail fast.
This significantly reduces the time you have to wait to stop a long-running query that is stuck inside a UDF evaluation.
Apache Jira: IMPALA-13566
CDPD-76276: Auto-optimized parquet collection queries
Impala now automatically boosts query performance for tables with collection data types by setting the parquet_late_materialization_threshold to 1 when data can be skipped during filtering. This ensures maximum efficiency by reading only the data needed.
For more information, see Late Materialization of Columns.
Apache Jira: IMPALA-3841
Impala now supports Hive’s legacy timestamp conversion to ensure consistent interpretation of historical timestamps
When reading Parquet or Avro files written by Hive using legacy timestamp conversion, Impala’s timezone calculation for UTC timestamps could be incorrect, particularly for historical dates and timezones like Asia/Kuala_Lumpur or Singapore before 1982. This meant the timestamps displayed in Impala were different from those in Hive.
This issue is addressed as follows: Impala now checks for the writer.zone.conversion.legacy flag in the Parquet file metadata to determine whether Hive’s legacy timestamp conversion method was used. If found, Impala uses a compatible conversion method. For older files without this flag, a new session and query option, use_legacy_hive_timestamp_conversion, has been added to control the conversion method. See Impala Query Options.
Apache Jira: IMPALA-13627
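A minimal sketch of using the new option in a session; the table and column names are hypothetical:

```sql
-- Ask Impala to apply Hive's legacy timezone conversion when reading
-- older Parquet/Avro files that lack the writer.zone.conversion.legacy flag.
SET USE_LEGACY_HIVE_TIMESTAMP_CONVERSION=true;
SELECT event_ts FROM legacy_hive_tbl WHERE event_ts < '1982-01-01';
```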
CDPD-82251: Impala-shell now shows row count and elapsed time for most statements in HiveServer2 mode
When running Impala queries, some commands over HiveServer2 protocol (like REFRESH or INVALIDATE) did not show the Fetched X row(s) in Ys output in Impala-shell, even though Beeswax protocol showed them.
This issue was resolved by adding a new option in Impala-shell called --beeswax_compat_num_rows. When this option is enabled, Impala-shell prints Fetched 0 row(s) in, along with the elapsed time, for all Impala commands. This requires impala-shell 4.5.1a1 or higher. See Impala Shell Options - beeswax_compat_num_rows.
Apache Jira: IMPALA-13584
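Assuming impala-shell 4.5.1a1 or later, the flag can be passed on the command line; the coordinator host name below is hypothetical:

```shell
# Print "Fetched 0 row(s) in ..." output for DDL-style commands over the
# HiveServer2 protocol, matching the old Beeswax behavior.
impala-shell --protocol=hs2 --beeswax_compat_num_rows \
  -i coordinator.example.com -q "REFRESH my_db.my_table"
```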
CDPD-84069: Support for arbitrary encodings in text and sequence files
Impala now supports reading from and writing to Text and Sequence files that use arbitrary character encodings, such as GBK, beyond the default UTF-8.
Impala now recognizes the Hive table property “serialization.encoding” used with LazySimpleSerDe.
- When reading data from these files, Impala uses the specified encoding to correctly decode the string data (for example, converting GBK bytes into readable characters).
- When inserting data into these tables, Impala correctly encodes the inserted strings into the specified charset before saving them to the text file.
For more information, see Impala TEXTFILE Data Files.
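One way to use this, sketched with a hypothetical table, is to set the SerDe property that LazySimpleSerDe reads:

```sql
-- Hypothetical GBK-encoded text table; Impala decodes reads and encodes
-- inserts using the serialization.encoding SerDe property.
CREATE TABLE gbk_tbl (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
ALTER TABLE gbk_tbl SET SERDEPROPERTIES ('serialization.encoding'='GBK');
```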
Apache Jira: IMPALA-10319
Expanded compression levels for ZSTD and ZLIB
Impala has extended the configurable range of compression levels for the ZSTD and ZLIB (GZIP/DEFLATE) codecs. This enhancement allows for better optimization of the trade-off between compression ratio and write throughput.
- ZSTD: Supports a wider range, including negative levels, up to 20.
- ZLIB (GZIP, DEFLATE): Supports levels from 1 (default) to 9 (best compression).
These levels are applied via the compression_codec query option.
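These levels can be set per session with the codec:level form of the compression_codec query option; the level and table names below are illustrative:

```sql
-- Write Parquet data with ZSTD at level 15; higher levels favor
-- compression ratio over write throughput.
SET COMPRESSION_CODEC=ZSTD:15;
INSERT INTO compressed_tbl SELECT * FROM source_tbl;
```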
Apache Jira: IMPALA-13923
IMPALA-12992: Impala now supports tables created with the Hive JDBC Storage handler
Previously, Impala had difficulty reading tables created using the Hive JDBC Storage handler due to differences in how table properties, such as JDBC driver and DBCP configurations, were defined compared to Impala-created tables.
Impala now translates and adapts the Hive JDBC table properties upon loading, making them compatible with Impala’s internal requirements. See Impala External JDBC Tables (Preview).
Apache Jira: IMPALA-12992
IMPALA-10349: Constant folding is now supported for non-ASCII and binary strings
Previously, the query planner could not apply the optimization known as constant folding if the resulting value contained non-ASCII characters or was a non-UTF8 binary string. This failure meant that important query filters could not be simplified, which prevented key performance optimizations like predicate pushdown to the storage engine (e.g., Iceberg or Parquet stat filtering).
The planner is updated to correctly handle and fold expressions resulting in valid UTF-8 strings (including international characters) and binary byte arrays. This allows Impala to push down more filters, significantly improving the performance of queries that use non-ASCII string literals or binary data in their filters.
Apache Jira: IMPALA-10349
Catalogd and Event Processor Improvements
- Faster Inserts for Partitioned Tables (IMPALA-14051): Inserting data into very large partitioned tables is now much faster. Previously, Impala communicated with the Hive Metastore (HMS) one partition at a time, which was a major slowdown. Impala now uses the batch insert API to send all insert information to the HMS in one highly efficient call, significantly boosting the performance of your INSERT statements into transactional tables.
- Quicker Table Administration (IMPALA-13599): Administrative tasks, such as running DROP STATS or changing the CACHED status of a table, are now much faster on tables with many partitions. Impala previously made thousands of individual calls to the HMS for these operations. The system now batches these updates, making far fewer calls to the HMS and speeding up these essential administrative commands.
- Reliable Table Renames (IMPALA-13989): The ALTER TABLE RENAME command no longer fails when an INVALIDATE METADATA command runs at the same time. Previously, this caused the rename to succeed in the Hive Metastore but fail in Impala’s Catalog Server. Impala now includes automatic error handling that instantly runs an internal metadata refresh if the rename is interrupted, ensuring the rename completes successfully without requiring any manual user steps.
- Efficient Partition Refreshes (IMPALA-13453): Running REFRESH <table> PARTITION <partition> is now much more efficient. Previously, this command always fully reloaded the partition’s metadata and column statistics, even if the partition was unchanged. Impala now checks if the partition data has changed before reloading, avoiding the unnecessary drop-add sequence and significantly improving the efficiency of partition metadata updates.
- Reduced Partition API Calls (IMPALA-13599): Impala has reduced unnecessary API interactions with the HMS during table-level operations. Commands like ALTER TABLE ... SET CACHED/UNCACHED or DROP STATS on large tables previously generated thousands of single alter_partition() calls. Impala now utilizes the HMS’s bulk-update functionality, batching these partition updates to drastically reduce the total number of required API calls.
- REFRESH on multiple partitions (IMPALA-14089): Impala now supports using the REFRESH statement on multiple partitions within a single command, which significantly speeds up metadata updates by processing partitions in parallel, reduces lock contention in the Catalog service, and avoids unnecessary increases to the table version. See Impala REFRESH Statement.
- Impala cluster responsiveness during table renames (IMPALA-13631): This ensures that the critical internal lock is no longer held during long-running external calls initiated by ALTER TABLE RENAME operations. This prevents the entire Impala cluster from being blocked, allowing other queries and catalog operations to proceed without interruption.
Apache Jira: IMPALA-14051, IMPALA-13599, IMPALA-13989, IMPALA-13453, IMPALA-14089, IMPALA-13631
New query options for reliable metadata synchronization
Impala now offers new query options that give you a reliable way to ensure your queries run with the latest table data after the corresponding HMS modifications are complete.
- SYNC_HMS_EVENTS_WAIT_TIME_S: Sets the maximum time, in seconds, you are willing to wait for the metadata synchronization from the Hive Metastore (HMS). Setting this enables Impala to pause query compilation automatically until changes are applied, ensuring metadata consistency.
- SYNC_HMS_EVENTS_STRICT_MODE: Controls error handling if the wait time is exceeded. By default, Impala proceeds with a warning. Set to TRUE to force the query to fail immediately, guaranteeing strict consistency.
See Impala Query Options.
Apache Jira: IMPALA-12152
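As a sketch, the two options might be combined in a session as follows; the timeout value and table name are illustrative:

```sql
-- Wait up to 60 seconds for pending HMS events to be applied before
-- compiling the query; fail instead of warning if the wait times out.
SET SYNC_HMS_EVENTS_WAIT_TIME_S=60;
SET SYNC_HMS_EVENTS_STRICT_MODE=TRUE;
SELECT COUNT(*) FROM recently_altered_tbl;
```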
What’s new in Iceberg on Cloudera Data Warehouse on cloud
Integrate Iceberg scan metrics into Impala query profiles
Iceberg scan metrics are now integrated into the Frontend section of Impala query profiles, providing deeper insight into query planning performance for Iceberg tables.
The query profile now displays scan metrics from Iceberg’s planFiles() API, including total planning time, counts of data/delete files and manifests, and the number of skipped files.
Metrics are displayed on a per-table basis. If a query scans multiple Iceberg tables, a separate metrics section will appear in the profile for each one.
For more information, see IMPALA-13628
Delete orphan files for Iceberg tables
You can now use the following syntax to remove orphan files for Iceberg tables:
-- Remove orphan files older than '2022-01-04 10:00:00'.
ALTER TABLE ice_tbl EXECUTE remove_orphan_files('2022-01-04 10:00:00');
-- Remove orphan files older than 5 days from now.
ALTER TABLE ice_tbl EXECUTE remove_orphan_files(now() - interval 5 days);
This feature removes all files from a table’s data directory that are not linked from metadata files and that are older than the value of the older_than parameter. Deleting orphan files from time to time is recommended to keep the size of a table’s data directory under control. For more information, see IMPALA-14492.
Allow forced predicate pushdown to Iceberg
Since IMPALA-11591, Impala has optimized query planning by avoiding predicate pushdown to Iceberg unless it is strictly necessary. While this default behavior makes planning faster, it can miss opportunities to prune files early based on Iceberg’s file-level statistics.
A new table property, impala.iceberg.push_down_hint, is introduced, which allows you to force predicate pushdown for specific columns. The property accepts a comma-separated list of column names, for example, 'col_a, col_b'.
If a query contains a predicate on any column listed in this property, Impala will push that predicate down to Iceberg for evaluation during the planning phase. For more information, see IMPALA-14123
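A minimal sketch of setting this property on a hypothetical Iceberg table:

```sql
-- Force predicates on col_a and col_b to be pushed down to Iceberg
-- during planning, enabling early file pruning from file-level statistics.
ALTER TABLE ice_tbl SET TBLPROPERTIES (
  'impala.iceberg.push_down_hint'='col_a, col_b'
);
```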
UPDATE operations now skip rows that already have the desired value
The UPDATE statement for Iceberg and Kudu tables is optimized to reduce unnecessary writes.
Previously, an UPDATE operation would modify all rows matching the WHERE clause, even if those rows already contained the new value. For Iceberg tables, this resulted in writing unnecessary new data and delete records.
With this enhancement, Impala automatically adds an extra predicate to the UPDATE statement to exclude rows that already match the target value. For more information, see IMPALA-12588.
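Conceptually, the optimization behaves as if Impala rewrote the statement with an extra inequality predicate; the table and column names below are hypothetical and the rewrite is a simplification:

```sql
-- Statement as written:
UPDATE ice_tbl SET status = 'done' WHERE id < 100;
-- Effectively executed as:
UPDATE ice_tbl SET status = 'done' WHERE id < 100 AND status != 'done';
-- Rows that already hold 'done' produce no new data or delete records.
```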
For more information about the Known issues, Fixed issues, and Behavioral changes, see the Cloudera Data Warehouse Release Notes.
Cloudera Observability
The October release of Cloudera Observability introduces the following changes:
Memory resource efficiency reporting for Spark
Memory resource efficiency analysis for Spark jobs is available on Spark engine version 3.3.0 and higher. For more information, see the Enabling Spark configuration for memory efficiency analysis documentation.
Understanding Cloudera Observability Financial Governance feature
With the Cloudera Observability Financial Governance feature, you can get a detailed summary report of the costs and resource usage for the environment. For more information, see the Displaying your costs associated with an environment documentation.
For more information about the Known issues and Fixed issues, see the Cloudera Observability Release Notes.
