Cloudera on Cloud: March 2026 Release Summary
This Release Summary of Cloudera on cloud describes the major features introduced in the Management Console, Data Hub, and the data services.
Cloudera AI
Cloudera AI 2.0.55-b196 introduces the following changes:
Cloudera AI Control Plane
- Fixed a container runtime regression (runc v1.4.0) that caused health probe failures on Azure. The Buildkit DaemonSet probe mechanism was updated to ensure stable image builds and to guarantee that the MLX secret is preserved in the monitoring namespace during workspace upgrades and restorations. (DSE-52078)
For more information about the Known issues, Fixed issues and Behavioral changes, see the Cloudera AI Release Notes.
Cloudera Data Catalog
Cloudera Data Catalog 3.2.0 introduces the following changes:
Data Sharing in Cloudera Data Catalog
Data Sharing provides a secure, self-service way to grant read-only access to Iceberg tables for external users through the Cloudera Iceberg REST Catalog. This eliminates the need for data duplication, thereby enhancing security, reducing storage costs, and avoiding complex ETL processes across different cloud environments when using these tables in Iceberg REST API compatible third-party compute engines.
By integrating Data Sharing with Cloudera Data Catalog, you get the following advantages:
Centralized Management
Data Providers can use the intuitive Cloudera Data Catalog user interface to effortlessly discover assets, create logical Data Shares, and manage external user access and secure external user credentials in one place.
Enhanced Discoverability
You can enrich Data Shares with custom keywords and apply Atlas classifications to shared assets to improve searchability and context.
Integrated Auditing
The interface provides quick, centralized access to audit reports, allowing you to monitor Data Share updates and access events efficiently.
Important
This feature is available as technical preview and is under entitlement. To obtain the required entitlement, contact your Cloudera Account Representative.
Support for TLS 1.3
Cloudera Data Catalog now enforces TLS 1.3 for encryption across all services to enhance security and ensure compliance with modern security best practices. All outbound traffic, including communication with the control plane service and cloud storage, uses secure HTTPS connections.
For more information about the Known issues, Fixed issues and Behavioral changes, see the Cloudera Data Catalog Release Notes.
Cloudera Data Engineering
Cloudera Data Engineering 1.25.2-H1 is a hotfix release that does not include new features, only fixes. For more information, see the Cloudera Data Engineering Release Notes.
Cloudera Data Engineering 1.25.2 introduces the following changes:
Job-level Instance Type Override [Technical Preview]
Spot-instance settings are defined by default at the Virtual Cluster (VC) level; however, to enhance flexibility, you can override the defaults on a per-job basis. The Job-level Instance Type Override feature is available in the Unified Jobs UI, but not in the Legacy Jobs UI.
- You can access the Unified Jobs UI by selecting Jobs from the left navigation menu of the Cloudera Data Engineering UI.
- You can access the Legacy Jobs UI from the Virtual Cluster Details option by clicking the View Jobs button.
Note
The Job Level Instance Type Override feature is applicable when the Virtual Cluster (VC) level instance type is a Spot instance and applies to both Spark and Airflow jobs.
For VC instance types where, by default, the driver runs on On-Demand instances and the executors run on Spot instances, this feature applies only to Spark jobs.
If the VC-level instance type is On-Demand, the Job Level Instance Type Override feature is not applicable. In such a case, attempting to override the job-level compute configuration is not supported and will result in an error.
If you disable the Job-level Instance Type Override feature, the job's compute configuration reverts to the VC-level setting and becomes read-only at the job level.
If you enable the Job-level Instance Type Override feature, you can perform the following actions:
- For Spark, you can override the default virtual cluster settings to choose specific node types for Spark drivers and executors.
- For Airflow, you can override the default virtual cluster settings to choose specific node types for Airflow workers.
For more information, see:
- Viewing and managing virtual cluster details
- Creating jobs in Cloudera Data Engineering
- Overriding the job-level instance type using the CLI
Virtual cluster-level suspend and resume now generally available (GA)
The Virtual cluster-level suspend and resume feature, previously available as a technical preview, has now reached General Availability (GA).
Enhancements introduced for virtual cluster-level suspend and resume:
- The functionality to suspend and resume Cloudera Data Engineering Virtual Clusters (VCs) is now supported directly within the Cloudera Data Engineering UI. This capability was previously accessible only through the Cloudera Data Engineering API or the CDP CLI.
- From Cloudera Data Engineering 1.25.2, you have the option to simultaneously suspend multiple Cloudera Data Engineering VCs, eliminating the previous restriction that required VCs to be suspended sequentially.
For more information, see:
- Overview of suspending and resuming Cloudera Data Engineering virtual clusters
- Suspending and resuming Cloudera Data Engineering virtual clusters using the Cloudera Data Engineering UI
Switching from Azure AD (AAD) Pod Identity to Workload identity
In Cloudera Data Engineering 1.25.2, Workload Identity replaces the Azure AD Pod Identity (aad-pod-identity) component used for some of the workloads in Cloudera Data Engineering to pull logger credentials.
The Workload Identity component is more secure, provides faster startup times with better scaling and enables you to use more granular permissions.
Workload Identity requires users to provision two new user-assigned Managed Identities for each new Cloudera Data Engineering service.
Key prerequisites include updating environment credentials with a custom role to manage Federated Identity Credentials (FIC) and ensuring all new Managed Identities have the Storage Blob Data Contributor role for the logs container.
In-place upgrade operations require patching the existing Cloudera Data Engineering service to update Managed Identities through the Cloudera Data Engineering UI or the patchCluster API.
Cloudera Data Engineering service-level backup and restore operations require overriding existing Managed Identities with new ones through CLI options.
For more information, see Switching from Azure AD Pod Identity (aad-pod-identity) to Workload Identity in Azure clusters.
Graviton-based database instances are now the default for Cloudera Data Engineering services on AWS
Starting with Cloudera Data Engineering 1.25.2, the database instances for Cloudera Data Engineering services on AWS are provisioned on Graviton instances by default to save on cloud costs. If a Graviton instance is not available, the service will fall back to an x86 instance. To explicitly use an x86 instance when creating a Cloudera Data Engineering service, you must set the following configuration using the Cloudera Data Engineering API: config.database.disable_arm64=true
When upgrading an existing service to Cloudera Data Engineering 1.25.2, the associated database instances are automatically upgraded to Graviton.
Python Environment support introduced on the Cloudera Data Engineering UI for External IDE connectivity through Spark Connect-based sessions [Technical Preview]
On the Cloudera Data Engineering UI, you now have the option to select the Python Environment for External IDE connectivity through Spark Connect-based sessions.
For more information, see Configuring external IDE Spark Connect sessions.
Cloudera Data Engineering support for Atlas Lineage for Spark Iceberg tables
You can now generate Atlas Lineage for Spark Iceberg tables interactively using Cloudera Data Engineering sessions. This brings lineage tracking to your exploratory and interactive Spark workloads, allowing you to capture table creation and insertion events in Apache Atlas, just as you do with Cloudera Data Engineering jobs.
For a list of supported Spark SQL creation patterns and current limitations, see Cloudera Data Engineering support for Atlas Lineage for Spark Iceberg tables.
Enhanced scaling range for Cloudera Data Engineering services
The maximum limit of the Autoscaling Range parameter has been extended, allowing a Cloudera Data Engineering service to scale up to 250 nodes.
For more information, see Cloudera Data Engineering auto-scaling.
Kubernetes version upgrade to 1.33
The Kubernetes version that Cloudera Data Engineering uses is upgraded to Kubernetes 1.33.
For more information, see Compatibility for Cloudera Data Engineering and Runtime components.
For more information about the Known issues and Fixed issues, see the Cloudera Data Engineering Release Notes.
Cloudera Data Flow
Cloudera Data Flow 3.0.0-b508 introduces the following changes:
New features
Latest NiFi version
Flow Deployments and Test Sessions now support the latest Apache NiFi 1.28.1 and NiFi 2.6.0 releases.
Multiple flows on the same deployment
Deployments and flows have been decoupled in Cloudera Data Flow, which now supports adding multiple flows to the same NiFi deployment. This allows you to optimize the cloud infrastructure costs and resources associated with deployments.
Provenance events in Flow Designer
Flow Designer now provides a tool to observe data provenance events in running test sessions. It provides a searchable historical record of every data object (FlowFile) as it moves through your flow. It can help you debug and audit issues with your flows by understanding the changes (events) that occurred to data as it was being processed by each processor.
Platform updates
New Kubernetes version support
Cloudera Data Flow now supports EKS/AKS 1.33.
Changes and improvements
Reduced Flow Designer CPU consumption
This release of Cloudera Data Flow introduces an improvement that reduces CPU consumption of Flow Designer.
New test session states allow for better control of cloud costs and test session lifecycle management
Test sessions now have states identical to those of deployments (active, suspended, terminated). Suspending a test session tears down the associated cloud resources while preserving the assets associated with the draft for which the test session was initiated, making them available for future use. Terminating a test session removes all associated assets and related cloud resources.
Projects enforced on environment level
Association of resources to a project can now be enforced on the environment level. If this feature is enabled for an environment, users cannot create resources with ‘Unassigned’ status. You can switch this feature on while enabling an environment for Cloudera Data Flow.
Project-level usage tracking
With a new ‘Project’ tag added to Cloudera Data Flow usage events, it is now possible to run reports for Cloudera Data Flow usage with project-level granularity on the Cloudera Management Console.
NiFi End of Support (EoS) policy enforced during service upgrades
When running an environment upgrade with the ‘Preserve NiFi versions of deployments and test sessions’ option selected to keep your existing resources operable, the upgrade wizard checks whether the NiFi runtime versions of those resources are compatible with the upgraded Cloudera Data Flow service. If they are not, the upgrade is blocked until you either upgrade the offending resources to the minimum supported NiFi version, let the wizard upgrade them to the latest available version as part of the standard upgrade process, or delete them.
Updated ReadyFlows
- The RAG Query Pinecone ReadyFlow has been updated so the “Bedrock Model Name” no longer defaults to a specific model and instead instructs users to choose an active, non‑deprecated model.
- In the Airtable to S3/ADLS ReadyFlow the ‘Airtable API Key’ configuration parameter has been replaced with ‘Airtable Personal Access Token’.
Removed ReadyFlows
- ADLS to Chroma DB
- S3 to Chroma DB
- S3 to S3 Avro with S3 Notifications
- Slack to Chroma DB
For more information about the Known issues, Fixed issues and Behavioral changes, see the Cloudera Data Flow Release Notes.
Cloudera Data Hub
This release of the Cloudera Data Hub introduces the following changes:
Cloudera Runtime 7.3.2
Cloudera Runtime 7.3.2 is now available and can be used for registering an environment with a 7.3.2 Data Lake and creating Cloudera Data Hub clusters. For more information about the new Cloudera Runtime version, see Cloudera Runtime. If you need to upgrade your existing Cloudera environment, your upgrade path may be complex. To determine your upgrade path, refer to Upgrading to Cloudera Runtime 7.3.2 documentation.
Cloudera Data Warehouse
Cloudera Data Warehouse 1.12.1-b259 introduces the following changes:
What’s new in Cloudera Data Warehouse on cloud
Azure AKS 1.34 upgrade
Cloudera supports the Azure Kubernetes Service (AKS) version 1.34. In 1.12.1-b259 (released March 31, 2026), when you activate an Environment, Cloudera Data Warehouse automatically provisions AKS 1.34. To upgrade to AKS 1.34 from a lower version of Cloudera Data Warehouse, you must back up and restore Cloudera Data Warehouse.
Note
Using the Azure CLI or Azure portal to upgrade the AKS cluster is not supported and might result in cluster instability or downtime. For more information about upgrading, see Upgrading an Azure Kubernetes Service (AKS) cluster.
AWS EKS 1.34 upgrade
Cloudera supports the Amazon Elastic Kubernetes Service (EKS) version 1.34. In 1.12.1-b259 (released March 31, 2026), when you activate an Environment, Cloudera Data Warehouse automatically provisions EKS 1.34. To upgrade to EKS 1.34 from a lower version of Cloudera Data Warehouse, you must back up and restore Cloudera Data Warehouse.
Note
Using the AWS tools to upgrade the EKS cluster is not supported and might result in cluster instability or downtime. For more information about upgrading, see Upgrading an Amazon Elastic Kubernetes Service (EKS) cluster.
Removal of Unified Analytics
In this release, the Unified Analytics framework, including the Impala Virtual Warehouse implementation, is fully removed from Cloudera Data Warehouse on cloud. All remaining Unified Analytics components, configuration paths, and UI flows are either cleaned up or migrated to the standard Impala virtual warehouse architecture. This change simplifies operations and ensures continued support for existing Impala workloads on the current platform.
What’s new in Cloudera Data Explorer (Hue) on Cloudera Data Warehouse on cloud
Product branding update
Starting with this release, the product component previously known as Hue has been renamed to Cloudera Data Explorer (Hue). This change reflects an updated branding initiative and will be rolled out in phases.
As part of this release, you may notice:
- A new logo displayed in the UI
- The service name updated to Data Explorer in the UI
- The new product name reflected in documentation
Some UI references may still display the previous name as the branding update is completed incrementally in future releases.
There is no functional impact associated with this change. All existing configurations, workflows, and integrations continue to work as before.
Enhanced session security for Cloudera Data Explorer (Hue)
Data Explorer now includes security protections for the session ID (sessionid) cookie. This enhancement helps prevent unauthorized access that could result in data exposure, unauthorized query execution, and job submission across connected Data Explorer services.
For more information, see Securing sessions.
Fact support in SQL AI Assistant
You can now define custom system instructions to guide the SQL AI Assistant in generating more accurate queries based on your specific business logic. This enhancement supports complex, cross-database workflows by allowing you to persist organizational context in the Assistant settings.
For more information, see Fact support for SQL query.
Data Explorer support for the boto3 SDK
Data Explorer now supports the boto3 SDK for accessing AWS S3. This update replaces the legacy connector framework to provide improved performance and compatibility with AWS services.
To ensure a smooth transition, the system automatically converts your existing configurations to the new connector system. This feature is enabled by default, but you can manually disable the feature flag if necessary.
For more information, see Enabling the S3 File Browser for Cloudera Data Explorer (Hue) in Cloudera Data Warehouse with RAZ and Enabling the S3 File Browser for Cloudera Data Explorer (Hue) in Cloudera Data Warehouse without RAZ.
What’s new in Hive on Cloudera Data Warehouse on cloud
Small file warnings in console
The MSCK and ANALYZE commands now display a warning in the console if the average file size for a table or partition is below the threshold. This helps you identify small files that might affect performance.
For more information, see Statistics generation and viewing commands in Cloudera Data Warehouse.
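As a sketch, either statement can trigger the warning on a partitioned table (the table, partition, and threshold details below are illustrative):

```sql
-- Recover any partitions added directly on storage; the console
-- warns if the average file size in a partition is below the threshold
MSCK REPAIR TABLE sales;

-- Gathering statistics surfaces the same small-file warning
ANALYZE TABLE sales PARTITION (sale_date = '2026-03-01') COMPUTE STATISTICS;
```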
Performance improvement for column changes
The ALTER CHANGE COLUMN command is now faster for tables that have many partitions. This change prevents the command from performing a separate Metastore service call to update column statistics for every partition, which previously caused long execution times and timeouts. For large partitioned tables, the execution time is reduced from hours to minutes.
Apache Jira: HIVE-28346
Hive Query History Service
The Hive query history service provides a scalable solution for storing and analyzing historical Hive query data. It captures detailed information about completed queries, such as runtime, accessed tables, errors, and metadata, and stores it in an efficient Iceberg table format. For more information, see Hive query history service.
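For example, assuming the history data is exposed as a sys.query_history table (the table and column names here are assumptions; check the linked documentation for your release), recent failed queries could be inspected with a query along these lines:

```sql
-- Table and column names are assumed for illustration
SELECT query_id, sql, elapsed_time
FROM sys.query_history
WHERE failed = true
ORDER BY start_time DESC
LIMIT 20;
```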
What’s new in Iceberg on Cloudera Data Warehouse on cloud
Table repair feature support for Iceberg tables
Impala introduces the repair_metadata() function for Iceberg tables. This function provides a self-service recovery path to recover Iceberg tables that are inaccessible due to missing data files after manual file deletions in the underlying storage. For more information, see Table repair feature.
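The exact invocation depends on your release; assuming it follows Impala's usual ALTER TABLE ... EXECUTE pattern for Iceberg maintenance operations (an assumption; confirm against the linked Table repair feature page), a recovery call might look like:

```sql
-- Drop references to data files that no longer exist in storage
-- (invocation form assumed; table name illustrative)
ALTER TABLE flights EXECUTE repair_metadata();
```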
Support for SHOW FILES IN table PARTITION for Iceberg
Impala now supports the SHOW FILES IN command with the PARTITION clause to list data files for specific partitions in Iceberg tables. This enhancement extends metadata capabilities by enabling inspection of partition-level physical data directly from Impala. For more information, see Describe table metadata feature.
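As a sketch, with illustrative table and partition values:

```sql
-- List only the data files belonging to one partition
SHOW FILES IN store_sales PARTITION (sold_date = '2026-03-01');
```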
Support for additional partition transform functions for Iceberg tables
Iceberg now supports additional partition transform functions such as BUCKET, TRUNCATE, IDENTITY, and VOID. These transformations extend partitioning capabilities by enabling hashing, value truncation, direct partitioning, and handling of null partitions. For more information, see Partition transform feature.
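A minimal DDL sketch combining these transforms (table and column names are illustrative):

```sql
CREATE TABLE events (
  id BIGINT,
  name STRING,
  region STRING
)
PARTITIONED BY SPEC (
  BUCKET(16, id),       -- hash id values into 16 buckets
  TRUNCATE(10, name),   -- partition on the first 10 characters of name
  IDENTITY(region)      -- partition directly on the column value
)
STORED AS ICEBERG;
```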
Support for partition columns in WHERE clause predicates
Hive Iceberg compaction now supports WHERE clause predicates on partition columns. This enhancement allows you to selectively compact data by filtering partition columns, improving efficiency and control over compaction operations. For more information, see Data compaction.
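For example, assuming a table partitioned on a region column (table and column names illustrative), compaction can now target a subset of partitions:

```sql
-- Hive: compact only the rows in the EMEA partition
ALTER TABLE web_logs COMPACT 'major' WHERE region = 'EMEA';
```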
What’s new in Impala on Cloudera Data Warehouse on cloud
Caching intermediate query results
Impala now supports caching intermediate results to improve query performance and resource efficiency for repetitive workloads. By storing results at various locations within the SQL plan tree, the system can reuse computation for similar queries even when they are not identical, provided the underlying data and settings remain unchanged. For more information and instructions on enabling this feature, see Caching intermediate results.
User role management
You can now grant and revoke roles directly to and from individual users in Impala, providing more granular control over security management. This feature includes support for the GRANT ROLE, REVOKE ROLE, and SHOW ROLE GRANT USER statements, aligning Impala with Apache Hive’s role-related functionality.
For more information, see impala role, impala grant role, impala show roles, and impala revoke role.
Apache Jira: IMPALA-14085
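The new statements follow Hive's role syntax; a short sketch (role and user names are illustrative):

```sql
CREATE ROLE analyst;
GRANT ROLE analyst TO USER alice;    -- grant directly to a user
SHOW ROLE GRANT USER alice;          -- list roles granted to the user
REVOKE ROLE analyst FROM USER alice;
```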
Native geospatial query acceleration
Cloudera Data Warehouse 2025.0.21.0 introduces native implementations for specific geospatial functions to accelerate simple queries. This feature reduces processing overhead by avoiding transitions to the Java Virtual Machine and optimizing file-level filtering for Parquet and Iceberg tables. For more information, see Impala Geospatial query acceleration.
OpenTelemetry integration for Impala
Cloudera Data Warehouse now provides OpenTelemetry (OTel) support to help you monitor query performance and troubleshoot issues. This new feature collects and exports query telemetry data as OpenTelemetry traces to a central OpenTelemetry-compatible collector. The integration is designed to have a minimal impact on performance because it uses data already being collected and handles the export in a separate process. For more information, see OpenTelemetry support for Impala.
Apache Jira: IMPALA-13234
Filtering SHOW PARTITIONS output
You can now use the WHERE clause with the SHOW PARTITIONS statement to filter results based on partition column values. This enhancement helps you manage tables with a large number of partitions by narrowing down the output using comparison operators, IN lists, BETWEEN clauses, IS NULL predicates, and logical expressions. For more information, see the SHOW PARTITIONS statement.
Apache Jira: IMPALA-14065
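For example, on a table partitioned by year and month (illustrative names):

```sql
-- Only partitions matching the predicate are listed
SHOW PARTITIONS sales WHERE year = 2025 AND month BETWEEN 1 AND 3;
```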
Parallelizing JDBC External Table queries
You can now execute queries on JDBC tables in parallel to improve performance for joins and aggregations. Impala now estimates the number of rows in a JDBC table by running a COUNT query during query preparation. This estimation allows the planner to assign multiple scanner threads, introduce exchange nodes, and produce more efficient join orders. You can also use the --min_jdbc_scan_cardinality backend flag to set a lower bound for these estimates. For more information, see Parallelizing JDBC External Table queries.
Recreating tables with statistics
You can use the WITH STATS clause in the SHOW CREATE TABLE statement to generate the SQL required to recreate a table along with its column statistics and partition metadata. See SHOW CREATE TABLE WITH STATS statement.
Apache Jira: IMPALA-13066
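A minimal sketch (table name illustrative):

```sql
-- Emits the CREATE TABLE DDL followed by the statements needed to
-- restore column statistics and partition metadata
SHOW CREATE TABLE sales WITH STATS;
```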
Quoting reserved words in column names
You can now explicitly quote all column names projected in SQL queries generated for JDBC external tables. Column names are wrapped with quote characters based on the JDBC driver being used:
- Backticks (`) for Cloudera Runtime Hive, Impala, and MySQL
- Double quotes (") for all other databases
This supports the use of case-sensitive or reserved column names. For more information, see Quoting reserved words in column names.
Apache Jira: IMPALA-13066
New catalogd flag to disable HMS sync by default
You can now use the disable_hms_sync_by_default catalogd startup flag to set a global default for the impala.disableHmsSync property. This feature allows you to skip event processing for all databases and tables by default while opting in specific elements as needed.
For more information, see Catalogd Daemon startup flag.
Apache Jira: IMPALA-14131
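For example, with the flag set globally on catalogd, individual tables can still opt back in through the existing table property (table name illustrative):

```sql
-- catalogd started with: --disable_hms_sync_by_default=true
-- Re-enable HMS event processing for just this table:
ALTER TABLE sales SET TBLPROPERTIES ('impala.disableHmsSync'='false');
```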
Parallel metadata loading in local catalog mode
Previously, when a query accessed multiple unloaded tables in local catalog mode, Impala loaded the metadata for those tables one after another. This sequential process caused significant latency and performance regressions compared to the legacy catalog mode.
This issue is addressed by parallelizing the table loading process. The fix allows Impala to load and gather metadata for multiple tables simultaneously. You can control the maximum number of threads used for this process by using the new max_stmt_metadata_loader_threads flag, which defaults to 8 threads per query compilation. See Catalog startup flag.
Apache Jira: IMPALA-14447
Specifying compression levels for LZ4, ZLIB, GZIP, and ZSTD
You can now specify compression levels for the LZ4, ZLIB, GZIP, and ZSTD codecs to achieve higher compression ratios. This includes support for high compression modes in LZ4 (levels 3–12) and negative compression levels for ZSTD. These levels are supported by using the compression_codec query option.
For more information, see compression_codec query option.
Apache Jira: IMPALA-10630, IMPALA-14082
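The level is appended to the codec name in the compression_codec query option; a sketch (table names illustrative):

```sql
-- ZSTD at level 12 for a higher compression ratio
SET COMPRESSION_CODEC=ZSTD:12;
INSERT INTO sales_archive SELECT * FROM sales;

-- LZ4 high-compression mode (levels 3-12)
SET COMPRESSION_CODEC=LZ4:9;
```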
Configuring remote JCEKS keystores for Impala AI Functions
You can now specify a remote JCEKS keystore path by using the REMOTE_JCEKS_PATH environment variable. This allows the system to automatically copy remote keystores from S3 or Azure storage to the local filesystem on coordinator and executor pods, preventing initialization errors.
For more information, see Configuring remote JCEKS keystores for Impala AI Functions.
Batch processing for reload events
Cloudera now supports batch processing of RELOAD events on the same table by using the BatchPartitionEvent logic. This enhancement allows you to load partitions in parallel and reduces duplicate reloads. By minimizing the number of times a table lock is acquired and reducing table version changes, this feature improves the performance of coordinators in local-catalog mode and reduces query planning retries.
Apache Jira: IMPALA-14082
Consolidated event processing for partition changes
Cloudera now supports the ALTER_PARTITIONS event type, which consolidates multiple partition changes into a single event. By processing one batch event instead of numerous individual ALTER_PARTITION events, the event processor can synchronize metadata more quickly and reduce the processing load on the CatalogD cache.
Apache Jira: IMPALA-13593
What’s new in Trino on Cloudera Data Warehouse on cloud
General Availability (GA) of Trino in Cloudera Data Warehouse
Trino is a distributed SQL query engine designed to efficiently query large datasets across one or more heterogeneous data sources. This integration enables users to leverage Trino’s powerful capabilities directly within Cloudera Data Warehouse.
The GA release of Trino in Cloudera Data Warehouse introduces several key capabilities:
- Trino Virtual Warehouses — Offers full support for creating and managing Trino Virtual Warehouses across both Amazon Web Services (AWS) and Microsoft Azure environments. This enables efficient querying across diverse, large datasets regardless of your cloud provider. For information about creating a Trino Virtual Warehouse, see Adding a new Virtual Warehouse.
Note
Trino support for Microsoft Azure environments is in technical preview and not recommended for use in production deployments. Cloudera recommends that you try this feature in test and development environments.
- Federation and Connectivity — Seamless connection and management of various remote data sources is possible through Trino Federation Connectors, including the new Teradata custom connector. A dedicated connector management UI and backend facilitates the creation and configuration of these connectors. For more information, see Trino Federation Connectors.
- Security and Governance — Governance is enforced by default through Apache Ranger using the cm_trino authorization service. You can create or update Ranger policies for specific resources and assign permissions to Trino users, groups, or roles. When a user submits a query to Trino, the system verifies the defined policies to ensure that the user has the necessary permissions to run queries. For more information, see Ranger authorization for Trino Virtual Warehouses.
- Performance Optimization — Built-in capabilities for auto-suspend and auto-scaling are supported. These configurations help optimize resource utilization and ensure the provisioning of a high-performance and scalable Trino Virtual Warehouse.
- Support for Teradata connector (Technical Preview) — Cloudera Data Warehouse now introduces support for a read-only Trino-Teradata connector. This feature is designed to facilitate SELECT operations on Teradata sources, operating in ANSI mode and optimizing performance by pushing down filters and aggregates. For more information, see Teradata connector.
- Connection pooling for JDBC-based connectors — You can now configure connection pooling for JDBC-based Trino connectors, such as MySQL, PostgreSQL, MariaDB, Teradata, and Oracle. Connection pooling improves performance, resource utilization, and stability when querying different data sources using Trino. For more information, see Connection pooling for JDBC-based connectors.
- Backup and restore behavior for Trino — The backup and restore functionality in Cloudera Data Warehouse now includes updates for Trino. Trino Virtual Warehouses are included in environment backups and are restored along with the environment. However, Trino connector objects are not backed up or restored as part of the environment reactivation workflow. After restoring an environment, you must manually recreate Trino connectors and attach them to the restored Trino Virtual Warehouses. For more information, see Backup and restore Cloudera Data Warehouse.
For more information about the Known issues, Fixed issues and Behavioral changes, see the Cloudera Data Warehouse Release Notes.
Cloudera Management Console
This release of the Cloudera Management Console service introduces the following changes:
Cloudera Runtime 7.3.2
Cloudera Runtime 7.3.2 is now available and can be used for registering an environment with a 7.3.2 Data Lake and creating Cloudera Data Hub clusters. For more information about the new Cloudera Runtime version, see Cloudera Runtime. If you need to upgrade your existing Cloudera environment, your upgrade path may be complex. To determine your upgrade path, refer to Upgrading to Cloudera Runtime 7.3.2 documentation.
Cloudera Lakehouse Optimizer
In Cloudera on cloud 7.3.2 and higher versions, Cloudera Lakehouse Optimizer provides the following features in Lakehouse Optimizer UI:
- The Associations tab replaces the Tables tab and provides the Tables Associations and Namespace Associations subtabs.
- The Recent Tasks Activity tab lists a maximum of 100 recent tasks run by Cloudera Lakehouse Optimizer.
- You can perform the following actions on an Iceberg table:
- Dry Run Only — Generates the maintenance tasks for the table, and performs a dry run to ensure they run without failure. No action is performed on the table.
- Dry Run and Execute — Generates the maintenance tasks on the table, initiates a dry run, and provides the output. You can view the output and take appropriate action to modify the policy.
- Provide administrator, operator, or monitor role access for a user or a group at namespace level. Enable the Ranger service for Cloudera Lakehouse Optimizer, and then create the Ranger policies to provide the fine-grained access to a user or group.
For more information, see Lakehouse Optimizer.
Cloudera Operational Database
Cloudera Operational Database 1.59 introduces the following changes:
Cloudera Operational Database supports Multiple Availability Zones (Multi-AZ) on Google Cloud Platform (GCP)
Cloudera Operational Database ensures high availability and fault tolerance through the utilization of Multi-AZ deployments. A Multi-AZ deployment signifies that the compute infrastructure for HBase’s master and region servers is distributed across multiple Availability Zones (AZs). This configuration guarantees that in the event of an outage in a single availability zone, only a fraction of the Region Servers are impacted, allowing clients to automatically failover to the remaining servers situated in the available AZs.
Support for Multi-AZ deployments of Cloudera Operational Database is now available on Google Cloud Platform (GCP) environments too. However, Cloudera Operational Databases provisioned with the Micro Duty scale type do not accommodate Multi-AZ configurations.
For more information, see:
- Multi-AZ deployment on Cloudera Operational Database
- Deploying Cloudera In Multiple GCP Availability Zones
Cloudera Operational Database does not support Multiple Availability Zones (Multi-AZ) while using the Micro duty scale type
Cloudera Operational Databases provisioned with the Micro Duty scale type are no longer compatible with Multi-AZ configurations. Since these deployments are intended solely for testing and development environments, Multi-AZ support is not necessary. Therefore, the option to configure Multi-AZ networking for Micro scale operational database clusters has been removed, and Micro clusters are strictly limited to Single-AZ deployments, accurately reflecting their architectural capabilities.
Cloudera Replication Manager
This release of the Replication Manager service introduces the following new features:
Iceberg replication policies
You can use Iceberg replication policies in Cloudera Replication Manager to replicate Iceberg tables between Data Lakes through Data Hubs in Cloudera on cloud 7.3.2 or higher versions using AWS. The Data Lakes can be located in a single AWS region or across multiple regions.
In Cloudera on cloud using AWS, you must deploy a source Iceberg Replication Data Hub in the source Data Lake and a target Iceberg Replication Data Hub in the target Data Lake, and then create the Iceberg replication policy in the target Data Hub.
For more information, see Using Iceberg replication policies.
Ranger replication policies
You can create Ranger replication policies in Cloudera Replication Manager to migrate the Ranger policies and roles for HDFS, Hive, and HBase services. You can migrate these Ranger policies from Kerberos-enabled Cloudera Base on premises 7.3.2 or higher clusters using Cloudera Manager 7.13.2 to Cloudera on cloud 7.3.2 clusters. The Ranger replication policies can also migrate the Ranger audit logs in HDFS.
For more information, see Using Ranger replication policies.
