Behavior changes

This release of the Cloudera Data Warehouse service on Cloudera on cloud has the following behavior changes:

Summary: Retaining container image metadata while copying Cloudera images to custom ECR repository

Due to security considerations, Cloudera images are sensitive and require their hash values to remain unchanged when transferring images between repositories. Ensure that images are copied while retaining the image manifests and hash values (SHA). If the image SHA in the custom ECR repository differs from the SHA in the Cloudera hosted repository, you may encounter issues while activating the Cloudera Data Warehouse clusters.

Before this release: You can copy images in any preferred way and you do not notice any issues while activating Cloudera Data Warehouse environments.

After this release: Ensure that the image SHA is retained when copying container images from the Cloudera hosted repository to your custom ECR repository. You may use any third-party tool, such as 'skopeo' to copy images between repositories while preserving the image metadata. For more information, see Copying images to custom ECR repository.

Summary: UI changes in the Cloudera Data Warehouse left navigation menu

Before this release: When you log in to the Cloudera Data Warehouse service, you can click the Environments, Database Catalogs, or Virtual Warehouses options from the left navigation menu to open the corresponding pages, and then access the required resources.

After this release: The Cloudera Data Warehouse UI has undergone changes and you can no longer navigate to the Environments, Database Catalogs, and Virtual Warehouses pages from the left navigation menu.

You can only access the required Environments, Database Catalogs, and Virtual Warehouses from the respective tabs in the Overview page.

Cloudera Data Warehouse Overview page

Summary: Ability to select a compute instance type during environment activation

Before this release: You could select a compute instance type only when using the CDP CLI to activate an environment in Cloudera Data Warehouse. The option to select the instance type through the UI was removed.

After this release: Starting with this release, you can no longer select a compute instance type when you use the CDP CLI to activate an environment in Cloudera Data Warehouse.

Summary: View historical data in the Impala Autoscaling dashboard

Before this release: In the Impala Autoscaling Dashboard, you could view autoscaler metrics data for the past one hour and use a time window slider to zoom into a specific time period within the recent one hour window.

After this release: The Impala Autoscaling Dashboard is enhanced to enable you to view historical data along with live data. You can now choose the Historic Data option and specify the start and end timestamps for which you want to view autoscaler metrics data. Note that this feature is currently available only for AWS environments.

Summary: Specify a file size threshold limit for Iceberg data compaction

Before this release: The Impala OPTIMIZE TABLE <table_name> statement rewrites all files in the table, regardless of size or type, even when there are no small or delete files.

After this release: The Impala OPTIMIZE TABLE <table_name> is enhanced to include a FILE_SIZE_THRESHOLD_MB option that enables you to specify the maximum size of files (in MB) that should be considered for compaction.

Summary: Removal of "docker" image registry type

Before this release: While activating an environment, you can choose the following image repositories — ecr, acr, or docker in the Registry Type option.

After this release: The "docker" custom image registry type is no longer supported in Cloudera Data Warehouse and the option to choose the "docker" registry type during environment activation is removed. You can either choose a custom ACR or ECR image repository.

Summary: Consistent protocol version values in workload management tables

Before this release: The sys.impala_query_log table stored a full protocol name such as "HIVE_CLI_SERVICE_PROTOCOL_V6" in the hiveserver2_protocol_version column. In contrast, the sys.impala_query_live table and query profiles used a shorter value such as "V6".

After this release: The hiveserver2_protocol_version column in the sys.impala_query_log table now uses the same short string (for example, "V6") as the sys.impala_query_live table and query profiles.

Restore previous behavior:

This SQL statement replicates the behavior before this release where sys.impala_query_log stored a value of "HIVE_CLI_SERVICE_PROTOCOL_V6":

SELECT CASE hiveserver2_protocol_version WHEN 'V6' THEN 'HIVE_CLI_SERVICE_PROTOCOL_V6'
ELSE hiveserver2_protocol_version END as hiveserver2_protocol_version FROM sys.impala_query_log

Summary: Disabling join disjunctive predicate pushdown

Before this release: With hive.optimize.join.disjunctive.transitive.predicates.pushdown enabled by default, queries with disjunctive predicates could cause HiveServer2 to crash or run out of memory during compilation.

After this release: The hive.optimize.join.disjunctive.transitive.predicates.pushdown setting is now disabled by default, enhancing HiveServer2 stability and preventing crashes and out-of-memory errors. In some rare cases, queries with joins and unions become slightly less efficient but the difference should not be noticeable by the end-users.

Apache Jira: HIVE-28310

Summary: Hive CBO fallback strategy configuration

Before this release: The hive.cbo.fallback.strategy property was set to CONSERVATIVE by default. In case of an error during the cost-based optimizer phase, Hive would fallback to the legacy optimizer, potentially reducing optimization efficiency and masking serious or unrecoverable errors.

After this release: The default value for hive.cbo.fallback.strategy is now set to NEVER. Hive no longer falls back to the legacy optimizer and cost-based optimizer errors are fatal. Hidden compilation errors will now show up immediately and additional actions are required to compile and execute the query successfully.

Apache Jira: HIVE-27831