May 5, 2023

This release of the Cloudera Data Warehouse (CDW) service on CDP Public Cloud introduces these changes.

Cloudera Data Warehouse Public Cloud 1.6.3-b319 changes, described in more detail below:

AWS Kubernetes version 1.24 support
Selective support for AWS Kubernetes version 1.22 upgrade
Changes to the standard required IAM permissions and AWS restricted policy
Synchronized metadata across Impala Virtual Warehouses
Upgrade and rebuild functionality
New requirements in Azure to provide managed identity
Ranger authorization (RAZ) in CDW
Accessing buckets in a RAZ environment
Configuring token-based authentication in CDW
Impala coordinator shutdown in Unified Analytics
Anti join support in Unified Analytics
Debugging enhancement: HiveServer OOM dump
Kubernetes dashboard (Preview)
Workload Aware Auto-Scaling (Preview)

Cloudera Data Warehouse Public Cloud Runtime 2023.0.14.0-155 changes:

Iceberg manifest caching
Iceberg Execute Rollback feature from Impala
Incremental rebuild of Iceberg materialized views
Query hint for Impala table cardinality
Improvement in compaction and Impala metadata refresh
Optimize Refresh/Invalidate event processing
Support for custom hash schema for Kudu range tables
New shell option "hs2_fp_format"
Support for complex types
Ability to view Impala query details in Hue
Ability to access S3 buckets from Hue with RAZ is GA
Ability to access ADLS Gen2 containers from Hue with RAZ is GA
The “enable_queries_list” configuration has been removed from Hue jobbrowser safety valve section

AWS Kubernetes version 1.24 support🔗

New CDW clusters using AWS environments you activate in this release 1.6.3-b319 (released May 5, 2023) of Cloudera Data Warehouse will use Amazon Kubernetes (EKS) version 1.24.

Selective support for AWS Kubernetes version 1.22 upgrade🔗

If you activated your pre-existing AWS environment in CDW version 1.4.2-b118 (released Aug 4, 2022) or later, you can upgrade to EKS 1.22 in this release 1.6.3-b319 (released May 5, 2023). AWS environments activated in earlier versions are not supported for upgrade to EKS 1.22 due to incompatibility of some components with EKS 1.22. To determine the activation version of your pre-existing environment, in the Data Warehouse service, expand Environments. In Environments, search for and locate the environment that you want to view. Click Edit. In Environment Details, you see the CDW version.

Changes to the standard required IAM permissions and AWS restricted policy🔗

In this release, as an AWS environment user, you must make the following IAM policy updates:

AWS restricted policy
In "Sid": "gocode", add "iam:ListAttachedRolePolicies", only if you have a Ranger Authorization (RAZ) environment.
Minimum set of IAM permissions required for reduced permissions mode
Add "autoscaling:UpdateAutoScalingGroup".
Add "iam:DeleteRolePolicy".
Add "iam:ListAttachedRolePolicies".

Synchronized metadata across Impala Virtual Warehouses🔗

Using Impala Virtual Warehouses that share a Database Catalog is easier in this CDW release. In past releases, after making changes to data and then refreshing tables or invalidating metadata from your Virtual Warehouse, only the catalog metadata and coordinator metadata for that particular Virtual Warehouse were affected. You had to rerun the commands from each Virtual Warehouse to synchronize metadata across multiple Virtual Warehouses that share a Database Catalog.

This release introduces an enhancement that raises events in the Hive metastore. Catalog daemons process events synchronously across all Virtual Warehouses that share metadata. Metadata is refreshed/invalidated in parallel across all your Virtual Warehouses. You need to run the commands only once in any one of your Virtual Warehouse.

To get this feature, you must upgrade your Virtual Warehouse to this release 1.6.3-b319 that has runtime 2023.0.14.0-155 (released May 5, 2023). You must also set the Impala catalogd flagfile key enable_reload_events to true. Newly created Virtual Warehouses use Impala version 2023.0.14.0-155, which has this feature enabled by default. For more information, including how to disable this feature, see Disabling metadata synchronization.

This enhancement does not synchronize metadata when you refresh tables or invalidate metadata from a Data Hub cluster.

Upgrade and rebuild functionality🔗

The generic Database Catalog and Virtual Warehouse version upgrade process has changed. When you click the Upgrade button to upgrade your Virtual Warehouse or Database Catalog version, the upgrade that occurs includes the rebuild functionality, which cleans up and redeploys your workload. For more information, Rebuilding a Database Catalog and Rebuilding a Virtual Warehouse.

New requirements in Azure to provide managed identity🔗

During the Azure environment activation and private AKS activation, you must provide the resource ID of a user-assigned managed identity to create the Azure Kubernetes cluster resource.

Ranger authorization (RAZ) in CDW🔗

In this release, if your Data Lake enables Ranger Authorization (RAZ), CDW will be RAZ-enabled to provide authentication in Hue, mainly. If you have an AWS environment that predates the option to enable RAZ for Cloudera Data Warehouse, you might want to manually enable RAZ in CDW.

Accessing buckets in a RAZ environment🔗

In an environment that enables Ranger Authorization (RAZ), you must configure permissions to access an S3 bucket. "Accessing buckets in a RAZ environment" describes how to configure the permissions depending on the AWS account that owns the bucket.

Configuring token-based authentication in CDW🔗

Using a JSON web token (JWT), your Virtual Warehouse client user can sign on to your Virtual Warehouse for a period of time instead of entering single-sign on (SSO) credentials every time your user wants to run a query. You use Impyla to access an Impala Virtual Warehouse and a JDBC connection to access a Hive Virtual Warehouse. For more information, see "Configuring token-based authentication".

Impala coordinator shutdown in Unified Analytics🔗

Cloudera Data Warehouse has supported Impala coordinator shutdown for some time, but not in Unified Analytics until this release. When you create an Impala Virtual Warehouse, and enable Unified Analytics, you can configure Impala coordinator shutdown in Unified Analytics.

Anti join support in Unified Analytics🔗

In this release, Unified Analytics enables an optimization for converting a join having a null filter to perform a SQL anti join.

Debugging enhancement: HiveServer OOM dump🔗

In this release, when a hive-server pod out-of-memory (OOM) event occurs, a corresponding dump will be available in the HiveServer dumps directory:

<external-bucket>/clusters/<env-id>/<dbc-id>/warehouse/debug-artifacts/hive/<compute-id>/hive-dumps/<node-name>/heapdump/dump.hprof

For example, the full path to the dump file looks something like this:

sharecdwdev-bucket-7cv9-dwx-external/clusters/env-9d7cv9/warehouse-1675918732-q9m2/warehouse/debug-artifacts/hive/compute-1675918927-q8hs/hive-dumps/ip-192-168-223-136.us-west-2.compute.internal/heapdump/hiveserver2-oom-dump.hprof

In the example above, ip-192-168-223-136.us-west-2.compute.internal is the name of the HiveServer node on which the pod was running.

Kubernetes dashboard (Preview)🔗

A technical preview of the K8S dashboard is available in this release. You can view the state of resources, such as CPU and memory usage, see the status of pods, and download logs. The dashboard can provide insights into the performance and health of a CDW cluster.

Workload Aware Auto-Scaling (Preview)🔗

Workload Aware Auto-Scaling (WAAS) is available as a technical preview. WAAS allocates Impala Virtual Warehouse resources based on the workload that is running. You choose an executor group set, analogous to a range of nodes instead of the fixed size of the previous auto-scaling implementation. Configuring WAAS requires an entitlement. Contact your account team if you are interested in previewing the feature.

Iceberg manifest caching 🔗

Apache Iceberg provides a mechanism to cache the contents of Iceberg manifest files in memory. The manifest caching feature helps to reduce repeated reads of small Iceberg manifest files by Impala Coordinators and Catalogd. You configure manifest caching in CDW. You can enable or disable caching and set a few other caching properties.

Iceberg Execute Rollback feature from Impala🔗

In the event of a problem with your table, you can reset a table to a good state as long as the snapshot of the good table is available. You can roll back the table data based on a snapshot id or a timestamp. The Describe History feature output includes information that helps you use snapshot-related features.

Incremental rebuild of Iceberg materialized views🔗

In this release, the materialized view is automatically updated under certain conditions. After inserting data into a source table of a materialized view, query rewriting is automatically enabled. An incremental rebuild, which updates only the changed part of the view, occurs. Incremental rebuilds tend to update the view much faster than full rebuilds.

Query hint for Impala table cardinality🔗

This release supports a new TABLE_NUM_ROWS query hint to specify a table cardinality for cases where statistics are missing or invalid.

Improvement in compaction and Impala metadata refresh🔗

In this release, handling of the COMMIT_COMPACTION_EVENT has been improved. During compaction, HMS raises events for ACID tables. Impala metadata is refreshed automatically.

Optimize Refresh/Invalidate event processing🔗

Before this release, some metadata consistency issues led to query failures. These failures happen because the metadata updates from multiple coordinators cannot differentiate between self-generated events and those that are generated by a different coordinator. This issue is resolved in this release by adding a coordinator flag to each event, and when processing these events, the coordinator flag is checked to decide whether to ignore the event or consider it.

Support for custom hash schema for Kudu range tables🔗

Impala now includes CREATE TABLE and ALTER TABLE syntax to allow a Kudu custom hash schema. HASH syntax within a partition is similar to the table-level syntax except that HASH clauses must follow the PARTITION clause, and commas are not allowed within a partition. You can use the SHOW HASH SCHEMA statement to view the hash schema information for each partition.

CREATE TABLE t1 (id int, c2 int, PRIMARY KEY(id, c2))
       PARTITION BY HASH(id) PARTITIONS 3 HASH(c2) PARTITIONS 4
       RANGE (c2)
       (
       PARTITION 0 <= VALUES < 10
       PARTITION 10 <= VALUES < 20
       HASH(id) PARTITIONS 2 HASH(c2) PARTITIONS 3
       PARTITION 20 <= VALUES < 30
       )
       STORED AS KUDU;
       ALTER TABLE t1 ADD RANGE PARTITION 30 <= VALUES < 40
       HASH(id) PARTITIONS 3 HASH(c2) PARTITIONS 4;

New shell option "hs2_fp_format"🔗

This new option sets the printing format specification for floating point values when using HS2 protocol. The default behavior makes the values handled by Python's str() built-in method. Use '16G' to match Beeswax protocol's floating-point output format.

Support for complex types🔗

A SELECT * statement did not expand to complex types to be compatible with earlier versions of Impala that did not support complex types in the result set. In the older versions of Impala, queries using SELECT * skip complex types by default and only expanded to primitive types even when the table contained complex-typed columns. This release adds a new query option EXPAND_COMPLEX_TYPES to include complex types in the SELECT * list.

Ability to view Impala query details in Hue🔗

You can now view Impala query details, query plan, execution summary, and query metrics on the new Impala Queries tab on the Job Browser page using Hue in CDW, and use this information to tune and optimize your queries. You can also compare two queries and view all query details side-by-side. For more information, see Viewing Impala query details.

Ability to access S3 buckets from Hue with RAZ is GA🔗

Granting fine-grained access to the per-user home directories in Amazon S3 and accessing them from the S3 File Browser in Hue using RAZ is GA. See Accessing S3 bucket from Hue in CDW with RAZ.

Ability to access ADLS Gen2 containers from Hue with RAZ is GA🔗

Granting fine-grained access to the per-user home directories on ADLS Gen2 containers and accessing them from the ABFS File Browser in Hue using RAZ is GA. See Accessing ADLS Gen2 containers from Hue in CDW with RAZ.

The “enable_queries_list” configuration has been removed from Hue jobbrowser safety valve section🔗

The enable_queries_list configuration in the CDW Hue safety valve was used to display or hide the Queries tab on the Job Browser page. This configuration has been removed. The Queries tab is displayed by default based on the type of Virtual Warehouse, whether it is Hive or Impala. You can override the query_store configuration and hide the Queries tab. For more information, see Hue configurations in Cloudera Data Warehouse.