February 7, 2023

This release of the Cloudera Data Warehouse (CDW) service on CDP Public Cloud introduces these changes.

AWS Elastic Kubernetes Service (EKS) version upgrade

The CDW application uses Kubernetes (K8s) clusters to deploy and manage Hive and Impala in the cloud. Kubernetes versions are updated every 3 months on average. When a new version is released, older minor versions are deprecated.

To avoid compatibility issues between CDW and AWS resources, you must upgrade the version of Kubernetes that supports your existing CDW clusters to version 1.21.

AWS environments you activate using this release of Cloudera Data Warehouse will use version 1.22.

If a MIGRATE icon appears in the upper right corner of the environment tile, your AWS environment has not been migrated from Helm 2 to Helm 3. Update the Helm package manager for your environment before you attempt to upgrade its Kubernetes version.

For information about upgrading the AWS EKS Kubernetes version, see Upgrade CDW for EKS upgrade.

Support for Amazon Elastic Kubernetes Service 1.22 cluster

This release, 1.6.1-b258 (released February 7, 2023), automatically provisions Amazon Elastic Kubernetes Service (EKS) 1.22 when you activate an environment from CDW. Upgrading an existing cluster to EKS 1.22 is not supported in this release.

Changes to the managed policy ARN, standard IAM permissions, and restricted permissions policy

In this release, as an AWS environment user, you must update the managed policy ARN to handle Kubernetes CSI drivers for EBS and EFS. You must also update your restricted permissions policy if you use it.

Make the following changes:
  • Restricted policy changes for updating a managed policy ARN

    Add "arn:aws:iam::<AWS_ACCOUNT>:policy/<noderole-inline-policy>" as shown in "Attaching a managed policy ARN".

  • Restricted permissions policy

    Add "elasticfilesystem:PutFileSystemPolicy" to the ResourceTag object, and move "elasticfilesystem:CreateFileSystem" from the CloudFormation object to the RequestTag object, as shown in "AWS restricted permissions policy".

Apache Iceberg GA in CDW

This release introduces the general availability of ACID transactions with Iceberg V2 tables from Hive in CDW Runtime 2023.0.13.0-20 (released 2023-02-07). CDW Runtime 2022.0.12.0-90 (released 2022-12-13) introduced the general availability of ACID transactions with Iceberg V2 tables from Impala. You can run Apache Iceberg ACID transactions within some of the key data services in the Cloudera Data Platform (CDP) public cloud (AWS and Azure), including Cloudera Data Warehouse. From Hive or Impala, you can use Apache Iceberg features in CDW, which include time travel, create table as select, and schema and partition evolution.

To access these features, create a new Virtual Warehouse or upgrade an existing one.
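The Iceberg features named above can be sketched in Impala SQL as follows. This is a minimal illustration, not a reference: the table names, columns, timestamp, and column added are hypothetical, and the time travel query succeeds only if the table has a snapshot at or before the given time.

```sql
-- Hypothetical Iceberg V2 table created from Impala.
CREATE TABLE ice_sales (id INT, amount DECIMAL(10,2), sold_at TIMESTAMP)
STORED AS ICEBERG
TBLPROPERTIES ('format-version'='2');

-- Create table as select (CTAS) into a second Iceberg table.
CREATE TABLE ice_sales_copy STORED AS ICEBERG AS
SELECT * FROM ice_sales;

-- Time travel: read the table as it was at a point in time
-- (the timestamp is a placeholder).
SELECT * FROM ice_sales FOR SYSTEM_TIME AS OF '2023-02-01 00:00:00';

-- Schema evolution: add a column without rewriting existing data files.
ALTER TABLE ice_sales ADD COLUMNS (region STRING);

-- Partition evolution: change the partition spec for newly written data.
ALTER TABLE ice_sales SET PARTITION SPEC (MONTH(sold_at));
```

From Hive, the equivalent table definition uses STORED BY ICEBERG instead of STORED AS ICEBERG.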

Support for Iceberg tables in Avro (Preview)

In this release, you can read Iceberg tables in Avro from Impala. There is a related known issue with using the DECIMAL data type in Avro in this release.

Reading Iceberg tables in Avro format from Impala is available as a technical preview. Cloudera recommends that you use this feature in test and development environments. It is not recommended for production deployments.

Enhanced Iceberg support for materialized views

In this release, you can create a materialized view of an Iceberg V1 or V2 table based on an existing Hive table or an Iceberg table. Automatic rewriting of the materialized view occurs under certain conditions.
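As a rough sketch of what such a materialized view might look like in Hive, assuming a hypothetical source table ice_sales with id and amount columns (the view name, filter, and file format are illustrative assumptions, not taken from this release note):

```sql
-- Hypothetical materialized view stored as an Iceberg V2 table, run from Hive.
CREATE MATERIALIZED VIEW mv_large_sales
STORED BY ICEBERG STORED AS ORC
TBLPROPERTIES ('format-version'='2')
AS
SELECT id, amount
FROM ice_sales
WHERE amount > 100;

-- Refresh the materialized view after the source table changes.
ALTER MATERIALIZED VIEW mv_large_sales REBUILD;
```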

Iceberg load data inpath feature

From Impala, you can now load data into an Iceberg table using the LOAD DATA INPATH statement.
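A minimal sketch of the statement, assuming a hypothetical staging directory of data files and a hypothetical Iceberg table named ice_sales whose schema matches those files:

```sql
-- Load the staged files into the Iceberg table (path and table are placeholders).
LOAD DATA INPATH '/tmp/staging/parquet_files/' INTO TABLE ice_sales;
```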

Querying Data Hub Kudu tables from an Impala Virtual Warehouse using Kudu

After configuring an Impala Virtual Warehouse to connect to Kudu, you can create Kudu tables using Impala clients, such as Hue, the Impala shell, or JDBC/ODBC. You can also ingest data using Spark/NiFi and query using Impala.
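For illustration, creating and querying a Kudu table from an Impala client might look like the following. The table and column names are hypothetical, and the sketch assumes the Virtual Warehouse has already been configured to connect to Kudu.

```sql
-- Hypothetical Kudu table; Kudu tables require a primary key.
CREATE TABLE kudu_events (
  id BIGINT,
  name STRING,
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 4
STORED AS KUDU;

-- Write and read back a row through Impala.
INSERT INTO kudu_events VALUES (1, 'first');
SELECT id, name FROM kudu_events WHERE id = 1;
```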

Flexible allow lists for Kubernetes cluster and load balancer

In previous releases, there was a single allow list of IP Classless Inter-Domain Routing (CIDR) ranges from which both the Kubernetes cluster and the load balancer accepted incoming connections. In this release, you configure two separate allow lists for accepting incoming connections:
  • Enable IP CIDR for Kubernetes cluster
  • Enable IP CIDRs for the load balancer

Impala enhancements

This release of Cloudera Data Warehouse includes the following new Impala features:

  • Binary support: Impala now supports BINARY columns for all table formats except Kudu. See the BINARY support topic for more information on using this arbitrary-length byte array data type in CREATE TABLE and SELECT statements.
  • ALTER VIEW support: Before this release, you could alter only the definition, name, and owner of a view. Impala now also supports altering the table properties of a view by using the SET TBLPROPERTIES clause.
  • Push down date literals to Kudu scanner: Impala now allows creating and pushing down Kudu predicates from the DATE type.
    Example:
    select * from functional_kudu.date_tbl where date_col = DATE "1970-01-01";

    PLAN-ROOT SINK
    |
    00:SCAN KUDU [functional_kudu.date_tbl]
       kudu predicates: date_col = DATE '1970-01-01'
       row-size=12B cardinality=1
    ---- DISTRIBUTEDPLAN
    PLAN-ROOT SINK
    |
    01:EXCHANGE [UNPARTITIONED]
    |
    00:SCAN KUDU [functional_kudu.date_tbl]
       kudu predicates: date_col = DATE '1970-01-01'
       row-size=12B cardinality=1
  • Fix untracked memory in KRPC: Improved memory estimation of queries by accounting for untracked memory in KrpcDataStreamSender.
  • Red Hat UBI8 images: To address multiple CVEs, Impala images are now built using the UBI8 base image.
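The BINARY and ALTER VIEW items in the list above can be sketched in Impala SQL as follows. The table, view, and property names are hypothetical and serve only to illustrate the syntax.

```sql
-- Hypothetical table with a BINARY column (any table format except Kudu).
CREATE TABLE binary_demo (id INT, payload BINARY) STORED AS PARQUET;

-- Cast the string literal to BINARY explicitly when inserting.
INSERT INTO binary_demo VALUES (1, CAST('abc' AS BINARY));

SELECT id, payload FROM binary_demo;

-- Hypothetical view over the table.
CREATE VIEW binary_demo_v AS SELECT id FROM binary_demo;

-- Set a table property on the view (the key and value are placeholders).
ALTER VIEW binary_demo_v SET TBLPROPERTIES ('reviewed_by' = 'data-team');
```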