February 7, 2023
This release of the Cloudera Data Warehouse (CDW) service on CDP Public Cloud introduces these changes.
AWS Elastic Kubernetes Service (EKS) version upgrade
The CDW application uses Kubernetes (K8S) clusters to deploy and manage Hive and Impala in the cloud. Kubernetes versions are updated every 3 months on average. When a new version is released, older minor versions are deprecated.
To avoid compatibility issues between CDW and AWS resources, you must upgrade the version of Kubernetes that supports your existing CDW clusters to version 1.21.
AWS environments you activate using this release of Cloudera Data Warehouse will use version 1.22.
If a MIGRATE icon appears in the upper right corner of the environment tile, your AWS environment has not been migrated from Helm 2 to Helm 3. Update the Helm package manager for your environment before you attempt to upgrade its Kubernetes version.
For information about upgrading the AWS EKS Kubernetes version, see Upgrade CDW for EKS upgrade.
Support for Amazon Elastic Kubernetes Service 1.22 cluster
This release, 1.6.1-b258 (released Feb 7, 2023), automatically provisions and uses Amazon Elastic Kubernetes Service (EKS) 1.22 when you activate an environment from CDW. In this release, upgrading an existing cluster to EKS 1.22 is not supported.
Changes to the managed policy ARN, standard IAM permissions, and restricted permissions policy
In this release, as an AWS environment user, you must update the managed policy ARN to handle Kubernetes CSI drivers for EBS and EFS. You must also update your standard IAM permissions, and the restricted permissions policy if you use it.
- Restricted policy changes for updating a managed policy ARN
"arn:aws:iam::<AWS_ACCOUNT>:policy/<noderole-inline-policy>" as shown in "Attaching a managed policy ARN".
- Standard JSON IAM permissions policy template
Add the following line:
"elasticfilesystem:PutFileSystemPolicy", as shown in "Standard JSON IAM permissions policy template".
- Restricted permissions policy
Add "elasticfilesystem:PutFileSystemPolicy" to the ResourceTag object and move "elasticfilesystem:CreateFileSystem" from the CloudFormation object to the RequestTag object, as shown in "AWS restricted permissions policy".
Synchronized metadata across Impala Virtual Warehouses
Using Impala Virtual Warehouses that share a Database Catalog is easier in this CDW release. In past releases, after making changes to data and then refreshing tables or invalidating metadata from your Virtual Warehouse, only the catalog metadata and coordinator metadata for that particular Virtual Warehouse were affected. You had to rerun the commands from each Virtual Warehouse to synchronize metadata across multiple Virtual Warehouses that share a Database Catalog.
This release introduces an enhancement that raises events in the Hive metastore. Catalog daemons process these events synchronously across all Virtual Warehouses that share metadata, so metadata is refreshed or invalidated in parallel across all your Virtual Warehouses. You need to run the commands only once, from any one of your Virtual Warehouses.
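As a rough illustration of the commands referred to above, the statements below show a table refresh and a metadata invalidation issued from a single Impala Virtual Warehouse. The table name sales_db.orders is hypothetical. With the synchronization feature enabled, running the statements once should be enough for the other Virtual Warehouses that share the Database Catalog.

```sql
-- Run from any one Impala Virtual Warehouse that shares the Database Catalog.
-- sales_db.orders is a hypothetical table name used only for illustration.

-- Reload file metadata after new data files are added to the table.
REFRESH sales_db.orders;

-- Discard the cached metadata for the table so that it is reloaded on next access.
INVALIDATE METADATA sales_db.orders;
```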
To get this feature, you must upgrade your Virtual Warehouse to this release, 1.6.1-b258, which has runtime 2023.0.13.0-20 (released Feb 7, 2023). You must also set the Impala catalogd enable_reload_events setting to true. Newly created Virtual Warehouses use Impala version 2023.0.13.0-20, which has this feature enabled by default. For more information, including how to disable this feature, see Disabling metadata synchronization.
This enhancement does not synchronize metadata when you refresh tables or invalidate metadata from a Data Hub cluster.
Apache Iceberg GA in CDW
This release introduces the general availability of ACID transactions with Iceberg V2 tables from Hive in CDW Runtime 2023.0.13.0-20 (released 2023-02-07). CDW Runtime 2022.0.12.0-90 (released 2022-12-13) introduced the general availability of ACID transactions with Iceberg V2 tables from Impala. You can run Apache Iceberg ACID transactions within some of the key data services in the Cloudera Data Platform (CDP) public cloud (AWS and Azure), including Cloudera Data Warehouse. From Hive or Impala, you can use Apache Iceberg features in CDW, which include time travel, create table as select, and schema and partition evolution.
To access these features, create a new Virtual Warehouse or upgrade an existing one.
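The Iceberg features listed above can be sketched with a few statements. This is a minimal example in the form Impala accepts; Hive syntax is similar but may differ in details. The table names orders and ice_orders, the column names, and the timestamp are all hypothetical, and the example assumes that orders has a TIMESTAMP column named order_ts.

```sql
-- Create table as select: build an Iceberg table from an existing table.
-- orders and ice_orders are hypothetical names.
CREATE TABLE ice_orders
STORED BY ICEBERG
AS SELECT * FROM orders;

-- Time travel: query the table as it existed at an earlier point in time.
-- The timestamp is illustrative and must fall within the table's snapshot history.
SELECT * FROM ice_orders FOR SYSTEM_TIME AS OF '2023-02-07 10:00:00';

-- Schema evolution: add a column without rewriting existing data files.
ALTER TABLE ice_orders ADD COLUMNS (discount DECIMAL(5,2));

-- Partition evolution: change the partition layout used for newly written data.
-- Assumes order_ts is a TIMESTAMP column.
ALTER TABLE ice_orders SET PARTITION SPEC (MONTH(order_ts));
```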
Support for Iceberg tables in Avro (Preview)
In this release, you can read Iceberg tables in Avro from Impala. There is a related known issue with using the DECIMAL data type in Avro in this release.
Reading Iceberg tables in Avro format from Impala is available as a technical preview. Cloudera recommends that you use this feature in test and development environments. It is not recommended for production deployments.
Enhanced Iceberg support for materialized views
In this release, you can create a materialized view of an Iceberg V1 or V2 table based on an existing Hive table or an Iceberg table. Automatic rewriting of the materialized view occurs under certain conditions.
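A minimal sketch of this feature from Hive might look like the following. The table ice_sales, its columns, and the view name are hypothetical, and the STORED BY ICEBERG clause on the view is an assumption about how the materialized view is stored as an Iceberg table; check the materialized view documentation for the exact supported clauses.

```sql
-- Hive: create a materialized view stored as an Iceberg table.
-- ice_sales, region, amount, and mv_sales_by_region are hypothetical names.
CREATE MATERIALIZED VIEW mv_sales_by_region
STORED BY ICEBERG
AS
SELECT region, SUM(amount) AS total_amount
FROM ice_sales
GROUP BY region;

-- Rebuild the view manually after the source table changes.
ALTER MATERIALIZED VIEW mv_sales_by_region REBUILD;
```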
Iceberg load data inpath feature
From Impala, you can now load data into an Iceberg table using the load data inpath feature.
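For example, a load into an Iceberg table could look like the statement below. The staging path and the table name ice_orders are hypothetical, and the files under the path are assumed to be in a format the Iceberg table can accept.

```sql
-- Impala: load staged data files into an existing Iceberg table.
-- The path and table name are hypothetical.
LOAD DATA INPATH '/tmp/staging/orders_parquet' INTO TABLE ice_orders;
```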
Querying Data Hub Kudu tables from an Impala Virtual Warehouse using Kudu
After configuring an Impala Virtual Warehouse to connect to Kudu, you can create Kudu tables using Impala clients, such as Hue, the Impala shell, or JDBC/ODBC. You can also ingest data using Spark/NiFi and query using Impala.
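As a sketch, once the Virtual Warehouse is configured to connect to Kudu, statements like the following could be run from Hue or the Impala shell. The table and column names are hypothetical, and the partition count is an arbitrary illustrative value.

```sql
-- Impala: create a Kudu table. Kudu tables require a primary key.
-- kudu_events and its columns are hypothetical names.
CREATE TABLE kudu_events (
  event_id BIGINT,
  event_name STRING,
  PRIMARY KEY (event_id)
)
PARTITION BY HASH (event_id) PARTITIONS 4
STORED AS KUDU;

-- Insert a row and read it back by primary key.
INSERT INTO kudu_events VALUES (1, 'login');
SELECT event_name FROM kudu_events WHERE event_id = 1;
```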
Flexible allow lists for Kubernetes cluster and load balancer
- Enable IP CIDR for Kubernetes cluster
- Enable IP CIDRs for the load balancer
This release of Cloudera Data Warehouse includes the following new Impala features:
- Binary support: Impala now supports BINARY columns for all table formats except Kudu. See the BINARY support topic for more information on using this arbitrary-length byte array data type in CREATE TABLE and SELECT statements.
- ALTER VIEW support: Before this release, only altering the VIEW definition, VIEW name, and owner was supported. Impala now also supports altering the table properties of a VIEW.
- Push down date literals to Kudu scanner: Impala now allows creating and pushing down Kudu predicates from the DATE type.
select * from functional_kudu.date_tbl where date_col = DATE "1970-01-01";
PLAN-ROOT SINK
|
00:SCAN KUDU [functional_kudu.date_tbl]
   kudu predicates: date_col = DATE '1970-01-01'
   row-size=12B cardinality=1
---- DISTRIBUTEDPLAN
PLAN-ROOT SINK
|
01:EXCHANGE [UNPARTITIONED]
|
00:SCAN KUDU [functional_kudu.date_tbl]
   kudu predicates: date_col = DATE '1970-01-01'
   row-size=12B cardinality=1
- Fix untracked memory in KRPC: Improved memory estimation of queries by accounting for untracked memory in KrpcDataStreamSender.
- Red Hat UBI8 images: To address multiple CVEs, Impala images are now built on the Red Hat UBI8 base image.
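Two of the features in this list, BINARY columns and ALTER VIEW table properties, can be illustrated with a short sketch. All table, view, and property names are hypothetical. The SET TBLPROPERTIES and UNSET TBLPROPERTIES forms shown for ALTER VIEW are an assumption based on the upstream Impala syntax, so confirm them against the ALTER VIEW reference for your runtime.

```sql
-- BINARY support: declare a BINARY column and read it back.
-- binary_demo is a hypothetical table; any format except Kudu should work.
CREATE TABLE binary_demo (id INT, payload BINARY) STORED AS PARQUET;
INSERT INTO binary_demo VALUES (1, CAST('hello' AS BINARY));
SELECT id, payload FROM binary_demo;

-- ALTER VIEW support: set and then remove a table property on a view.
-- sales_summary_v and the owner_team property are hypothetical.
ALTER VIEW sales_summary_v SET TBLPROPERTIES ('owner_team' = 'analytics');
ALTER VIEW sales_summary_v UNSET TBLPROPERTIES ('owner_team');
```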