What's new in Cloudera Data Warehouse on Public Cloud
Review the new features introduced in this release of the Cloudera Data Warehouse (CDW) service on CDP Public Cloud.
- CDW features
- CDW on Azure environments
- CDW on AWS environments
- Iceberg
- Hue
- Technical Preview features
- Behavior changes
What's new in CDW Public Cloud
- General availability of Virtual Warehouse and Database Catalog workload version selections
- During cluster installation, the CDW UI now lists the workload versions that match your cluster, and you can select one of them. The Database Catalog list contains versions compatible with your Kubernetes version and your cluster environment (DWX version). The Virtual Warehouse list contains versions compatible with your Kubernetes version, your cluster environment (DWX version), and your Database Catalog version.
- General availability of Impala workload-aware autoscaling
- Workload-aware autoscaling allocates Impala Virtual Warehouse resources based on the workload that is running. Instead of the fixed executor group size of the previous autoscaling implementation, you choose the sizes of multiple executor group sets based on your workload requirements. This feature is now generally available. See Workload Aware Auto-Scaling in Impala.
- Improved Impala Autoscaling Dashboard
- You can now use the new Impala Autoscaling Dashboard to monitor Impala autoscaling in a Virtual Warehouse that uses either workload-aware autoscaling or regular autoscaling. To open the dashboard, go to the Web UI tab of the Virtual Warehouse Details page and click the Impala Autoscaler Web UI option. See About the Impala Autoscaling Dashboard.
- Ability to forward Prometheus metrics from CDW to an external endpoint
- In this release, you can configure Prometheus in CDW to push its metrics to an external endpoint, such as Prometheus, Grafana, Thanos, or another monitoring endpoint. See Forwarding Prometheus metrics from CDW to an endpoint.
- Automatically backing up and restoring CDW
- This release adds more automation to the backup and restore procedures for AWS and Azure environments and clarifies the documentation of the automatic, semi-automatic, and manual procedures.
To get the Kubernetes version supported in this release, back up your existing AWS or Azure environment, and then create a new environment by using the restore process. The backup and restore feature saves your environment parameters, so you can recreate your environment with the same settings, URL, and connection strings that you used in your previous environment.
- Ability to configure Impala Statestore high availability
- You can now configure high availability for Impala Statestore pods in a Virtual Warehouse, with active and passive modes ensuring continuity and reliability during failovers. See Configuring Impala Statestore high availability.
- Downloading the UDF development package from CDW UI
- You can now download the Impala UDF development package directly from the CDW UI. See Building and deploying UDFs.
- PostgreSQL replaces SQLite database for Grafana in Cloudera Data Warehouse on Public Cloud
- The file-based SQLite database for Grafana has been replaced with a PostgreSQL database, which provides a more robust experience. You must deactivate and reactivate your environment in CDW to use this feature.
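Workload-aware autoscaling, described above, lets a Virtual Warehouse route each query to one of several executor group sets of different sizes. The following Python sketch illustrates that routing idea under simplifying assumptions. It is not Impala's implementation: the ExecutorGroupSet type and the pick_group_set function are hypothetical, and the sketch uses a single total-memory estimate per query, whereas the real planner relies on its own, richer resource estimates.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ExecutorGroupSet:
    """Hypothetical model of an executor group set and its capacity."""
    name: str
    executors: int            # executor nodes in each group of the set
    mem_per_executor_gb: int  # memory available on each executor

    @property
    def total_mem_gb(self) -> int:
        return self.executors * self.mem_per_executor_gb


def pick_group_set(
    group_sets: Sequence[ExecutorGroupSet], estimated_mem_gb: float
) -> Optional[ExecutorGroupSet]:
    """Return the smallest group set whose total memory covers the estimate.

    Simplifying assumption: one memory estimate per query decides the
    placement. Returns None when no group set is large enough.
    """
    # Try group sets from smallest to largest capacity.
    for group_set in sorted(group_sets, key=lambda g: g.total_mem_gb):
        if estimated_mem_gb <= group_set.total_mem_gb:
            return group_set
    return None
```

For example, with hypothetical "small" (2 x 64 GB), "medium" (4 x 64 GB), and "large" (8 x 64 GB) sets, a query estimated at 100 GB lands on "small" and one estimated at 300 GB lands on "large". Choosing the smallest set that fits keeps light queries off large executor groups, which is the motivation for offering several group set sizes instead of one fixed size.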
What's new in CDW on Azure environments
- Azure AKS 1.29 upgrade
- Cloudera supports the Azure Kubernetes Service (AKS) version 1.29. In 1.9.1-b233 (released July 26, 2024), when you activate an environment, CDW automatically provisions AKS 1.29. To upgrade to AKS 1.29 from an earlier version of CDW, you must back up and restore CDW. To avoid compatibility issues between CDW and AKS, upgrade to version 1.29.
- Addition of new Azure instance types
- This release lets you select the Standard_E16pds_v5 Azure Virtual Machine, an Ampere® Altra® Arm-based AKS instance type, for an Impala Virtual Warehouse. For more information about using this instance type, see Activating an Azure environment from CDW.
- CDW provisions Azure Database for PostgreSQL - Flexible Server
- Starting with this release, CDW provisions Azure Database for PostgreSQL - Flexible Server instead of Azure Database for PostgreSQL - Single Server. See Enabling a private CDW environment.
What's new in CDW on AWS environments
- Amazon EKS 1.29 upgrade
- Cloudera supports the Amazon Elastic Kubernetes Service (EKS) version 1.29. In 1.9.1-b233 (released July 26, 2024), when you activate an environment, CDW automatically provisions EKS 1.29. To upgrade to EKS 1.29 from an earlier version of CDW, you must back up and restore CDW. To avoid compatibility issues between CDW and EKS, upgrade to version 1.29. See Upgrading Amazon Kubernetes Service (EKS).
- Note about the impact of AWS RDS root certificate rotation in 2024
- Connections between CDW and the CDW Cluster RDS do not use certificate verification.
Therefore, certificate expiration does not directly affect your CDW Cluster RDS.
You can either clear the warnings or rotate the certificate.
To rotate the certificate for the CDW Cluster RDS, follow the steps outlined by AWS in Rotating your SSL/TLS certificate. Rotation should have no impact on CDW because the CDW Cluster RDS does not need to be restarted: Postgres RDS has SupportsCertificateRotationWithoutRestart=true.
For the Datalake RDS, follow the instructions shared by the Datalake account team to update the certificate. Restarting the Datalake may have some impact on CDW, such as query failures or delays, because services such as Ranger, Knox, and FreeIPA might be unavailable during this period.
- Addition of new AWS instance types
- This release lets you select the r6gd.4xlarge and r7gd.4xlarge Arm-based instance types for an Impala Virtual Warehouse. For more information about using these instance types, see Activating an AWS environment from CDW.
- Ability to use envelope encryption for EKS secrets
- CDW now enables envelope encryption of EKS secrets by default, using the CDW KMS key. See Encrypt Kubernetes secrets with AWS KMS on existing clusters.
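The note about the AWS RDS root certificate rotation relies on the SupportsCertificateRotationWithoutRestart flag that RDS reports for each engine version. The following Python sketch shows one way to read that flag for a given DB instance. It assumes a boto3-style RDS client (for example, boto3.client("rds")) and calls only describe_db_instances and describe_db_engine_versions. The function name is hypothetical, and treating a missing flag as "restart required" is a conservative assumption of this sketch, not documented CDW behavior.

```python
from typing import Any


def rotation_needs_restart(rds_client: Any, db_instance_id: str) -> bool:
    """Return True if rotating the CA certificate of the given RDS instance
    is expected to require a restart.

    rds_client is assumed to behave like boto3.client("rds").
    """
    # Look up the engine and engine version of the instance.
    instance = rds_client.describe_db_instances(
        DBInstanceIdentifier=db_instance_id
    )["DBInstances"][0]

    # The rotation-without-restart flag is reported per engine version.
    versions = rds_client.describe_db_engine_versions(
        Engine=instance["Engine"],
        EngineVersion=instance["EngineVersion"],
    )["DBEngineVersions"]

    # Conservative assumption: no data or no flag means a restart is needed.
    if not versions:
        return True
    return not versions[0].get("SupportsCertificateRotationWithoutRestart", False)
```

To run it against your account, pass boto3.client("rds") and the identifier of your CDW Cluster RDS instance. A return value of False indicates that the engine version reports support for certificate rotation without a restart.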
What's new in Iceberg on CDW Public Cloud
- CDP support for Iceberg version 1.4.3
- The Apache Iceberg component has been upgraded from 1.3.0 to 1.4.3.
- Support for Iceberg data compaction
- You can compact Iceberg tables and optimize them for read operations from Hive and Impala. Compaction is an essential table maintenance activity that creates a new snapshot, which contains the table content in a compact form. See Iceberg data compaction.
- SQL support for querying Iceberg metadata tables
- Apache Iceberg stores extensive metadata for its tables. From Hive and Impala, you can query the metadata tables as you would query a regular table. For example, you can use projections, joins, filters, and so on. See Query metadata tables feature.
What's new in Hue on CDW Public Cloud
- General availability (GA) of the SQL AI Assistant
- Hue leverages the power of Large Language Models (LLMs) to help you generate SQL queries from natural language prompts, and it also provides options to optimize, explain, and fix queries, promoting efficient and accurate practices for accessing and manipulating data. You can use several AI services and models, such as OpenAI’s GPT service, Amazon Bedrock, and Azure’s OpenAI service, to run the Hue SQL AI Assistant.
- To learn more about the supported models and services, limitations, and what data is shared with the LLMs, see About the SQL AI Assistant in CDW.
- To set up and enable the SQL AI Assistant, see About setting up the SQL AI Assistant in CDW.
- To see how to generate, edit, explain, optimize, and fix queries, see Starting the SQL AI Assistant in Hue.
- Introduction of task server in Hue and significant improvement in the file upload functionality
- A new Task Server page has been added to the Hue web interface. The Hue task server provides the following functionality:
- It improves the file upload experience, letting you upload multiple files of up to 5 GB each in parallel.
- It lets you schedule tasks that clean up Hue documents and the /tmp directory, improving cluster maintenance and performance.