April 27, 2022

This release of the Cloudera Data Warehouse (CDW) service on CDP Public Cloud introduces these changes.

Metering fix

This release adds an important fix for CDW metering. With this fix, the status of your Virtual Warehouses is communicated accurately to the Cloudera control plane, which is necessary for CDW to function properly. To get this fix, upgrade your Virtual Warehouses. Cloudera recommends performing this upgrade as soon as possible.

Unified Analytics

This is the GA release of Unified Analytics, which brings SQL equivalency without syntax changes to CDW SQL engines. Unified Analytics brings significant optimization equivalency to these engines, unifying common techniques such as subquery processing, join ordering, and materialized views. Unified Analytics is available as a CDW Kubernetes service in AWS. The Unified Analytics documentation describes its features, its limitations, and how to use it. To take advantage of this feature, create a new Impala Virtual Warehouse and enable the Unified Analytics option, or simply create a new Hive Virtual Warehouse. For more information, see this community blog.

Ability to enable a private CDW environment in Azure Kubernetes Service (AKS)

You can now activate CDW clusters in private mode, which lets you create all cloud resources without public IP addresses. For more information, see Enabling a private CDW environment in Azure Kubernetes Service.

Download link for the Beeline CLI tarball

You can now download the Beeline CLI tarball from CDW and use the Beeline client to connect to a Hive Virtual Warehouse and run queries. The archive file contains all the dependent JARs and libraries that are required to run the Beeline script. Upgrade your existing Virtual Warehouse to use this feature.

Support for uploading non-UDF JARs

In this release, administrators can upload additional non-UDF JARs to the Hive classpath. These JARs might be required to support dependency JARs, third-party SerDes, or other Hive extensions.

Enhanced complex types support for Impala

This release allows you to use:
  • Arrays in the SELECT list
  • The UNNEST function for arrays in the SELECT list

See Querying arrays for more information.
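
As a rough illustration of what unnesting an array in the SELECT list means, the following Python sketch flattens rows that carry an array column into one row per array element. It is not Impala code: the function name unnest_rows, the sample rows, and the column names id and tags are hypothetical. Skipping rows with an empty array is a choice made in this sketch only; see Querying arrays for how Impala actually behaves.

```python
from typing import Any, Dict, Iterable, Iterator, List


def unnest_rows(rows: Iterable[Dict[str, Any]], array_column: str) -> Iterator[Dict[str, Any]]:
    """Yield one output row per array element, repeating the scalar columns."""
    for row in rows:
        # Columns other than the array are copied unchanged into every output row.
        scalars = {name: value for name, value in row.items() if name != array_column}
        for item in row[array_column]:
            yield {**scalars, array_column: item}


# Hypothetical sample data: each row has an id and an array of tags.
orders: List[Dict[str, Any]] = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": ["c"]},
    {"id": 3, "tags": []},  # assumption in this sketch: empty arrays yield no rows
]

flattened = list(unnest_rows(orders, "tags"))
for row in flattened:
    print(row)
```

In this sketch, the row with id 1 appears twice in the result, once for each element of its array.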

Ability to run Hive LLAP workloads using Hue without the “CDW_HUE_LLAP” entitlement

You can now use Hue to run Hive workloads in CDW without enabling the “CDW_HUE_LLAP” entitlement. In this new version, Hue combines the features of Data Analytics Studio (DAS) and Hue. If you do not have the “CDW_HUE_LLAP” entitlement enabled in your environment, then you can get Hue by creating a new Hive Virtual Warehouse.

Ability to select the Hue image version

CDW lets you select a Hue Runtime version from the Hue Image Version drop-down menu while creating a Virtual Warehouse. The compatible Hue image version is automatically selected based on the Hive or Impala image version that you select. For example, if you select Hive or Impala version 2022.0.7.0-70, then 2022.0.7.0-70 is selected as the Hue image version. If there is no corresponding Hue version for the selected Hive or Impala version, then the latest Hue version is automatically selected.

You can also check the Hue image version from the Virtual Warehouse Details page.
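
The automatic selection rule for the Hue image version can be sketched in Python as follows. This is only an illustration under stated assumptions, not the CDW implementation: it assumes that a "corresponding" Hue version is one with the identical version string, and that "latest" means the numerically highest version. The function names and every version string other than 2022.0.7.0-70 are hypothetical.

```python
import re
from typing import List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a string such as '2022.0.7.0-70' into (2022, 0, 7, 0, 70)."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def select_hue_image(engine_version: str, hue_versions: List[str]) -> str:
    """Pick the Hue image for a given Hive or Impala image version.

    Assumption: a Hue version "corresponds" when its string equals the
    engine version; otherwise the numerically highest Hue version is used.
    """
    if engine_version in hue_versions:
        return engine_version
    return max(hue_versions, key=version_key)


# Hypothetical list of available Hue image versions.
available = ["2022.0.6.0-12", "2022.0.7.0-70"]

print(select_hue_image("2022.0.7.0-70", available))  # matching version exists
print(select_hue_image("2022.0.5.0-3", available))   # falls back to the latest
```

Comparing versions as tuples of integers rather than as strings avoids ordering a build such as 2022.0.10.0-1 before 2022.0.7.0-70.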

Hue Query Processor runs at the Database Catalog level

The Hue Query Processor indexes and retrieves Hive query history by reading query event data from a Hadoop Compatible File System (HCFS), such as S3, and writing it into a relational database, such as PostgreSQL. Previously, the Hue Query Processor ran at the Virtual Warehouse level, so there was one Query Processor pod running for each Virtual Warehouse associated with a Database Catalog. Now, the Query Processor pod runs at the Database Catalog level. This helps reduce read costs in the cloud.

A new tab called “Hue query processor” has been added under the Database Catalog > Edit > CONFIGURATIONS section. The “hue-query-processor.json” and “hue-event-processor.json” files are no longer available under the Virtual Warehouse > Edit > CONFIGURATIONS > Hue > Configuration files drop-down menu.

If your Virtual Warehouse is on an older version, then you must create a new Database Catalog and Virtual Warehouse to activate this feature.

To tune the data refresh rate, see Configuring the Hue Query Processor scan frequency.

Ability to grant fine-grained access to S3 buckets from the S3 File Browser in Hue (Preview)

For better security and ease of use, you can create per-user home directories within your S3 bucket and grant fine-grained access to these user directories from the S3 File Browser in Hue. To enable fine-grained access from the Hue S3 File Browser, you must enable Ranger Authorization Service (RAZ) when you register your AWS environment with CDP, and add the path to your S3 bucket in the hue-safety-valve field in CDW.

Ability to grant fine-grained access to ADLS Gen2 containers from the ABFS File Browser in Hue (Preview)

For better security and ease of use, you can create per-user home directories within your ADLS Gen2 container and grant fine-grained access to these user directories from the ABFS File Browser in Hue. To enable fine-grained access from the Hue ABFS File Browser, you must enable RAZ when you register your Azure environment with CDP, and add the path to your ABFS container in the hue-safety-valve field in CDW.

Performance and security enhancements in Hue

Python 2 has reached end of life and is no longer supported. Hue in CDW Public Cloud now uses Python 3, which brings in critical bug fixes and Common Vulnerabilities and Exposures (CVE) fixes for many third-party software dependencies.

The following changes have been made in the Hue codebase in this release of CDW Public Cloud:
  • The Hue container now runs in the UBI8 image, which is upgraded from UBI7.
  • Python libraries have been upgraded from Python 2.7 to Python 3.8.
  • The Django server has been upgraded from version 1.11.29 to 3.2.12.
  • Python modules such as django-auth-ldap, django-axes, djangorestframework-simplejwt, Mako, Markdown, python-ldap, django-babel, django-mako, django-cors-headers, djangorestframework, eventlet, sqlparse, and so on have also been upgraded.
  • Hue now uses Gunicorn as a front-end server. Previously, Hue used the CherryPy server.

These upgrades significantly improve performance and stability when executing queries and when uploading and importing files to S3 or ABFS. The operating system, Python version, and Python module upgrades have resulted in a more stable environment and fixed more than 800 security vulnerabilities.

Also, the UBI8 image uses the latest Apache HTTP Server (httpd). As a result, you no longer need to limit the fully qualified domain name of your Virtual Warehouse to 64 characters.

Hue images now run as non-root users.

Selecting an availability zone when creating a Virtual Warehouse

In AWS environments, you can now accept the default availability zone, or select your preferred zone when creating a Virtual Warehouse. To use this feature, create a new Virtual Warehouse.

AWS restricted permissions policy

As an administrator, to restrict access to your Cloudera Data Warehouse service, you must include the AWS restricted policy in your IAM role. Add the policy to your IAM role before you activate the environment in Cloudera Data Warehouse to make the service available to users.