CDP Public Cloud: August 2022 Release Summary

Data Visualization

Cloudera Data Visualization (CDV) 7.0.2 was released on August 9, 2022 introducing the following changes:

  • It is now possible to search for dashboards, applications, datasets, or documentation from anywhere in the application.
  • Several improvements to filters and filtering: Improved display of the user-added values in filters. Relative date filters are now included in URL parameters and email jobs. NULL values can be included or excluded in date filters. Filter width settings can now be adjusted as expected.
  • New site setting to allow administrators to disable manual user creation from within the application. This helps preventing username collisions between SSO users and manually created users.
  • The workspace description is now shown on hover when viewing the list of workspaces from the Visuals tab.
  • A “Copy to Clipboard” button was added to the Get URL modal to make sharing parameterized links easier.

Data Warehouse

This release of the Cloudera Data Warehouse (CDW) service on CDP Public Cloud introduces these changes:

Azure AKS 1.22 provisioning

In this release 1.4.2-b118 (released August 4, 2022), when you activate an environment, CDW automatically provisions AKS 1.22.

Do not upgrade your AKS cluster to 1.22 in this release using the Azure CLI. Doing so can cause the cluster to become unusable and can cause downtime. For more information about upgrading, see Upgrading an Azure Kubernetes Service cluster for CDW.

Azure AKS managed identity support

You can now specify a user-assigned, managed identity for the AKS cluster when you activate the Azure environment.

Support for fine-grained access using minimum permissions in Azure environments

You can configure minimum permissions to govern access control between CDW, Azure resources, and the Azure storage account.

Azure Kubernetes Service deployment configuration options

Changes in configuration properties simplify configuring a DNS zone for AKS.

Cluster subdomain definition improvement

In release version 1.4.2-b118 (released August 4, 2022), if you have the CDW_CUSTOM_CLUSTER_ID entitlement, you can define a cluster subdomain to retain your JDBC URLs. When you activate an environment, define your old subdomain using the following format:

env-xxx.dw

Defining the old subdomain in this way retains your old Virtual Warehouse names in the cluster. During environment activation, your old URLS continue to work, including JDBC URLS, Hive and Impala Virtual Warehouse URLS, Grafana URLS, and other URLS.

The new way of defining your old subdomain replaces the behavior in effect for Versions 2021.0.1.-b10 (released August 27, 2021) through 1.4.1-b86 (released June, 22, 2022). For information about the old way of defining a cluster domain, and the JDBC URL changes you had to made, see the August 27, 2021 release notes.

Simplification of Azure activation UI from CDW

Azure documentation for Data Warehouse has been updated to reflect improvements in the UI:

Apache Iceberg GA in CDW

The following new features are generally available (GA) in Iceberg V1 in CDW:

Impala BI features were released on a GA basis already.

Impala enhancements

  • Printing query results in vertical format Impala-shell now includes a new command option ‘-E’ or ‘–vertical’ to support printing of query results in vertical format.

  • Resolving ORC columns by names Before this release, Impala resolved ORC columns by index. In this release, a query option ORC_SCHEMA_RESOLUTION is added to support resolving ORC columns by name.

  • Retrieving the data file name Impala now supports including a virtual column in a standard SELECT statement select INPUT__FILE__NAME from to [retrieve the name of the data file](/cdw-runtime/cloud/impala-sql-reference/topics/impala-virtual-columns.html) that stores the actual row in a table.

  • Consolidating the Ranger audit logs for the same table Impala now consolidates the Ranger audit log entries of column accesses granted by the same policy for columns in the same table, after all the requests for accessing an object are processed.

  • BYTES function support Impala now supports the BYTES() function. This function returns the number of bytes contained in a byte string.

  • Configurable socket timeout for http transport Impala supports configuring socket timeout using the new impala-shell config option –http_socket_timeout_s. When you set a reasonable timeout, an impala-shell client can retry in case of connection issues.

  • UTF-8 mode support Some Impala STRING types now support UTF-8 aware behavior to ensure consistent results for non-ASCII characters in the string in both Hive and Impala.

DataFlow

This release (2.1.0-h1-b10) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces improvements for Prometheus performance, reliability and performance improvements for upgrades, as well as a number of additional bug fixes.

  • Fixed regression in DFX 2.1.0 where new deployments contain NiFi pods with misconfigured cpu/memory limits. These limits are lower than necessary, leading to the possibility of NiFi pod crashing due to off-heap storage growing large enough to cause an Out of Memory error.
  • Fixed a bug where disabling DataFlow from certain failure states can potentially leave behind some cloud resources deployed by DataFlow.
  • Several settings for Prometheus instances running on the workload cluster have been adjusted to improve performance. This includes a reduction in the amount of metrics collected by Prometheus instances and enabling write ahead log compression.
  • Improved Enable, Upgrade/rollback and Create deployment performance to more reliably interpret status of helm releases and NiFi resources for improved success rates.
  • Fixed an issue where cluster node scaling events are not being properly surfaced to the user.
  • Fixed a bug where upgrades failed if a deployment was created with autoscaling disabled, but since was updated in any way (node count, params, and so on).
  • Improved upgrade timeout configurations to account for delays upgrading clusters with high node counts.
  • Fixed a bug causing some environments not to show in the filter options on the Deployment Dashboard.
  • Fixed potential for up to 10-minute delay during upgrades where helm repository sync is unnecessarily delayed. This could sometimes lead to timeout failures during upgrade.
  • Fixed a bug in the user interface on the Deployment Dashboard when an environment filter option is selected, but becomes no longer valid.
  • Fixed a bug in retrying a failed DataFlow enablement where a new enable attempt fails fast and old infrastructure is eventually deleted. Old infrastructure details are not cleared from the database, so subsequent retries attempt and fail to clean up infrastructure that is already deleted.
  • Fixed an issue where in-cluster certificates are renewed, but certain cluster resources are not automatically restarted to start using the new certificates.
  • Fixed an issue limiting upload of flow definition files to the catalog to 1MB in size. Flow uploads are now supported up to 25MB in size.
  • Fixed an issue preventing creation of deployments with large flow definitions.