CDP Public Cloud: March 2022 Release Summary

Data Catalog

Data Catalog introduces the following additions:

  • Support for Cloudera Runtime 7.2.14
  • Profiler data is now backed up and restored automatically; no manual end-user process is required
  • After upgrading to Cloudera Runtime 7.2.14, deleting the profiler cluster no longer requires saving the modified Custom Sensitivity Profiler rules

Data Hub

Support for new GCP instance types

The following new GCP instances are supported in Data Hub:

  • n2d-highmem-32
  • n2d-highmem-64

Public certificate auto-renewal

Most public (Let’s Encrypt-issued) certificates for Data Lake and Data Hub clusters now auto-renew without user intervention. For more information, refer to Managing Certificates.

Data Hub upgrades

Major/minor version upgrades of Cloudera Runtime and Cloudera Manager are generally available for most Data Hub templates. Data Hub maintenance upgrades for RAZ-enabled Data Hubs are generally available for versions 7.2.7+. For more information, see Upgrading Data Hubs.

Data Visualization

The following new features and improvements have been implemented in this release:

  • New CML engine docker.repository.cloudera.com/cloudera/cdv/cmldataviz:6.3.7-b38 is available.
  • This release improves searching for specific values in a filter, and a loading icon is displayed while the search is still in progress.
  • Aggregate Display has been renamed to Calculated Display, and its difference calculation logic has been improved.
  • The Field Statistics visual is improved for Hive connections.
  • Data Model now prompts a reload of preview data after changes to the dataset.
  • Performance of visual and report metadata queries has been improved.
  • The SQL editor now includes line numbers for easier debugging of queries.
  • This release adds proxy username and attribute settings in Impyla-based connectors (Hive, Impala, Arcengine V2).
  • Downloading a Crosstab visual as an Excel file now displays data the same way it appears in the UI.

Data Warehouse

Apache Iceberg technical preview in Cloudera Data Warehouse (Preview)

Cloudera Data Warehouse (CDW) supports Apache Iceberg as a technical preview. Iceberg is a cloud-native, open table format for organizing petabyte-scale analytic datasets on a file system or object store. In CDW, you can read and write Iceberg tables using Hive or Impala SQL engines. To use Iceberg in CDW, create a new Virtual Warehouse, or upgrade your existing one.

Automating metadata invalidation after compaction

After compaction of ACID tables, metadata in Impala coordinator local caches might be stale. Invalidation refreshes the metadata and prevents possible query failure. This release introduces automatic metadata invalidation, which you can configure in the Impala Virtual Warehouse. You must upgrade your Database Catalog to get this feature.

Cloudera Data Warehouse availability in ap-1 Control Plane region

Cloudera Data Warehouse (CDW) is now supported in the ap-1 (Australia) regional Control Plane. To use CDW in this regional Control Plane, your CDP administrator must create a new environment.

Data Visualization integration in Cloudera Data Warehouse

Data Visualization integration in CDW is Generally Available in this release. CDW integrates Data Visualization for building graphic representations of data, dashboards, and visual applications based on CDW data or other data sources you connect to. Authorized users can explore data using graphics, such as pie charts and histograms, and collaborate using dashboards. BI analysts who have access to your environment can use these features. In a future release, the Enable Data Visualization option in the New Virtual Warehouse dialog will be deprecated.

Impala coordinator shutdown

To save cloud costs, you can configure Impala coordinators to automatically shut down during idle periods. In this release, the coordinator shutdown feature has reached General Availability. The feature is available when you create an Impala Virtual Warehouse.

Impala Virtual Warehouse High Availability

A single Impala coordinator might not handle the number of concurrent queries you want to run or provide the memory your queries require. You can configure multiple active-active coordinators to resolve or mitigate these problems. To use this feature, create a new Virtual Warehouse.

Impala late materialization of columns

The Data Warehouse Runtime release of Impala introduces late materialization, which optimizes certain queries on Parquet tables by limiting table scanning. Only relevant data is materialized to improve query response.

Istio version update

New environments that you create in this release support Istio 1.11.2. In these environments, you must select the latest Cloudera Data Warehouse image version 2021.0.6.96 when you create a Database Catalog and Virtual Warehouse, as described in Known issues.

Log4j Update

Cloudera has released a new version of CDW that upgrades the embedded Log4j version to 2.17.1.

Management Console

Upgrading FreeIPA

To ensure that your FreeIPA nodes are running with the latest patches, you should periodically upgrade your FreeIPA cluster. CDP currently allows you to upgrade all FreeIPA clusters, applying OS-level security patches to the cluster nodes. See Upgrade FreeIPA.
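
A one-line sketch of triggering this upgrade from the CDP CLI is shown below. The upgrade-freeipa command and its --environment parameter are assumptions here, not confirmed by this release note; verify them against the environments command reference for your CLI version:

    # Assumed command and parameter name; confirm against the CDP CLI reference.
    cdp environments upgrade-freeipa --environment my-environment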

Upgrading the Data Lake

Major/minor version upgrades of Cloudera Runtime and Cloudera Manager are generally available. Data Lake maintenance upgrades for RAZ-enabled environments versions 7.2.7+ are generally available. For more information, see Data Lake upgrade.

New classic cluster roles

As part of the new authorization model released in 2021, CDP introduces a new account role and resource roles related to classic clusters:

  • New account role ClassicClustersCreator: required to register a new classic cluster. If this role is not present, the “Add Cluster” button is not visible to the user.
  • New resource roles ClassicClusterAdmin and ClassicClusterUser: these roles can be assigned on the scope of a specific classic cluster.

For more information, see Enabling admin and user access to classic clusters.
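
As a rough illustration, roles like these are typically granted with the CDP CLI iam commands. The sketch below uses the assign-user-role and assign-user-resource-role commands with placeholder CRNs; the exact role and resource-role CRN formats are assumptions and should be checked against the CDP IAM documentation for your tenant:

    # Illustrative only: placeholder CRNs; verify CRN formats before running.

    # Grant the new account role so the user can register classic clusters
    cdp iam assign-user-role \
        --user "crn:altus:iam:us-west-1:<tenant-id>:user:<user-id>" \
        --role "crn:altus:iam:us-west-1:altus:role:ClassicClustersCreator"

    # Grant a resource role scoped to a specific classic cluster
    cdp iam assign-user-resource-role \
        --user "crn:altus:iam:us-west-1:<tenant-id>:user:<user-id>" \
        --resource-role-crn "crn:altus:iam:us-west-1:altus:resourceRole:ClassicClusterAdmin" \
        --resource-crn "<classic-cluster-crn>"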

Data Lake backup and restore options

New CLI options have been added to the Data Lake backup and restore feature. These options let you explicitly include or skip certain data during a backup or restore operation:

  • You can skip or include the backup/restore of the HMS and Ranger databases.
  • You can skip or include the Atlas HBase tables and all Solr collections except ranger_audit.
  • You can skip or include the Solr ranger_audit collection.

For more information, see Configure backups for a Data Lake.
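
A minimal CLI sketch of a backup that skips selected data is shown below. The skip flag names are assumptions based on the option descriptions above; verify them against the backup-datalake command reference for your CDP CLI version:

    # Assumed flag names; check the cdp datalake backup-datalake reference before use.
    # --skip-ranger-hms-metadata  skips the HMS and Ranger databases
    # --skip-atlas-metadata       skips the Atlas HBase tables and Solr collections (except ranger_audit)
    # --skip-ranger-audits        skips the Solr ranger_audit collection
    cdp datalake backup-datalake \
        --datalake-name my-datalake \
        --backup-location s3a://my-bucket/datalake-backups \
        --skip-atlas-metadata \
        --skip-ranger-audits

The restore operation supports the same include/skip choices; see the linked documentation for the corresponding restore options.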

Public certificate auto-renewal

Most public (Let’s Encrypt-issued) certificates for Data Lake and Data Hub clusters now auto-renew without user intervention. For more information, refer to Managing Certificates.

Data Lake recipes

Support for attaching and detaching recipes on a Data Lake cluster is now available through both the CDP UI and CDP CLI. For more information, see Recipes.

Machine Learning

Project Creation Timeout settings

The Project Creation Timeout (minutes) setting on the Site Administration > Settings page now applies to project creation via both git clone and fork. Consider increasing this value if you need to create a large project.

Load Balancer subnet

CML now supports specifying subnets for the Load Balancer that CML creates on AWS. The Workspace Provisioning page includes a field for selecting a subnet for the Load Balancer. See Provisioning a Workspace for more information.

Operational Database

COD provides an enhanced user interface to download the Phoenix client jar

You can now download the Phoenix client jars with a single click, directly from the Phoenix Thick client and Phoenix Thin client tabs in the COD user interface.

Replication Manager

Using CDP CLI for HDFS and Hive replication policies

The CDP CLI commands for Replication Manager are available under the replicationmanager CDP CLI option.

You can create, suspend, activate, or delete HDFS and Hive replication policies using the create-policy, suspend-policy, activate-policy, and delete-policy CDP CLI commands. You can also list clusters, replication policies, cluster service statuses, and credentials, and retrieve the credentials for a specific cluster, using the list-clusters, list-policies, list-cluster-service-statuses, list-all-credentials, and get-credentials commands.
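
The command names above come from this release; the sketch below only illustrates the general flow, and the parameter names shown (--cluster-crn, --policy-id) are placeholders that should be checked against the CDP CLI reference for Replication Manager:

    # List registered clusters and the policies defined on one of them
    cdp replicationmanager list-clusters
    cdp replicationmanager list-policies --cluster-crn "<cluster-crn>"

    # Suspend a policy and activate it again later (parameter names assumed)
    cdp replicationmanager suspend-policy --cluster-crn "<cluster-crn>" --policy-id "<policy-id>"
    cdp replicationmanager activate-policy --cluster-crn "<cluster-crn>" --policy-id "<policy-id>"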

For more information about CDP CLI for HDFS and Hive replication policies, see CDP CLI for Replication Manager.