CDP Public Cloud: September 2022 Release Summary

Data Engineering

This release (1.17) of the Cloudera Data Engineering (CDE) service on CDP Public Cloud introduces new features and improvements:

Service reference architecture

Reference architecture has been published to outline best practices on scaling CDE service, including when running Airflow bases pipelines. For more informarion, see Recommendations for scaling CDE deployments and Apache Airflow scaling and tuning considerations.

Kubernetes dashboard

CDE provides the option to view the Kubernetes dashboard to provide an easy user experience for monitoring your diagnostics. The dashboard is to be used when troubleshooting in coordination with Cloudera Support. For more information, see Accessing the Kubernetes dashboard.

Azure private storage

As of CDE 1.16, Azure private storage is supported. Details around deploying and configuring CDE with Azure private storage are now available. For more information, see Supporting Azure private storage.

SSL support for Azure DB

For increased security, CDE on Azure will now deploy SSL enabled with TLS 1.

Data Hub

This release of the Data Hub service introduces new features:

Database upgrade and default major version change

Newly deployed Data Lake and Data Hub clusters with Cloudera Runtime 7.2.7 or above are now configured to use a PostgreSQL version 11 database by default.

A new mandatory Database Upgrade capability is now available for existing Data Lake and Data Hub clusters. If you are running clusters on Cloudera Runtime version 7.2.6 or below, please upgrade to a more recent version before performing the database upgrade.

The major version of the database used by Data Lake or Data Hub clusters is now also displayed on the Database page of the respective service.

Cloudera strongly recommends that the Database Upgrade is performed on all clusters running PostgreSQL version 10 before November 10, 2022. For more information see Upgrading the Data Lake/Data Hub database.

Recipe type changes

The following recipe types have been renamed for Data Hub, Data Lake, and FreeIPA recipes:

  • pre-service-deployment (formerly pre-cluster-manager-start)
  • post-service-deployment (formerly post-cluster-install)

These changes will not affect existing recipe automation. For more information see Recipes.

### Instance types

The following new instance types are supported in Data Hub:

  • m5dn.xlarge
  • m5dn.2xlarge
  • m5dn.4xlarge
  • m5dn.8xlarge
  • m5dn.12xlarge
  • m5dn.16xlarge
  • m5dn.24xlarge

Validate and prepare for upgrade

Before you perform a Data Hub upgrade, you can run the new Validate and Prepare option to check for any configuration issues and begin the Cloudera Runtime parcel download and distribution. Using the validate and prepare option does not require downtime and makes the maintenance window for an upgrade shorter. For more information see Preparing for a upgrade.

Data Visualization

Cloudera Data Visualization (CDV) 7.0.3 was released on September 30, 2022, introducing the following changes:

  • Daterange filters can be set to default to the max date rather than a fixed date.
  • CSV and Excel downloads show additional information when download or view fails because of global download row or size limit.
  • Dataset field names no longer allow parenthesis to avoid potential issues if used in filters.
  • Added settings to fine-tune the appearance of the alert when a query is taking an unusually long time, and enable/disable alerts for this purpose in View mode.
  • Query canceling is now immediately visible for each visual as it loads.
  • CDW Impala and CDW Hive connections now include Client Identifier and Application Name with a default of ‘viz’. This is configurable from connection settings.
  • It is now possible to filter for only the rows that have NULL for a particular field value.
  • Improvements for the Packed Bubbles visual type, including better handling of dimensions placed on the Colors shelf and enablement of legend.
  • Improvements for resizing visuals and dashboard width in Edit mode.
  • Improvements to the Crosstab visual type, including aligning the totals row, showing adjacent duplicate values, and allowing text selection in View mode.

For fixed issues, see Fixed issues in Cloudera Data Visualization 7.0.3.

Data Warehouse

This release of the Cloudera Data Warehouse (CDW) service on CDP Public Cloud introduces these changes:

Support for Amazon Elastic Kubernetes Service 1.21 cluster

This release 1.4.3-b225 (released September 15, 2022) automatically uses and provisions Amazon Elastic Kubernetes Service (EKS) 1.21 when you activate an environment from CDW. In this release, upgrading a cluster from EKS 1.20 to 1.21 is not supported.

Support for Azure Kubernetes Service 1.24 cluster

This release 1.4.3-b225 (released September 15, 2022) supports the Azure CLI to upgrade the AKS cluster to/from the following AKS versions only:

  • Upgrade to AKS 1.24
  • Upgrade from AKS 1.22 (running on 1.4.2 (released August 1, 2022).

Upgrading from environments earlier than AKS 1.22 is not supported. Cloudera does not plan to support upgrading from environments earlier than 1.4.2.

Alert notification configuration

You can now configure alert notifications that appear in user notification system. The notifications supplement alert information in charts on the Manage/Hive/Compaction Observability dashboard in Grafana.

Impala Virtual Warehouse catalog high availability

In earlier releases, the single instance limitation of the Impala Virtual Warehouse catalog made the catalog a single point of failure. In this release, you can optionally configure replication of the Impala catalog for high availability. One instance operates in active mode, the other in passive mode to provide failover. The passive instance serves as a backup and takes over if the active instance goes down.

General Availability of Managed Storage Access

CDW capabilities for storing data for multiple tenants is now generally available (GA). You set up a managed storage warehouse for AWS or for Azure.

General availability of SSO to Virtual Warehouses

In this release, the capability to enable SSO (single sign-on) to your Virtual Warehouse from JDBC/ODBC clients is generally available (GA).

Hive to Iceberg table migration disallowed under certain conditions

If you attempt to migrate a Hive table having VARCHAR or CHAR columns to Iceberg, an exception will occur in this release. Migration under these conditions is not allowed in this release because incorrect query results can occur. To work around this limitation, change VARCHAR and CHAR columns to string columns, and then migrate the table to Iceberg.

Impala Autoscaler web UI

This release includes an addition to the Impala Web UI for debugging the Impala Virtual Warehouse. The new UI provides insight into Autoscaler operations, accessing log messages, and resetting the log level.

Additions to IAM policy, standard and restricted, permissions

In this release, if you want to use Amazon CloudWatch, add logging permissions: to your IAM policy: “logs:CreateLogGroup”,”logs:CreateLogStream”,”logs:DescribeLogStreams”, and “logs:PutLogEvents”. The arn:aws:iam::aws:policy/CloudWatchAgentAdminPolicy must also be added to the standard and restricted permissions.

Using CloudWatch is not required. Add logging permissions for CloudWatch only if you use this Amazon service. For more information about adding the CloudWatch logging permissions, see the following documentation:

CREATE TABLE LIKE PARQUET FILE feature in Unified Analytics

You can base a new table on a schema in a Parquet file using the CREATE TABLE LIKE PARQUET FILE statement.

Cloudera DataFlow

This release (2.2.0-b194) of Cloudera DataFlow (CDF) on CDP Public Cloud improves the first time user experience, introduces more ReadyFlows, and supports renewing certificates for Inbound Connection.

Streamlined first time user experience

You can use the new Hello World ReadyFlow to create your first flow deployment without dependencies on source or target systems. New landing pages for the Dashboard and Catalog allow you to get started with just one click.

  • Non-CDP ADLS to S3/ADLS
  • Non-CDP S3 to S3/ADLS
  • ListenTCP filter to S3/ADLS
  • ListenSyslog filter to S3/ADLS

Endpoint Hostnames certificates can now be renewed when using Inbound Connections

For fixed issues in this release, see Fixed issues.

This release (2.2.0-h1-b11) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces fixes for certificate renewal issues that could cause the deployment manager and NiFi canvas to become inaccessible.

For fixed issues, see Fixed issues.

This release (2.2.0-h2-b1) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces fixes improving performance of control plane database queries.

Cloudera Machine Learning

This CML release (2.0.32) includes the following new features and improvements:

Suspend/Resume Workspace

Administrators can optimize cloud costs by suspending CML Workspaces for weekends and resume operation for business hours.

Iceberg support

CML Snippets now fully support the Iceberg v1 table format for all Spark, Hive, and Impala data connections. See Create an Iceberg data connection for details.

Data connections

Data Hubs are now automatically discovered and connections are created for them in newly-created CML Workspaces.

Backup/restore workspace

The default timeout is now 12 hours, and the estimated time to complete a backup (from the cloud provider) is now periodically added to the event logs.

Management Console

This release of the Management Console service introduces the following changes:

Database upgrade and default major version change

Newly deployed Data Lake and Data Hub clusters with Cloudera Runtime 7.2.7 or above are now configured to use a PostgreSQL version 11 database by default.

A new mandatory Database Upgrade capability is now available for existing Data Lake and Data Hub clusters. If you are running clusters on Cloudera Runtime version 7.2.6 or below, please upgrade to a more recent version before performing the database upgrade.

The major version of the database used by Data Lake or Data Hub clusters is now also displayed on the Database page of the respective service.

Cloudera strongly recommends that the Database Upgrade is performed on all clusters running PostgreSQL version 10 before November 10, 2022. For more information see Upgrading the Data Lake/Data Hub database.

FreeIPA recipe support and recipe type changes

You can register and attach recipes to run on a specific FreeIPA host group. For more information, see Recipes.

The following recipe types have been renamed for Data Hub, Data Lake, and FreeIPA recipes:

  • pre-service-deployment (formerly pre-cluster-manager-start)
  • post-service-deployment (formerly post-cluster-install)

These changes will not affect existing recipe automation.

Validate and prepare for upgrade

Before you perform a Data Hub upgrade, you can run the new Validate and Prepare option to check for any configuration issues and begin the Cloudera Runtime parcel download and distribution. Using the validate and prepare option does not require downtime and makes the maintenance window for an upgrade shorter. For more information see Preparing for a upgrade.

AWS Hong Kong region

You can now register a CDP environment and create Data Hubs in the AWS Hong Kong Region (ap-east-1). See updated Supported AWS regions.