CDP Public Cloud: June 2023 Release Summary

Data Engineering

This release (1.19.2) of the Cloudera Data Engineering (CDE) service on CDP Public Cloud introduces the following improvements.

Kubernetes update

CDE now supports Kubernetes 1.25 on Azure and AWS.

Support for new AWS region

CDE now supports the EU Milan (eu-south-1) AWS region.

Support for user-defined routing (UDR)

CDE now supports UDR when you enable a CDE service for Azure. For more information, see Enabling a Cloudera Data Engineering service.

Support for more AMD instances

CDE now includes more AMD instances in the Workload Type drop-down menu when you enable a CDE service.

DataFlow

This release (2.5.0-b210) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces NiFi 1.21, flow metrics driven auto-scaling, advanced UIs for the JoltTransformJSON and UpdateAttribute processors, bulk actions in the Flow Designer, and UDR support on Azure. It also makes in-place upgrades generally available to all customers.

New features and changes

  • Flow Deployments and Test Sessions now support the latest Apache NiFi 1.21 release.

  • In-place upgrades are now generally available to all customers. For more information, see Service upgrade.

  • Flow Deployments now support Flow Metrics Scaling in addition to CPU utilization based auto-scaling. For more information, see Auto-scaling flow deployments.

  • When selecting Private Cluster during enablement on Azure, CDF no longer provisions a public load balancer and now supports User Defined Routes (UDR).

  • Users can now provide their own values for the Kubernetes Pod CIDR Range and Kubernetes Service CIDR Range during enablement.

  • CDF now inherits load balancer configuration including available subnets from its associated CDP environment during the enablement process.
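When you supply your own Kubernetes Pod CIDR Range and Kubernetes Service CIDR Range, it can help to sanity-check the two values before enablement. The following is a minimal sketch, assuming the usual Kubernetes expectation that the pod and service ranges do not overlap; the release notes do not state this requirement, and the function name and example ranges are hypothetical.

```python
import ipaddress


def cidr_ranges_overlap(pod_cidr: str, service_cidr: str) -> bool:
    """Return True if the two CIDR ranges share any addresses.

    Assumption: pod and service ranges should be disjoint. This check is
    illustrative and is not part of CDF itself.
    """
    # strict=True rejects values with host bits set, such as 10.0.0.1/16.
    pod_net = ipaddress.ip_network(pod_cidr, strict=True)
    svc_net = ipaddress.ip_network(service_cidr, strict=True)
    return pod_net.overlaps(svc_net)


# Hypothetical example values, not CDF defaults.
print(cidr_ranges_overlap("10.244.0.0/16", "10.0.0.0/16"))
print(cidr_ranges_overlap("10.0.0.0/8", "10.0.0.0/16"))
```

A result of True suggests that one of the two ranges should be changed before you start enablement.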

Flow Designer

  • The Flow Designer now supports multi-selection on the canvas and bulk actions for Start, Stop, Enable, Disable, Move, Change parent group, Copy/Paste, and Delete.

  • The Flow Designer now supports the advanced configuration UI for UpdateAttribute.

  • The Flow Designer now supports the advanced configuration UI for JoltTransformJSON.

  • The Flow Designer now supports Birdseye and Zoom controls.

  • The Flow Designer now supports Processor Diagnostics with an active Test Session.

ReadyFlows

The following new ReadyFlows have been added to the ReadyFlow Gallery:

  • CDW Ingest

  • CDP Kafka to Snowflake

  • Slack to S3

  • Updated Confluent Cloud to Snowflake using new Snowpipe processors

Management Console

This release of the Management Console service introduces the following changes:

GCS fine-grained access control (Preview)

You can now register a CDP environment on GCP with RAZ enabled to use fine-grained access policies and audit capabilities available in Apache Ranger. See GCS Fine-Grained Access Control.

Note: You need to contact Cloudera to have this feature enabled.

Runtime 7.2.17

Runtime 7.2.17 is now available and can be used for registering an environment with a 7.2.17 Data Lake and creating Data Hub clusters. See Cloudera Runtime.

Rolling Data Lake upgrades

With the release of Cloudera Runtime 7.2.17, you can now upgrade your Data Lake in a rolling manner. The rolling upgrade allows you to upgrade the Data Lake Runtime and OS without stopping attached Data Hubs or Data Services. This allows workloads to continue running during the Data Lake upgrade operation. You need to contact Cloudera to have this feature enabled. See Rolling Data Lake Upgrades for more information.

Zone selection within a GCP region

During GCP environment registration, you can now select a zone within the selected GCP region. For example, if you select the us-west1 region, you can select one of its three zones: us-west1-a, us-west1-b, or us-west1-c.

CDP credit consumption and usage insights

CDP includes a user interface that allows you to monitor your credit consumption and download your consumption records. See CDP credit consumption and usage insights.

Cross-version compatibility for Data Lake backup and restore

You can take a backup of a Data Lake that runs one version of Cloudera Runtime and restore the backup to a Data Lake that runs a different version of Runtime. The backup must come from an earlier (lower) Runtime version than the version of the Data Lake that you are restoring to. Version limitations apply, and a Ranger/HMS schema upgrade may be required. See Cross-version support for Data Lake backup and restore for more details.
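The ordering rule for cross-version restore can be sketched as a numeric version comparison. This is a minimal illustration only: the function names are hypothetical, and the sketch does not model the additional version limitations or schema upgrade requirements mentioned above.

```python
from typing import Tuple


def parse_runtime_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '7.2.17' into (7, 2, 17)."""
    return tuple(int(part) for part in version.split("."))


def backup_is_older_than_target(backup_version: str, target_version: str) -> bool:
    """Return True if the backup comes from a lower Runtime version.

    Comparing integer tuples avoids the string-comparison trap where
    '7.2.9' would sort after '7.2.17'.
    """
    return parse_runtime_version(backup_version) < parse_runtime_version(target_version)


print(backup_is_older_than_target("7.2.16", "7.2.17"))
print(backup_is_older_than_target("7.2.17", "7.2.16"))
```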

Cross-version compatibility for Data Lakes and Data Hubs

Backward compatibility between Data Lakes and Data Hubs has been introduced with Cloudera Runtime 7.2.17. It is no longer required that you perform Data Hub upgrades in lock-step with the Data Lake upgrade. If your Data Hub is on Runtime version 7.2.16 or later, it is compatible with a Data Lake on a newer Runtime version (7.2.17+). You can independently upgrade your Data Hubs at a later time if you choose to, though it is not required.
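The compatibility rule described here can be expressed as a small check. This is a sketch under stated assumptions: the constants and the function name are hypothetical, the thresholds are taken directly from the paragraph above, and equal versions are treated as compatible because that is the existing lock-step case.

```python
from typing import Tuple

# Thresholds taken from the release note text above.
MIN_DATA_HUB_VERSION = (7, 2, 16)
MIN_DATA_LAKE_VERSION = (7, 2, 17)


def to_version_tuple(version: str) -> Tuple[int, ...]:
    """Convert a dotted Runtime version string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def data_hub_can_stay(data_hub_version: str, data_lake_version: str) -> bool:
    """Return True if the Data Hub does not need to be upgraded with the Data Lake."""
    hub = to_version_tuple(data_hub_version)
    lake = to_version_tuple(data_lake_version)
    return (
        hub >= MIN_DATA_HUB_VERSION
        and lake >= MIN_DATA_LAKE_VERSION
        and lake >= hub  # the Data Lake is on the same or a newer version
    )


print(data_hub_can_stay("7.2.16", "7.2.17"))
print(data_hub_can_stay("7.2.15", "7.2.17"))
```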

AWS Middle East UAE region

You can now register a CDP environment and create Data Hubs in the AWS Middle East UAE region (me-central-1). See updated Supported AWS regions.

Data Lake upgrade validations for Python dependency

If you are planning to upgrade the Runtime version in your existing Data Lake clusters to 7.2.17 or higher, you may be required to perform an additional step before upgrading.

You can verify whether your cluster is impacted by navigating to the Upgrade tab of your Data Lake cluster. If there is a warning message about missing prerequisites on the Upgrade page, you need to perform an additional upgrade step before moving to the 7.2.17 Cloudera Runtime version. The required steps may be different depending on your current Runtime version. See Data Lake upgrade for more details.

Operational Database

Cloudera Operational Database (COD) version 1.32 provides enhancements to the CDP CLI as well as to the COD UI.

UI enhancements to the Scale option on the database creation page

On the COD UI, when you create an operational database, the Medium Duty option is renamed to Heavy Duty under Create Database > Scale. This keeps the options on the COD UI and CDP CLI Beta consistent. For more information, see Creating a database using COD.

Enhancements to the CDP CLI option --scale-type

In CDP CLI, when you set the --scale-type option to HEAVY, COD allocates larger SSD storage (for example, gp2 on AWS, StandardSSD_LRS on Azure, or pd-ssd on GCP) for both master and leader node types. This accommodates the higher loads on ZooKeeper and provides better performance for COD. For more information, see CDP CLI Beta.

Replication Manager

This release of the Replication Manager service introduces the following new features:

HBase replication policy enhancements

You can view the HBase RegionServer metrics for a specific replication peer in Replication Manager. You can use these metrics to monitor HBase replication jobs and to find and diagnose issues with the HBase replication peer. For more information, see Graphical representation of HBase RegionServer replication peer metrics.

CDP CLI commands for Replication Manager

You can use CDP CLI commands to collect and download the diagnostic bundle, add ABFS and AWS cloud credentials for use in Replication Manager, and delete credentials. For more information, see Adding cloud credentials in Replication Manager using CDP CLI.