CDP Public Cloud: April 2024 Release Summary

The CDP Public Cloud Release Summary summarizes major features introduced in CDP Public Cloud Management Console, Data Hub, and data services.

Data Catalog

This release of the Data Catalog service introduces a notable behavior change that requires action on your part.

When you upgrade your cluster from Cloudera Runtime version 7.2.17 to 7.2.18, the cluster enters a failed state during the OS upgrade step, and the following message appears:

NODE_FAILURE:

New node(s) could not be added to the cluster. Reason Please find more details on Cloudera Manager UI. Failed command(s): Start(id=1546339088): Failed to start role profc6cf3856-PROFILER_SCHEDULER_AGENT-484032cb8f17cacf9e684efe50 of service profiler_scheduler in cluster cdp-dc-profilers-258395ef.

Impact on Data Catalog profilers:

If the Data Hub is not created, the Data Catalog profilers are not created in Cloudera Runtime version 7.2.18.

To work around this issue, use the following process to bring up the Data Catalog profilers in Cloudera Runtime version 7.2.18.

First, delete your existing 7.2.17 profiler clusters. For more information, see Deleting profiler cluster.

Next, after you upgrade to the 7.2.18 Data Lake, launch the Data Catalog profilers. For more information, see Launch profiler cluster.

NOTE: No loss of user data or Profiler analysis results is expected. The only expected loss is the last run time value of the profiler and the profiler run history. The profiler run history is the record of profiler runs displayed on the history page, including whether each run completed successfully or failed.

Data Hub

This release of Data Hub introduces the following new features:

Runtime 7.2.18

Cloudera Runtime 7.2.18 is now available and can be used for registering an environment with a 7.2.18 Data Lake and creating Data Hub clusters. For more information about the new Runtime version, see Cloudera Runtime. If you need to upgrade your existing CDP environment, your upgrade path may be complex. To determine your upgrade path, refer to the Upgrading to Runtime 7.2.18 documentation.

RHEL replaces CentOS as default OS

As of June 30, 2024, CentOS reaches End of Life (EOL), and consequently, Cloudera Runtime 7.2.18 supports RHEL 8 only. New deployments of Data Lakes and Data Hubs with Runtime 7.2.18 and upgrades to 7.2.18 are only possible with RHEL 8. Data Lake and Data Hub clusters running Runtime 7.2.17 support both CentOS 7 and RHEL 8. Earlier Runtime versions support CentOS 7 only. Cloudera will not publish any updates or fixes for CentOS-based images after June 2024.

As part of FreeIPA, Data Lake, and Data Hub upgrade, you have the option to upgrade the operating system (OS) on the virtual machines (VMs) from CentOS 7 to Red Hat Enterprise Linux 8 (RHEL 8). For more information, see Upgrading from CentOS to RHEL.
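The OS support statement above can be read as a small lookup. The following is a minimal sketch of that rule, not part of any CDP tooling: supported_os is a hypothetical helper name, and it assumes a three-part version string such as 7.2.17. Versions later than 7.2.18 are not covered by this summary, so the sketch reports them as such.

```shell
# Hypothetical helper: print the OS images a Runtime version supports,
# following the support statement in this release summary.
supported_os() {
  # Split a three-part version such as 7.2.17 into its components.
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  # Collapse the version into one integer so it can be compared numerically.
  n=$(( $major * 10000 + $minor * 100 + $patch ))
  if [ "$n" -eq 70218 ]; then
    echo "RHEL 8"
  elif [ "$n" -eq 70217 ]; then
    echo "CentOS 7, RHEL 8"
  elif [ "$n" -lt 70217 ]; then
    echo "CentOS 7"
  else
    echo "not stated in this summary"
  fi
}
```

For example, supported_os 7.2.17 prints "CentOS 7, RHEL 8", which is the only version in this summary where both operating systems are available.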

Disk vertical scaling in Azure (Preview)

Disk vertical scaling (that is, disk type change and resizing) is now supported by CDP for Data Lakes and Data Hubs running in Azure. Previously, only AWS support was available for this feature in CDP. For more information, see Disk Vertical Scaling - Disk Type Change and Resizing in AWS and Azure.

NOTE: You need to contact Cloudera to have this feature enabled.

Machine Learning

CML version 2.0.45-b54 introduces the following new features and improvements:

  • Model Registry - In-place upgrade support for Model Registry deployments. See Upgrade model registry.
  • AMP Restarts - You can now retry failed AMP deployment steps and continue the AMP setup to handle intermittent and configuration issues. See Restarting a failed AMP setup.
  • Static subdomain support - Static subdomain support for AMP-deployed applications.
  • Git Branch support for new Projects - You can now specify Git branches and specific commits for new Projects and AMPs.
  • Default ML Runtimes configuration - Administrators can help streamline new Project creation by selecting a set of ML Runtimes to be available for user workloads.
  • Model metrics support - You can now monitor the model performance for models deployed from the Model Registry.
  • Team Deletion - Administrators can delete unneeded Teams using API v2.
  • API v2 - Improvements made to List Projects endpoint, and other changes to support file uploads.
  • Workspace - The Create Workspace flow was improved to add validation of endpoint access and provide diagnostic responses.
  • New Azure GPU instances: NC*A100, D16s v5, D8s v5
  • New AWS GPU instances: p5.48xlarge
  • Azure - Added support for CML workspaces in the Middle East (Qatar Central) region.
  • Runtimes - New Runtime Addons are released: HadoopCLI 7.2.16.600, HadoopCLI 7.2.17.300, Spark 2.4.8, Spark 3.2.3, Spark 3.3.0.

Management Console

This section lists major features and updates for the Management Console service.

Configuring a CMK for data encryption in Azure Database for PostgreSQL Flexible Server

You can optionally use a customer managed encryption key (CMK) for data encryption in the Azure Database for PostgreSQL Flexible Server database instance used by CDP. For more information, see Configuring a CMK for data encryption in Azure Database for PostgreSQL Flexible Server.

Disk vertical scaling in Azure (Preview)

Disk vertical scaling (that is, disk type change and resizing) is now supported by CDP for Data Lakes and Data Hubs running in Azure. Previously, only AWS support was available for this feature in CDP. For more information, see Disk Vertical Scaling - Disk Type Change and Resizing in AWS and Azure.

NOTE: You need to contact Cloudera to have this feature enabled.

Runtime 7.2.18

Cloudera Runtime 7.2.18 is now available and can be used for registering an environment with a 7.2.18 Data Lake and creating Data Hub clusters. For more information about the new Runtime version, see Cloudera Runtime. If you need to upgrade your existing CDP environment, your upgrade path may be complex. To determine your upgrade path, refer to the Upgrading to Runtime 7.2.18 documentation.

RHEL replaces CentOS as default OS

As of June 30, 2024, CentOS reaches End of Life (EOL), and consequently, Cloudera Runtime 7.2.18 supports RHEL 8 only. New deployments of Data Lakes and Data Hubs with Runtime 7.2.18 and upgrades to 7.2.18 are only possible with RHEL 8. Data Lake and Data Hub clusters running Runtime 7.2.17 support both CentOS 7 and RHEL 8. Earlier Runtime versions support CentOS 7 only. Cloudera will not publish any updates or fixes for CentOS-based images after June 2024.

As part of FreeIPA, Data Lake, and Data Hub upgrade, you have the option to upgrade the operating system (OS) on the virtual machines (VMs) from CentOS 7 to Red Hat Enterprise Linux 8 (RHEL 8). For more information, see Upgrading from CentOS to RHEL.

Discontinuation of Medium Duty Data Lake

Starting with Runtime 7.2.18, Medium Duty Data Lake is discontinued and is replaced by Enterprise Data Lake (EDL). In order for existing Data Lakes to be upgraded to Runtime 7.2.18, they must be using Enterprise or Light Duty Data Lake.

Enterprise Data Lakes are a redefined version of Medium Duty Data Lakes that still offer failure resilience, but utilize resources and allocate memory more efficiently than a Medium Duty Data Lake at the same cost. Enterprise Data Lakes can handle more intensive workloads than Medium Duty Data Lakes and when deployed in Multi-AZ mode, remain operational during an availability zone outage.

If you are using Medium Duty Data Lake and would like to upgrade to Runtime 7.2.18, you must first upgrade to 7.2.17, and then resize your Data Lake to Enterprise Data Lake. For more information, see Upgrading from Medium Duty to Enterprise Data Lake.
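The order of operations for a Medium Duty Data Lake can be sketched as a short planning helper. This is an illustration only: medium_duty_upgrade_plan is a hypothetical function that prints the steps described above and does not call any CDP command. It assumes the Data Lake is currently on Runtime 7.2.17 or earlier; the actual path from older versions to 7.2.17 may involve more steps than shown.

```shell
# Hypothetical helper: print the ordered steps for taking a Medium Duty
# Data Lake to Runtime 7.2.18, given its current Runtime version.
medium_duty_upgrade_plan() {
  current="$1"
  step=1
  # Medium Duty must reach 7.2.17 before it can be resized.
  if [ "$current" != "7.2.17" ]; then
    echo "$step. Upgrade the Data Lake to Runtime 7.2.17"
    step=$((step + 1))
  fi
  echo "$step. Resize the Data Lake to Enterprise Data Lake"
  step=$((step + 1))
  echo "$step. Upgrade the Data Lake to Runtime 7.2.18"
}
```

Calling medium_duty_upgrade_plan 7.2.17 prints two steps (resize, then upgrade), while any earlier version adds the upgrade to 7.2.17 as the first step.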

Support for Amazon S3 Express One Zone buckets

Starting with Runtime 7.2.18, CDP supports using S3 Express One Zone buckets for data storage. If you have additional data buckets that you would like to use for Data Hub workloads and you do not need zone redundancy, you may use S3 Express buckets, for example for faster processing of temporary data. For more information, see Using S3 Express One Zone for data storage.

Rolling upgrade support for the Data Lake

With the release of Runtime 7.2.18, rolling upgrades for certain Data Lakes are now available. Rolling upgrades for the Data Lake are limited to certain Data Lake Runtime versions and shapes. For more information, see Rolling upgrades.

Operational Database

Cloudera Operational Database (COD) version 1.41 supports CDP CLI changes and upgrades to higher instance types for the HDFS storage type and for Azure deployments.

COD has updated the HDFS instance type to 16 core instances

When you create an operational database with the HDFS storage type, the COD clusters now use 16-core instances for worker nodes on AWS, Azure, and GCP environments. The instances for COD clusters with the HDFS storage type are upgraded to improve the performance and usability of COD.

The new worker instances for HDFS storage type are as follows:

  • AWS: m5.4xlarge
  • Azure: Standard_D16_v3
  • GCP: n2-standard-16

COD supports configuring root volume size for available instances in a COD cluster

In CDP CLI, while creating an operational database, you can set the default root volume size, in GiB, for all the instances in the cluster with the --root-volume-size (integer) option.

Following is a sample command.

cdp opdb create-database --environment-name test-env --database-name test-db --root-volume-size 300

For more information, see CDP CLI documentation.
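Because the option takes an integer number of GiB, it can help to check the value before running the command. The following is a minimal sketch under that assumption: build_create_db_cmd is a hypothetical wrapper that only prints the command from the sample above and does not run it, and the validation rule (digits only) is an assumption of this sketch, not documented CLI behavior.

```shell
# Hypothetical wrapper: validate the root volume size, then print
# (not run) the create-database command from the sample above.
build_create_db_cmd() {
  env_name="$1"
  db_name="$2"
  size_gib="$3"
  # Reject empty values and anything that is not a whole number.
  case "$size_gib" in
    ''|*[!0-9]*)
      echo "root volume size must be a whole number of GiB" >&2
      return 1
      ;;
  esac
  echo "cdp opdb create-database --environment-name $env_name --database-name $db_name --root-volume-size $size_gib"
}
```

For example, build_create_db_cmd test-env test-db 300 prints the sample command shown above, while a value such as 300GiB is rejected with an error.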

COD has updated the instance type for Azure deployments

When you create and deploy an operational database in an Azure environment, by default, COD clusters now use Standard_D8s_v3 instance type instead of Standard_D8_v3. The instance type is upgraded to support encryption at the host level.

If you want to retain the Standard_D8_v3 instance type, you must have the COD_USE_DV3_INSTANCE_TYPE entitlement on your account.

COD supports instance group encryption in AWS environments

In CDP CLI, while creating an operational database, you can specify the encryption key to encrypt the volume for instance groups using the --volume-encryptions (array) option. You can specify this option only in AWS environments.

Following is a sample command.

cdp opdb create-database --environment-name <environment-name> --database-name <database-name> \
  --disable-external-db --scale-type MICRO \
  --attached-storage-for-workers '{"volumeCount":1,"volumeType":"SSD","volumeSize":100}' \
  --endpoint-url http://localhost:8988 \
  --volume-encryptions '[
  {
    "encryptionKey": "<aws-key-arn>",
    "instanceGroup": "GATEWAY"
  }
]'

Shorthand syntax: encryptionKey=string,instanceGroup=string ... (separate items with spaces)

JSON syntax:

[
  {
    "encryptionKey": "string",
    "instanceGroup": "WORKER"|"LEADER"|"MASTER"|"GATEWAY"|"STRONGMETA"|"EDGE"
  }
  ...
]

For more information, see CDP CLI documentation.
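When the same key applies to several instance groups, the shorthand syntax above can be generated rather than typed by hand. The following is a minimal sketch: volume_encryption_items is a hypothetical helper, and example-key-arn stands in for a real AWS key ARN. It checks each group name against the values listed in the JSON syntax and prints one shorthand item per group, separated by spaces.

```shell
# Hypothetical helper: build shorthand items for --volume-encryptions,
# one per instance group, all using the same encryption key.
volume_encryption_items() {
  key="$1"
  shift
  items=""
  for group in "$@"; do
    # Accept only the instance group names listed in the JSON syntax.
    case "$group" in
      WORKER|LEADER|MASTER|GATEWAY|STRONGMETA|EDGE) ;;
      *)
        echo "unknown instance group: $group" >&2
        return 1
        ;;
    esac
    # Append the item, adding a separating space after the first one.
    items="${items:+$items }encryptionKey=$key,instanceGroup=$group"
  done
  echo "$items"
}
```

For example, volume_encryption_items example-key-arn WORKER GATEWAY prints two items that can be passed after the --volume-encryptions option in place of the JSON array.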

Replication Manager

CDP version 7.2.18 introduces the following new features in the Replication Manager service.

Register GCP credentials

You can add the GCP credentials on the CDP Public Cloud Replication Manager > Cloud Credentials page to use in Replication Manager. For more information, see Registering GCS credentials to use in Replication Manager.

HBase replication policy enhancements

HBase replication policies support the following enhancements:

  • You can create multiple HBase replication policies between multiple clusters to replicate HBase data. For more information, see Replicate HBase data simultaneously between multiple clusters.

  • You can choose to Replicate all user tables or Replicate only tables where replication is already enabled after you choose the Select Source > Replicate Database option during the HBase replication policy creation process. The Replicate only tables where replication is already enabled option is supported only if the target cluster uses Cloudera Manager version 7.12.0.0 and higher, 7.11.0-h3 and higher, or 7.9.0-h7 and higher. For more information, see Methods to replicate HBase data and Creating HBase replication policies.

  • You can enter a YARN Queue Name to submit the replication job during the HBase replication policy creation or choose to retain the “default” value. For more information, see Creating HBase replication policies.

  • You can click Actions on the Replication Policies page to Collect diagnostic bundle for the required HBase replication policy. For more information, see Monitor HBase replication policy job details.

For more information about feature support in CDP Public Cloud Replication Manager, see Supported features.