CDP Public Cloud: December 2023 Release Summary

Data Engineering

This release (1.19.4) of the Cloudera Data Engineering (CDE) service on CDP Public Cloud introduces the following changes:

Kubernetes 1.26 support

CDE now supports Kubernetes 1.26 on Azure and Amazon Web Services (AWS). You can upgrade to a Kubernetes 1.26 cluster through the CDE-supported upgrade path.

Amazon Relational Database Service (Amazon RDS) at rest encryption with Customer Managed Keys (CMK) (Preview)

A CDE service deployed on AWS in a CMK-enabled environment uses CMK-based data-at-rest encryption for RDS. For more information, see Enable Customer Managed Keys on Amazon Web Services (Preview).

AWS Kubernetes secret encryption with Customer Managed Keys (CMK) (Preview)

A CDE service deployed in a CMK-enabled environment uses CMK-based encryption for Kubernetes secrets. For more information, see Enable Customer Managed Keys on Amazon Web Services (Preview).

Amazon Elastic File System (AWS EFS) data at-rest encryption with Customer Managed Key (CMK) (Preview)

Customer Managed Key is a feature supported by AWS that gives customers ownership of their encryption keys. For more information, see Enable Customer Managed Keys on Amazon Web Services (Preview).

Amazon Elastic File System (AWS EFS) data in-transit encryption

CDE supports data in-transit encryption through the EFS CSI driver. EFS data read or written over the wire is encrypted with TLS.

Amazon Elastic File System (AWS EFS) Anonymous Access restriction

This feature hardens security by preventing anonymous users or machines from accessing EFS and its access points.

Data Warehouse

  • Upgrade your Virtual Warehouse to get the updated CDW Runtime 2023.0.16.2-1 (released December 19, 2023), which includes fixed issues.
  • CDW Runtime 2023.0.16.1-2, which included fixed issues, was previously released on December 1, 2023.

DataFlow

This release (2.6.1-h1-b1) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces fixed issues and no new features.

Machine Learning

Version 2.0.41-b238 includes fixed issues and no new features.

Management Console

This release of the Management Console service introduces the following changes:

VHD images hosted in Azure Marketplace

Cloudera now publishes VHD images on Azure Marketplace for each minor Runtime release (for example, 7.2.17), and CDP uses these images by default during environment and Data Hub creation. For CDP to load these images, you must accept the Azure Marketplace terms and conditions through either the CDP web UI or the Azure CLI. Note that RHEL 8 images on Azure are only available through Azure Marketplace.

To use this feature, you need to grant the service principal the following additional Azure permissions:

On the scope of your Azure subscription:

"Microsoft.MarketplaceOrdering/offertypes/publishers/offers/plans/agreements/write"
"Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read"

On the scope of the CDP Azure resource group:

"Microsoft.Resources/deployments/whatIf/action"

For more information, see CDP images hosted in Azure Marketplace.
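
As a rough illustration of the subscription-scoped permissions listed above, the following sketch writes an Azure custom role definition file that contains the two Marketplace agreement actions. The role name, description, file path, and subscription ID are hypothetical placeholders, not values from Cloudera documentation. If your service principal already uses a custom role, you may prefer to add the actions to that role instead. The Microsoft.Resources/deployments/whatIf/action permission applies at the resource group scope and is not covered by this sketch.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own subscription ID and file path.
SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
ROLE_FILE="${TMPDIR:-/tmp}/cdp-marketplace-role.json"

# Write a custom role definition containing the two Marketplace agreement
# actions named in this release note. Name and Description are illustrative.
cat > "$ROLE_FILE" <<EOF
{
  "Name": "CDP Marketplace Image Access (example)",
  "IsCustom": true,
  "Description": "Example role for reading and accepting Marketplace terms.",
  "Actions": [
    "Microsoft.MarketplaceOrdering/offertypes/publishers/offers/plans/agreements/write",
    "Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read"
  ],
  "NotActions": [],
  "AssignableScopes": ["/subscriptions/$SUBSCRIPTION_ID"]
}
EOF

# Print the file for review before using it.
cat "$ROLE_FILE"

# One possible next step (not run here; requires a logged-in Azure CLI):
# az role definition create --role-definition "@$ROLE_FILE"
```

The script only generates and prints the file, so you can review it before creating or assigning any role.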

Additional permissions needed in the cross-account role

You should update the AWS cross-account role to include the ec2:DescribeLaunchTemplateVersions policy action, which allows CDP to update the AWS Launch Template UserData. CDP Public Cloud cluster operations such as Cluster Connectivity Manager version upgrades, proxy management, and Salt credential management require updating the AWS Launch Template UserData. Without this permission, node repair, upgrade, and scale operations performed on the CDP Public Cloud environment, Data Lake (SDX), and Data Hub may fail.
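
A minimal sketch of an IAM policy statement that grants the action described above is shown here. It assumes the permission is granted on all resources ("Resource": "*") and writes the statement to a standalone file whose path is a hypothetical placeholder. In practice you would more likely merge the statement into the policy already attached to your cross-account role.

```shell
#!/bin/sh
# Hypothetical file path; adjust as needed.
POLICY_FILE="${TMPDIR:-/tmp}/cdp-launch-template-policy.json"

# Policy document with a single statement allowing the action named in
# this release note. "Resource": "*" is an assumption of this sketch.
cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:DescribeLaunchTemplateVersions",
      "Resource": "*"
    }
  ]
}
EOF

# Print the policy for review.
cat "$POLICY_FILE"

# One possible way to attach it as an inline policy (not run here; the
# role and policy names are placeholders you must supply):
# aws iam put-role-policy --role-name <cross_account_role_name> \
#   --policy-name <policy_name> --policy-document "file://$POLICY_FILE"
```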

Operational Database

Cloudera Operational Database (COD) version 1.37 introduces changes to entitlements and supports Multi-AZ deployment on Azure environments.

COD has enabled the COD_ON_GCS entitlement

You can deploy COD on Google Cloud Platform (GCP) using Google Cloud Storage (GCS), similar to the existing support for Amazon Web Services (AWS) S3 storage and Microsoft Azure Blob Storage. The COD_ON_GCS entitlement is now enabled by default for such deployments.

COD has removed the COD_EDGE_NODE entitlement

COD has removed the COD_EDGE_NODE entitlement because it is no longer needed. COD edge node functionality is now enabled for all COD customers.

COD has removed the COD_STOREFILE_TRACKING entitlement

COD has removed the COD_STOREFILE_TRACKING entitlement because it is no longer needed. The Store File Tracking (SFT) functionality is enabled on all new COD clusters created with cloud storage.

COD has removed the OPDB_USE_EPHEMERAL_STORAGE entitlement

COD has removed the OPDB_USE_EPHEMERAL_STORAGE entitlement because it is no longer needed. Using COD on cloud storage with ephemeral cache no longer requires an entitlement; it is enabled based on the cluster creation parameters.

COD supports Multiple Availability Zones (Multi-AZ) on Azure (Preview)

COD ensures high availability and fault tolerance using Multi-AZ deployments. In a Multi-AZ deployment, the compute infrastructure for the HBase master and region servers is distributed across multiple availability zones (AZs). When a single AZ has an outage, only a portion of the region servers is impacted, and clients automatically switch over to the remaining servers in the available AZs.

Multi-AZ for COD is now supported on Microsoft Azure environments as a technical preview and is considered under development. For more information, see Multi-AZ deployment on COD.

COD supports a new instance type I4i for Cloud With Ephemeral Storage type databases on AWS environments

When you create a new operational database with Cloud With Ephemeral Storage as the storage type on an AWS environment, COD creates the database with an I4i instance type for the worker nodes.

COD supports fast autoscaling for higher computing requirements

When you create a new operational database using the CDP CLI, you can enable fast autoscaling by defining the required parameters with the --auto-scaling-parameters option. COD now supports a new instance group called Compute. The nodes in this instance group are automatically scaled up or down based on CPU utilization and RPC latency.

To use fast autoscaling, you must have the COD_USE_COMPUTE_ONLY_NODES entitlement.

The following is a sample command:

cdp opdb create-database --environment-name <env_name> --database-name <db_name> --auto-scaling-parameters '{"minComputeNodesForDatabase":<min_compute_nodes>, "maxComputeNodesForDatabase": <max_compute_nodes>}'

For more information, see The fast autoscaling in COD.
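
The fast autoscaling command described above can be assembled from shell variables, which helps avoid quoting mistakes in the JSON argument. The following sketch assumes hypothetical environment and database names and node counts. The check that the minimum does not exceed the maximum is a sanity check added for illustration, not a documented CLI requirement. The script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical placeholder values; replace with your own.
ENV_NAME="my-env"
DB_NAME="my-db"
MIN_COMPUTE_NODES=2
MAX_COMPUTE_NODES=10

# Illustrative sanity check: refuse an inverted node range.
if [ "$MIN_COMPUTE_NODES" -gt "$MAX_COMPUTE_NODES" ]; then
  echo "min compute nodes must not exceed max compute nodes" >&2
  exit 1
fi

# Build the JSON value for --auto-scaling-parameters using the two keys
# that appear in the sample command.
AUTO_SCALING_PARAMS=$(printf \
  '{"minComputeNodesForDatabase":%d,"maxComputeNodesForDatabase":%d}' \
  "$MIN_COMPUTE_NODES" "$MAX_COMPUTE_NODES")

# Print the full command for review instead of executing it.
echo "cdp opdb create-database --environment-name $ENV_NAME --database-name $DB_NAME --auto-scaling-parameters '$AUTO_SCALING_PARAMS'"
```

Once the printed command looks correct, you can run it directly, provided the COD_USE_COMPUTE_ONLY_NODES entitlement is in place.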