CDP Public Cloud: January 2024 Release Summary

DataFlow

This release (2.7.0-b190) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces new Change Data Capture processors, flow version tagging, deployment configuration reuse, and NiFi bulletin monitoring, and adds support for new Kubernetes versions.

Latest NiFi version

Flow Deployments and Test Sessions now support the latest Apache NiFi 1.24 release.

New Debezium CDC processors

You can now build and deploy flows with Debezium-based CDC processors for MySQL, Postgres, Oracle, SQL Server and DB2 databases.

Deployment alerts for NiFi bulletins

CDF now detects bulletin error messages in NiFi flow deployments and displays them as a warning in the dashboard.

Flow Definition versions can now be tagged

You can now use tags to identify versions more easily. Common use cases for tags are applying a custom versioning scheme or labeling flow versions as ‘ready for deployment’. The CLI introduces new commands to search flow definitions for tags, which streamlines the CI/CD process.

Deployment configurations can now be exported and reused

Customers can now export a configuration archive of an existing deployment and reuse it when creating new deployments. This speeds up redeploying new versions of the same flow through the Deployment Wizard, which fills in parameter values, KPI configurations, and other settings from the deployment configuration.

Support for new Kubernetes versions

CDF now supports AKS 1.27 and EKS 1.27.

New ReadyFlows

  • MySQL CDC to Kudu (Tech Preview)
  • Postgres CDC to Kudu (Tech Preview)
  • Oracle CDC to Kudu (Tech Preview)
  • SQL Server CDC to Kudu (Tech Preview)
  • ADLS to Databricks
  • S3 to Databricks
  • HuggingFace dataset to S3/ADLS
  • S3 to IBM watsonx

Changes and improvements

  • The HelloWorld ReadyFlow has been updated. You need to deploy Version 2 to run it with NiFi 1.24.0 or higher.
  • If an underlying AKS creation error is available during CDF enablement, it is now reported together with the enable failed status event instead of showing up a few minutes later.
  • You can now download the Client Certificate and Private Key for an Inbound Connection in a single action.

Data Warehouse

Postgres 11 end of life

AWS announced that Amazon Relational Database Service (RDS) Postgres 11 reaches end of life on Feb 29, 2024. In Cloudera Data Platform, any new CDW created in an AWS environment in this release (1.8.3-b130, Jan 10, 2024) or later supports Postgres 13 on RDS.

Any existing CDW in an AWS environment that uses Postgres 11 requires backup and restoration. You back up CDW in the AWS environment that supports Postgres 11 on RDS, and then restore CDW in an environment that supports Postgres 13 on RDS, as described in Backing up and restoring CDW.

Machine Learning

Cloudera Machine Learning introduces the following new features and improvements in version 2.0.43-b208:

  • Cloudera Data Warehouse - Automatic JWT-based authentication enables passwordless connectivity to CDW. Users do not need to use their workload password to query data from CML. This feature depends on Data Lake 7.2.18; upgrade your environment when the new version is available.
  • Redesigned AMP Catalog - The AMPs pane is redesigned to improve navigation and search capabilities.
  • HuggingFace Spaces - A curated list of HuggingFace Spaces is available in the AMPs Catalog.
  • Community AMPs - A selected list of community-created AMPs is available to run in CML in the AMPs pane.
  • Azure - Support for new GPU instances: NVadsA10 v5-series (non-fractional)
  • Azure - Certificate based authentication using Managed Identity to provision in AKS.
  • AWS - Support for new GPU instances: g5
  • AWS - Deprecated support for P2 instance types.
  • AWS - Added support for CML workspaces in af-south-1, Africa (Cape Town) region.
  • Kubernetes - Kubernetes version 1.27 is supported on both Azure and AWS.
  • Restore workflow - Improved reliability of the workspace restore workflow.
  • Private DNS Zone - CML is now certified to work with private DNS zones.
  • Project Migration tool - A command line argument is added to check if source and destination files are the same, covering job, app, model, project data and metadata files.
  • Runtimes - The R version of cmladdon is upgraded to version 4.3.1.
  • Runtimes - The HadoopCLI 7.2.17.100 Runtime Addon is released for the Public Cloud.
  • Security - When adding project collaborators or team members, non-admins can be prevented from seeing the entire user list. This functionality can be restricted to Site Admins in Site Administration > Security by selecting Allow all authenticated users to access /api/v1/users endpoint.

Management Console

Deploying CDP in multiple Azure availability zones

You can now deploy your CDP environment, enterprise Data Lake and Data Hubs on Azure across multiple availability zones. This is an optional configuration that is not enabled by default. For more information, see Deploying CDP in multiple Azure availability zones.

Azure Database for PostgreSQL Flexible Server

In this release, CDP introduces Azure Database for PostgreSQL Flexible Server. New Azure environments automatically use the Flexible Server with public endpoints, but as a best practice for production, you should configure a Private Flexible Server setup.

With the release of this feature, you must add the following permissions on the scope of the single resource group to your custom role:

"Microsoft.DBforPostgreSQL/flexibleServers/read",
"Microsoft.DBforPostgreSQL/flexibleServers/write",
"Microsoft.DBforPostgreSQL/flexibleServers/delete",
"Microsoft.DBforPostgreSQL/flexibleServers/start/action",
"Microsoft.DBforPostgreSQL/flexibleServers/stop/action",
"Microsoft.DBforPostgreSQL/flexibleServers/firewallRules/write"

For more information, see Using Azure Database for PostgreSQL Flexible Server.
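One way to confirm that a custom role already grants these permissions is to compare the list above against the role's JSON definition. The following Python sketch assumes the role is stored in the Azure CLI custom role layout with a top-level "Actions" array. The function name missing_flexible_server_actions and the sample role are hypothetical and used only for illustration. The check treats "*" in an action as a wildcard and ignores any "NotActions" entries.

```python
import json
from fnmatch import fnmatchcase

# Permissions listed in this release summary for Flexible Server support.
REQUIRED_ACTIONS = [
    "Microsoft.DBforPostgreSQL/flexibleServers/read",
    "Microsoft.DBforPostgreSQL/flexibleServers/write",
    "Microsoft.DBforPostgreSQL/flexibleServers/delete",
    "Microsoft.DBforPostgreSQL/flexibleServers/start/action",
    "Microsoft.DBforPostgreSQL/flexibleServers/stop/action",
    "Microsoft.DBforPostgreSQL/flexibleServers/firewallRules/write",
]


def missing_flexible_server_actions(role_definition: dict) -> list[str]:
    """Return the required actions that no entry in the role's Actions list grants."""
    # Assumption: the role uses a top-level "Actions" array (Azure CLI layout).
    # Comparison is done on lowercased strings so that casing differences are ignored.
    granted = [action.lower() for action in role_definition.get("Actions", [])]
    return [
        required
        for required in REQUIRED_ACTIONS
        if not any(fnmatchcase(required.lower(), pattern) for pattern in granted)
    ]


# Hypothetical role definition, for illustration only.
sample_role = json.loads("""
{
  "Name": "Example CDP custom role",
  "Actions": [
    "Microsoft.DBforPostgreSQL/flexibleServers/read",
    "Microsoft.DBforPostgreSQL/flexibleServers/write",
    "Microsoft.DBforPostgreSQL/flexibleServers/delete"
  ]
}
""")

for action in missing_flexible_server_actions(sample_role):
    print("missing:", action)
```

With the sample role, the sketch reports the start, stop, and firewall rule permissions as missing. A role that includes a wildcard entry covering all Flexible Server actions would produce no output.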

Support for Azure Qatar Central region

CDP now supports launching environments and Data Hubs in the Azure Qatar Central region. See Supported Azure regions.

Replication Manager

This release of the Replication Manager service introduces the following new features.

Replicate HBase tables in a database depending on the replication scope

During the HBase replication policy creation process, after you select Select Source > Replicate Database, you can choose one of two options. Select Source > Replicate all user tables replicates all the HBase tables in the database. Select Source > Replicate only tables where replication is already enabled replicates only the tables for which the replication scope is already set to 1. This lets you replicate only the required tables in a database.

The Replicate only tables where replication is already enabled option is supported only if the target cluster runs CDP version 7.2.16.500 with Cloudera Manager 7.9.0-h7, or higher versions.

For more information, see Methods to replicate HBase data and Creating HBase replication policies.
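The version requirement for the Replicate only tables where replication is already enabled option can be expressed as a simple comparison. The Python sketch below is an illustration only and not part of Replication Manager. It assumes that "or higher versions" applies to both the CDP version and the Cloudera Manager version, and that version strings contain only dotted numbers with an optional "-h" hotfix suffix. The names version_key and supports_enabled_only_option are hypothetical.

```python
import re

# Minimum versions stated in this release summary.
MIN_CDP_VERSION = "7.2.16.500"
MIN_CM_VERSION = "7.9.0-h7"


def version_key(version: str) -> tuple[int, ...]:
    """Turn a string such as '7.2.16.500' or '7.9.0-h7' into a tuple of integers."""
    # Assumption: every run of digits is a version component, in order of significance.
    return tuple(int(part) for part in re.findall(r"\d+", version))


def supports_enabled_only_option(cdp_version: str, cm_version: str) -> bool:
    """Check a target cluster against both minimum versions (assumed interpretation)."""
    return (
        version_key(cdp_version) >= version_key(MIN_CDP_VERSION)
        and version_key(cm_version) >= version_key(MIN_CM_VERSION)
    )


# Hypothetical target clusters, for illustration only.
for cdp, cm in [("7.2.16.500", "7.9.0-h7"), ("7.2.16.400", "7.9.0-h7"), ("7.2.17.200", "7.11.0")]:
    print(cdp, cm, supports_enabled_only_option(cdp, cm))
```

Tuple comparison keeps the check short: a Cloudera Manager version without a hotfix suffix, such as 7.9.0, sorts below 7.9.0-h7, while any later minor release sorts above it.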

Replicate HBase data between multiple clusters simultaneously

Starting from CDP Public Cloud versions 7.2.16.500 and 7.2.17.200, you can create multiple HBase replication policies between multiple clusters to replicate HBase data.

For more information, see Replicate HBase data simultaneously between multiple clusters.