CDP Public Cloud: May 2024 Release Summary

This release summary covers the major features introduced in CDP Public Cloud Management Console, Data Hub, and the data services.

Data Catalog

This release of the Data Catalog service introduces the following new features and additions:

Iceberg tables are now supported by the Data Catalog service

Support for Iceberg tables includes the following capabilities:

  • You can filter for Iceberg tables on the Search page.
  • You can view Iceberg tables on the Asset Details page.
  • You can add Iceberg tables to a dataset.

In addition, all subcomponents of Data Catalog now support JDK 17.

Note: Profilers do not support Iceberg tables in this release.

Data Engineering

The 1.21 release of the Cloudera Data Engineering service on CDP Public Cloud introduces the following change:

Airflow version upgrade to 2.7

CDP has upgraded the Airflow version to 2.7. For more information, see Apache Airflow 2.7 Release Notes and Compatibility for Cloudera Data Engineering and Runtime components.
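
If you author DAGs for CDE on this release, the standard Airflow 2.x TaskFlow API continues to apply on Airflow 2.7. The following is a minimal, illustrative DAG; the DAG id, schedule, and task logic are placeholders rather than CDE defaults.

# Minimal illustrative DAG for Airflow 2.7 using the TaskFlow API.
# The DAG id, schedule, and task bodies are placeholders, not CDE defaults.
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="example_airflow_2_7_dag",  # hypothetical DAG id
    schedule=None,                     # run only when triggered manually
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
def example_pipeline():
    @task
    def extract():
        return [1, 2, 3]

    @task
    def total(values):
        print(f"sum = {sum(values)}")

    total(extract())


example_pipeline()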

The 1.21.0-h1 release of the Cloudera Data Engineering service on CDP Public Cloud introduces the following change:

Listing ACL-based users

You can now list users governed by the Access Control List (ACL) for CDE Server versions lower than 1.20.3.

In CDE 1.21, users whose clusters were on versions lower than 1.20.3 with ACL-based access control could not interact with the ACL. The cluster version is now checked, and either the ACL-based or the RBAC-based UI is displayed, depending on the cluster version.

DataFlow

This release (2.8.0-b274) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces the ability to change the flow definition version of running deployments, a new Overview page with tutorials and shortcuts, the ability to create NiFi 2.0 deployments (Technical Preview), filtering ReadyFlows by different categories, new GenAI ReadyFlows, and support for new Kubernetes versions.

Latest NiFi version

Flow Deployments and Test Sessions now support the latest Apache NiFi 1.25 release.

NiFi 2.0 Technical Preview

You can now select NiFi 2.0 (based on the upstream Apache NiFi 2.0 M2 release with critical fixes from M3) when creating deployments, including the ability to configure your deployment to use custom Python-based processors.
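
The sketch below shows what a custom Python-based processor for NiFi 2.0 can look like. It follows the upstream Apache NiFi Python processor API (FlowFileTransform); the class name, version string, and transformation logic are illustrative, so verify the interface details against the NiFi 2.0 Python developer documentation.

# Illustrative NiFi 2.0 Python processor based on the upstream FlowFileTransform API.
# The processor name and logic are examples only.
from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult


class UppercaseContent(FlowFileTransform):
    class Java:
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    class ProcessorDetails:
        version = "2.0.0"
        description = "Uppercases the text content of each FlowFile (example)."
        tags = ["example", "text", "python"]

    def __init__(self, **kwargs):
        super().__init__()

    def transform(self, context, flowfile):
        # Read the incoming FlowFile content, transform it, and route to success.
        text = flowfile.getContentsAsBytes().decode("utf-8")
        return FlowFileTransformResult(
            relationship="success",
            contents=text.upper(),
            attributes={"example.transformed": "true"},
        )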

Change flow definition version of existing deployments

You can now change the version of your flow definition for existing deployments. This eliminates the need to recreate deployments whenever a new version of your flow is available. Depending on your needs, you can choose from three different strategies when changing the flow definition version. Learn more about changing flow definition versions.

New Overview Page

When navigating to CDF, users now start on the new Overview page. The Overview page helps new users get started with guides and documentation, informs administrators about recent releases, and offers shortcuts for power users.

Custom NiFi Node sizing (requires entitlement)

When creating a deployment, CDF now supports specifying custom core/memory settings. This feature requires an entitlement. Reach out to your Cloudera team to request access.

ReadyFlow Gallery filtering

The ReadyFlow Gallery now supports filtering available ReadyFlows by four different categories: Use case category, Source, Destination, and compatible NiFi version.

New ReadyFlows

  • DB2 CDC to Iceberg (Technical Preview)
  • DB2 CDC to Kudu
  • MySQL CDC to Iceberg (Technical Preview)
  • Oracle CDC to Iceberg (Technical Preview)
  • PostgreSQL to Iceberg (Technical Preview)
  • Slack to Pinecone (NiFi 2.0)
  • SQL Server CDC to Iceberg (Technical Preview)

Support for new Kubernetes versions

CDF now supports Kubernetes 1.28 on EKS and AKS.

Include JVM Heap and Thread Dump in Diagnostic Bundles

Users can trigger a Diagnostic Bundle collection that includes heap and thread dumps from all NiFi nodes of a flow deployment.

Other changes and improvements

  • The Dashboard page has been renamed to Deployments and continues to be the single pane of glass to monitor all existing deployments.

  • Diagnostic Bundle collection now includes additional information and allows collection of heap dumps for faster troubleshooting.

  • The Flow Details section of the Catalog page now supports text search and filtering by tags.

  • When enabling CDF on Azure, CDF now provisions an Azure Database for PostgreSQL Flexible Server instead of a Single Server. This ensures supportability as Single Server offerings are phased out by Microsoft.

  • CDF on AWS now supports RDS certificate rotation, ensuring compatibility with planned certificate rotation changes from AWS.

Removed NiFi components

In this release of CDF-PC, a number of deprecated NiFi processors and controller services have been removed from the product. For a list of removed components and suggested replacements, see Removed processors and Removed controller services, respectively.

Data Hub

This release of the Data Hub service introduces the following changes:

New default Azure VM instance types in cluster templates

In the built-in Data Hub templates, the following instance types are being replaced:

  • Standard_D5_v2 is being replaced with Standard_D16s_v3
  • Standard_D8_v3 is being replaced with Standard_D8s_v3
  • Standard_D16_v3 is being replaced with Standard_D16s_v3
  • Standard_E16_v3 is being replaced with Standard_E16s_v3
  • Standard_E8_v3 is being replaced with Standard_E8s_v3

The following table provides more detail:

Template name | Group | Previous instance type | New instance type
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | manager | Standard_D16_v3 | Standard_D16s_v3
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | compute | Standard_D5_v2 | Standard_D16s_v3
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | worker | Standard_D16_v3 | Standard_D16s_v3
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | gateway | Standard_D16_v3 | Standard_D16s_v3
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | master | Standard_D16_v3 | Standard_D16s_v3
7.x - Data Engineering Spark3 for Azure | master | Standard_D16_v3 | Standard_D16s_v3
7.x - Data Engineering Spark3 for Azure | compute | Standard_D5_v2 | Standard_D16s_v3
7.x - Data Engineering Spark3 for Azure | worker | Standard_D5_v2 | Standard_D16s_v3
7.x - Data Engineering Spark3 for Azure | gateway | Standard_D8_v3 | Standard_D16s_v3
7.x - Data Discovery and Exploration for Spark3 for Azure | master | Standard_D8_v3 | Standard_D8s_v3
7.x - Data Discovery and Exploration for Spark3 for Azure | gateway | Standard_D8_v3 | Standard_D8s_v3
7.x - Data Discovery and Exploration for Spark3 for Azure | leader | Standard_D8_v3 | Standard_D8s_v3
7.x - Data Discovery and Exploration for Spark3 for Azure | worker | Standard_D8_v3 | Standard_D8s_v3
7.x - Data Discovery and Exploration for Spark3 for Azure | yarnworker | Standard_D8_v3 | Standard_D8s_v3
7.x - Data Mart: Apache Impala, Hue for Azure | master | Standard_E8_v3 | Standard_E8s_v3
7.x - Edge Flow Management Light Duty for Azure | management | Standard_D8_v3 | Standard_D8s_v3
7.x - COD Edge Node for Azure | leader | Standard_D8_v3 | Standard_D8s_v3
7.x - Streaming Analytics Heavy Duty for Azure | manager | Standard_D8_v3 | Standard_D8s_v3
7.x - Streaming Analytics Heavy Duty for Azure | master | Standard_D8_v3 | Standard_D8s_v3
7.x - Streaming Analytics Heavy Duty for Azure | worker | Standard_D8_v3 | Standard_D8s_v3
7.x - Streaming Analytics Light Duty for Azure | manager | Standard_D8_v3 | Standard_D8s_v3
7.x - Streaming Analytics Light Duty for Azure | master | Standard_D8_v3 | Standard_D8s_v3
7.x - Streaming Analytics Light Duty for Azure | worker | Standard_D8_v3 | Standard_D8s_v3
7.x - Flow Management Light Duty for Azure | management | Standard_D8_v3 | Standard_D8s_v3
7.x - Flow Management Light Duty for Azure | nifi_scaling | Standard_D8_v3 | Standard_D8s_v3
7.x - Flow Management Light Duty for Azure | nifi | Standard_D8_v3 | Standard_D8s_v3
7.x - Flow Management Heavy Duty for Azure | management | Standard_D8_v3 | Standard_D8s_v3
7.x - Real-time Data Mart - Spark3 for Azure | master1 | Standard_D8_v3 | Standard_D8s_v3
7.x - Real-time Data Mart - Spark3 for Azure | master2 | Standard_D8_v3 | Standard_D8s_v3
7.x - Real-time Data Mart - Spark3 for Azure | master3 | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging High Availability for Azure | manager | Standard_D16_v3 | Standard_D16s_v3
7.x - Streams Messaging High Availability for Azure | core_zookeeper | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging High Availability for Azure | core_broker | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging High Availability for Azure | srm | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Light Duty for Azure | broker | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Light Duty for Azure | kraft | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Heavy Duty for Azure | registry | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Heavy Duty for Azure | smm | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Heavy Duty for Azure | srm | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Heavy Duty for Azure | connect | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Heavy Duty for Azure | kraft | Standard_D8_v3 | Standard_D8s_v3

If you want to create Data Hubs with the previously used instance types, you can do so by configuring a custom instance type.

Machine Learning

Cloudera Machine Learning version 2.0.45-b81 introduces the following changes and improvements:

  • The NodeSelector label can now be added for inference services. The label can be specified in the instance_type field of the deploy or update requests, which enables you to direct inference service pods to specific nodes. A hedged request sketch follows this list.
  • Enhancements to the Export API to support the Observability APIs.
  • Cloudera AI Inference Service (Technical Preview): The AI Inference service is a production-grade serving environment for traditional machine learning, generative AI, and LLM models. It is designed to handle the challenges of production deployments, such as high availability, fault tolerance, and scalability. The service is now available for users to carry out inference on the following three categories of models:
    • TRT-LLMs: LLMs that are optimized into TensorRT (TRT) engines and available in the NVIDIA GPU Cloud catalog, also known as the NGC catalog.
    • LLMs available through the Hugging Face Hub.
    • Traditional machine learning models, such as classification and regression models. Models must be imported into the model registry before they can be served using the Cloudera AI Inference Service.
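
The following is a hedged sketch of how the instance_type field could carry the NodeSelector label in a deploy request. The host, endpoint path, model identifier, and label value are hypothetical placeholders; consult the Cloudera AI Inference Service API reference for the actual payload schema.

# Hypothetical sketch only: the host, endpoint path, IDs, and label value are
# placeholders, not documented API details. It illustrates passing a NodeSelector
# label through the instance_type field of a deploy request.
import requests

API_BASE = "https://<inference-service-host>/api"  # placeholder host
TOKEN = "<cdp-access-token>"                        # placeholder credential

deploy_request = {
    "model_id": "example-model",          # hypothetical model identifier
    "instance_type": "nvidia-a10g-node",  # NodeSelector label directing pods to specific nodes
    "replicas": 1,
}

response = requests.post(
    f"{API_BASE}/deployments",            # hypothetical endpoint
    json=deploy_request,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())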

Cloudera Machine Learning version 2.0.45-b76 introduces the following changes and improvements:

  • Model Registry API (Technical Preview): A new API is available from the Model Registry service to import, get, update, and delete models without relying on the CML Workspace service. A purely hypothetical sketch of this pattern follows this list.
  • Ephemeral storage limit: The default ephemeral storage limit for CML Projects has been increased from 10 GB to 30 GB.
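
The sketch below is purely hypothetical: the base URL, endpoint paths, and payload fields are placeholders, not the documented Model Registry API. It only illustrates the import, get, update, and delete pattern described above, issued directly against the Model Registry service instead of the CML Workspace service.

# Hypothetical sketch only: URLs, paths, and fields are placeholders, not the
# documented Model Registry API. Shown to illustrate the CRUD-style calls
# (import, get, update, delete) described in the release note.
import requests

REGISTRY = "https://<model-registry-host>/api"            # placeholder base URL
HEADERS = {"Authorization": "Bearer <cdp-access-token>"}  # placeholder credential

# Import (register) a model; payload fields are illustrative.
created = requests.post(
    f"{REGISTRY}/models",
    json={"name": "churn-model", "source": "s3://example-bucket/model"},
    headers=HEADERS,
    timeout=30,
).json()
model_id = created.get("id")

# Get, update, and delete the same model.
requests.get(f"{REGISTRY}/models/{model_id}", headers=HEADERS, timeout=30)
requests.patch(f"{REGISTRY}/models/{model_id}", json={"description": "updated"}, headers=HEADERS, timeout=30)
requests.delete(f"{REGISTRY}/models/{model_id}", headers=HEADERS, timeout=30)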

Management Console

This release of the Management Console service introduces the following changes:

Updating instance metadata to IMDSv2

CDP now uses IMDSv2 for accessing EC2 instance metadata on all newly created Data Lakes, FreeIPA clusters, and Data Hubs. Previously created clusters using IMDSv1 can now be updated to IMDSv2. For more information, see Updating instance metadata to IMDSv2.
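
For scripts on cluster hosts that read EC2 instance metadata directly, the practical difference is that IMDSv2 requires a session token. The snippet below shows the standard token-based flow; this is generic EC2 behavior rather than CDP-specific code, and it only works when run on an EC2 instance.

# Standard IMDSv2 flow: fetch a session token, then present it when reading metadata.
# Generic EC2 behavior, not CDP-specific code; run on an EC2 instance.
import urllib.request

IMDS = "http://169.254.169.254"

# Step 1: request a short-lived session token.
token_request = urllib.request.Request(
    f"{IMDS}/latest/api/token",
    method="PUT",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
)
token = urllib.request.urlopen(token_request, timeout=5).read().decode()

# Step 2: read metadata with the token; unauthenticated IMDSv1-style calls fail
# once an instance is configured to require IMDSv2.
metadata_request = urllib.request.Request(
    f"{IMDS}/latest/meta-data/instance-id",
    headers={"X-aws-ec2-metadata-token": token},
)
print(urllib.request.urlopen(metadata_request, timeout=5).read().decode())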

Operational Database

Cloudera Operational Database (COD) version 1.42 supports HBase REST server scaling and introduces CDP CLI enhancements.

HBase REST server scaling for better performance [Technical Preview]

You can scale up the HBase REST server through the Apache HBase REST API for better connectivity to COD. You need a minimum of two Gateway nodes to use this functionality. You can specify the required number of Gateway nodes with the --num-gateway-nodes option of the create-database command in the CDP CLI.

This feature is under technical preview. To use this feature, you must have the COD_RESTWORKERS entitlement enabled in your CDP environment.

Following is a sample command:

cdp opdb create-database --environment-name env_name --database-name database_name --num-gateway-nodes integer

For more information, see Scaling the HBase REST server in COD.

Enhancements to the describe-database command

In CDP CLI, the output of the describe-database command shows the JDK version of the COD cluster if the cluster was created using a specific JDK version; otherwise, the output shows the JDK version as “Not Available”.

The following is a sample output of the describe-database command that shows the Java version used to create the cluster.

"dbEdgeNodeCount": 0,
"scaleType": "MICRO",
"type": "COD",
"computeNodesCount": 0,
"totalComputeNodesCount": 0,
"isJwtEnabled": true,
"cloudPlatform": "AWS",
"javaVersion": "11"

For more information, see the CDP CLI documentation.