CDP Public Cloud: May 2024 Release Summary

This release summary covers the major features introduced in CDP Public Cloud Management Console, Data Hub, and the data services.

Data Catalog

This release of the Data Catalog service introduces the following new features and additions:

Iceberg tables are now supported by the Data Catalog service

Support for Iceberg tables includes the following capabilities:

  • You can filter for Iceberg tables on the Search page.
  • You can view Iceberg tables on the Asset Details page.
  • You can add Iceberg tables to a dataset.

In addition, all subcomponents of Data Catalog now support JDK 17.

Note: Profilers do not support Iceberg tables in this release.

Data Engineering

The 1.21 release of the Cloudera Data Engineering service on CDP Public Cloud introduces the following change:

Airflow version upgrade to 2.7

CDP has upgraded the Airflow version to 2.7. For more information, see Apache Airflow 2.7 Release Notes and Compatibility for Cloudera Data Engineering and Runtime components.
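
If you author DAGs for CDE on this release, the standard Airflow 2.x TaskFlow API continues to apply on Airflow 2.7. The following is a minimal, illustrative DAG; the DAG id, schedule, and task logic are placeholders rather than CDE defaults.

# Minimal illustrative DAG for Airflow 2.7 using the TaskFlow API.
# The DAG id, schedule, and task bodies are placeholders, not CDE defaults.
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="example_airflow_2_7_dag",  # hypothetical DAG id
    schedule=None,                     # run only when triggered manually
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
def example_pipeline():
    @task
    def extract():
        return [1, 2, 3]

    @task
    def total(values):
        print(f"sum = {sum(values)}")

    total(extract())


example_pipeline()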

The 1.21.0-h1 release of the Cloudera Data Engineering service on CDP Public Cloud introduces the following change:

Listing ACL-based users

You can now list users governed by the Access Control List (ACL) for CDE Server versions lower than 1.20.3.

In CDE 1.21, users whose clusters were on versions lower than 1.20.3 with ACL-based access control could not interact with the ACL. The cluster version is now checked, and either the ACL-based or the RBAC-based UI is displayed, depending on the cluster version.

DataFlow

This release (2.8.0-b274) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces the ability to change the flow definition version of running deployments, a new Overview page with tutorials and shortcuts, the ability to create NiFi 2.0 deployments (Technical Preview), filtering ReadyFlows by different categories, new GenAI ReadyFlows, and support for new Kubernetes versions.

Latest NiFi version

Flow Deployments and Test Sessions now support the latest Apache NiFi 1.25 release.

NiFi 2.0 Technical Preview

You can now select NiFi 2.0 (based on the upstream Apache NiFi 2.0 M2 release with critical fixes from M3) when creating deployments, including the ability to configure your deployment to use custom Python-based processors.
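
The sketch below shows what a custom Python-based processor for NiFi 2.0 can look like. It follows the upstream Apache NiFi Python processor API (FlowFileTransform); the class name, version string, and transformation logic are illustrative, so verify the interface details against the NiFi 2.0 Python developer documentation.

# Illustrative NiFi 2.0 Python processor based on the upstream FlowFileTransform API.
# The processor name and logic are examples only.
from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult


class UppercaseContent(FlowFileTransform):
    class Java:
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    class ProcessorDetails:
        version = "2.0.0"
        description = "Uppercases the text content of each FlowFile (example)."
        tags = ["example", "text", "python"]

    def __init__(self, **kwargs):
        super().__init__()

    def transform(self, context, flowfile):
        # Read the incoming FlowFile content, transform it, and route to success.
        text = flowfile.getContentsAsBytes().decode("utf-8")
        return FlowFileTransformResult(
            relationship="success",
            contents=text.upper(),
            attributes={"example.transformed": "true"},
        )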

Change flow definition version of existing deployments

You can now change the version of your flow definition for existing deployments. This eliminates the need to recreate deployments whenever a new version of your flow is available. Depending on your needs, you can choose from three different strategies when changing the flow definition version. Learn more about changing flow definition versions.

New Overview Page

When navigating to CDF, users now start on the new Overview page. The Overview page helps new users get started with guides and documentation, informs administrators about recent releases, and offers shortcuts for power users.

Custom NiFi Node sizing (requires entitlement)

When creating a deployment, CDF now supports specifying custom core/memory settings. This feature requires an entitlement. Reach out to your Cloudera team to request access.

ReadyFlow Gallery filtering

The ReadyFlow Gallery now supports filtering available ReadyFlows by four different categories: Use case category, Source, Destination, and compatible NiFi version.

New ReadyFlows

  • DB2 CDC to Iceberg (Technical Preview)
  • DB2 CDC to Kudu
  • MySQL CDC to Iceberg (Technical Preview)
  • Oracle CDC to Iceberg (Technical Preview)
  • PostgreSQL to Iceberg (Technical Preview)
  • Slack to Pinecone (NiFi 2.0)
  • SQL Server CDC to Iceberg (Technical Preview)

Support for new Kubernetes versions

CDF now supports Kubernetes 1.28 on EKS and AKS.

Include JVM Heap and Thread Dump in Diagnostic Bundles

Users can trigger a Diagnostic Bundle collection that includes heap and thread dumps from all NiFi nodes of a flow deployment.

Other changes and improvements

  • The Dashboard page has been renamed to Deployments and continues to be the single pane of glass to monitor all existing deployments.

  • Diagnostic Bundle collection now includes additional information and allows collection of heap dumps for faster troubleshooting.

  • The Flow Details section of the Catalog page now supports text search and filtering by tags.

  • When enabling CDF on Azure, CDF now provisions an Azure Database for PostgreSQL Flexible Server instead of a Single Server. This ensures supportability as Single Server offerings are phased out by Microsoft.

  • CDF on AWS now supports RDS certificate rotation, ensuring compatibility with planned certificate rotation changes from AWS.

Removed NiFi components

In this release of CDF-PC, a number of deprecated NiFi processors and controller services have been removed from the product. For a list of removed components and suggested replacements, see Removed processors and Removed controller services, respectively.

Data Hub

This release of the Data Hub service introduces the following changes:

New default Azure VM instance types in cluster templates

In the built-in Data Hub templates, the following instance types are being replaced:

  • Standard_D5_v2 is being replaced with Standard_D16s_v3
  • Standard_D8_v3 is being replaced with Standard_D8s_v3
  • Standard_D16_v3 is being replaced with Standard_D16s_v3
  • Standard_E16_v3 is being replaced with Standard_E16s_v3
  • Standard_E8_v3 is being replaced with Standard_E8s_v3

The following table provides more detail:

Template name | Group | Previous instance type | New instance type
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | manager | Standard_D16_v3 | Standard_D16s_v3
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | compute | Standard_D5_v2 | Standard_D16s_v3
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | worker | Standard_D16_v3 | Standard_D16s_v3
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | gateway | Standard_D16_v3 | Standard_D16s_v3
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | master | Standard_D16_v3 | Standard_D16s_v3
7.x - Data Engineering Spark3 for Azure | master | Standard_D16_v3 | Standard_D16s_v3
7.x - Data Engineering Spark3 for Azure | compute | Standard_D5_v2 | Standard_D16s_v3
7.x - Data Engineering Spark3 for Azure | worker | Standard_D5_v2 | Standard_D16s_v3
7.x - Data Engineering Spark3 for Azure | gateway | Standard_D8_v3 | Standard_D16s_v3
7.x - Data Discovery and Exploration for Spark3 for Azure | master | Standard_D8_v3 | Standard_D8s_v3
7.x - Data Discovery and Exploration for Spark3 for Azure | gateway | Standard_D8_v3 | Standard_D8s_v3
7.x - Data Discovery and Exploration for Spark3 for Azure | leader | Standard_D8_v3 | Standard_D8s_v3
7.x - Data Discovery and Exploration for Spark3 for Azure | worker | Standard_D8_v3 | Standard_D8s_v3
7.x - Data Discovery and Exploration for Spark3 for Azure | yarnworker | Standard_D8_v3 | Standard_D8s_v3
7.x - Data Mart: Apache Impala, Hue for Azure | master | Standard_E8_v3 | Standard_E8s_v3
7.x - Edge Flow Management Light Duty for Azure | management | Standard_D8_v3 | Standard_D8s_v3
7.x - COD Edge Node for Azure | leader | Standard_D8_v3 | Standard_D8s_v3
7.x - Streaming Analytics Heavy Duty for Azure | manager | Standard_D8_v3 | Standard_D8s_v3
7.x - Streaming Analytics Heavy Duty for Azure | master | Standard_D8_v3 | Standard_D8s_v3
7.x - Streaming Analytics Heavy Duty for Azure | worker | Standard_D8_v3 | Standard_D8s_v3
7.x - Streaming Analytics Light Duty for Azure | manager | Standard_D8_v3 | Standard_D8s_v3
7.x - Streaming Analytics Light Duty for Azure | master | Standard_D8_v3 | Standard_D8s_v3
7.x - Streaming Analytics Light Duty for Azure | worker | Standard_D8_v3 | Standard_D8s_v3
7.x - Flow Management Light Duty for Azure | management | Standard_D8_v3 | Standard_D8s_v3
7.x - Flow Management Light Duty for Azure | nifi_scaling | Standard_D8_v3 | Standard_D8s_v3
7.x - Flow Management Light Duty for Azure | nifi | Standard_D8_v3 | Standard_D8s_v3
7.x - Flow Management Heavy Duty for Azure | management | Standard_D8_v3 | Standard_D8s_v3
7.x - Real-time Data Mart - Spark3 for Azure | master1 | Standard_D8_v3 | Standard_D8s_v3
7.x - Real-time Data Mart - Spark3 for Azure | master2 | Standard_D8_v3 | Standard_D8s_v3
7.x - Real-time Data Mart - Spark3 for Azure | master3 | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging High Availability for Azure | manager | Standard_D16_v3 | Standard_D16s_v3
7.x - Streams Messaging High Availability for Azure | core_zookeeper | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging High Availability for Azure | core_broker | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging High Availability for Azure | srm | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Light Duty for Azure | broker | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Light Duty for Azure | kraft | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Heavy Duty for Azure | registry | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Heavy Duty for Azure | smm | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Heavy Duty for Azure | srm | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Heavy Duty for Azure | connect | Standard_D8_v3 | Standard_D8s_v3
7.x - Streams Messaging Heavy Duty for Azure | kraft | Standard_D8_v3 | Standard_D8s_v3

If you want to create Data Hubs with the previously used instance types, you can do so by configuring a custom instance type.

Machine Learning

Cloudera Machine Learning version 2.0.45-b81 introduces the following changes and improvements:

  • The NodeSelector label can now be added for inference services. The label can be specified in the instance_type field of the deploy or update requests, which enables you to direct inference service pods to specific nodes. A hedged request sketch follows this list.
  • Enhancements to the Export API to support the Observability APIs.
  • Cloudera AI Inference Service (Technical Preview): The AI Inference service is a production-grade serving environment for traditional machine learning, generative AI, and LLM models. It is designed to handle the challenges of production deployments, such as high availability, fault tolerance, and scalability. The service is now available for users to carry out inference on the following three categories of models:
    • TRT-LLMs: LLMs that are optimized into TensorRT (TRT) engines and available in the NVIDIA GPU Cloud catalog, also known as the NGC catalog.
    • LLMs available through the Hugging Face Hub.
    • Traditional machine learning models, such as classification and regression models. Models must be imported into the model registry before they can be served using the Cloudera AI Inference Service.
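
The following is a hedged sketch of how the instance_type field could carry the NodeSelector label in a deploy request. The host, endpoint path, model identifier, and label value are hypothetical placeholders; consult the Cloudera AI Inference Service API reference for the actual payload schema.

# Hypothetical sketch only: the host, endpoint path, IDs, and label value are
# placeholders, not documented API details. It illustrates passing a NodeSelector
# label through the instance_type field of a deploy request.
import requests

API_BASE = "https://<inference-service-host>/api"  # placeholder host
TOKEN = "<cdp-access-token>"                        # placeholder credential

deploy_request = {
    "model_id": "example-model",          # hypothetical model identifier
    "instance_type": "nvidia-a10g-node",  # NodeSelector label directing pods to specific nodes
    "replicas": 1,
}

response = requests.post(
    f"{API_BASE}/deployments",            # hypothetical endpoint
    json=deploy_request,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())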

Cloudera Machine Learning version 2.0.45-b76 introduces the following changes and improvements:

  • Model Registry API (Technical Preview): A new API is available from the Model Registry service to import, get, update, and delete models without relying on the CML Workspace service. A purely hypothetical sketch of this pattern follows this list.
  • Ephemeral storage limit: The default ephemeral storage limit for CML Projects has been increased from 10 GB to 30 GB.
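
The sketch below is purely hypothetical: the base URL, endpoint paths, and payload fields are placeholders, not the documented Model Registry API. It only illustrates the import, get, update, and delete pattern described above, issued directly against the Model Registry service instead of the CML Workspace service.

# Hypothetical sketch only: URLs, paths, and fields are placeholders, not the
# documented Model Registry API. Shown to illustrate the CRUD-style calls
# (import, get, update, delete) described in the release note.
import requests

REGISTRY = "https://<model-registry-host>/api"            # placeholder base URL
HEADERS = {"Authorization": "Bearer <cdp-access-token>"}  # placeholder credential

# Import (register) a model; payload fields are illustrative.
created = requests.post(
    f"{REGISTRY}/models",
    json={"name": "churn-model", "source": "s3://example-bucket/model"},
    headers=HEADERS,
    timeout=30,
).json()
model_id = created.get("id")

# Get, update, and delete the same model.
requests.get(f"{REGISTRY}/models/{model_id}", headers=HEADERS, timeout=30)
requests.patch(f"{REGISTRY}/models/{model_id}", json={"description": "updated"}, headers=HEADERS, timeout=30)
requests.delete(f"{REGISTRY}/models/{model_id}", headers=HEADERS, timeout=30)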

Management Console

This release of the Management Console service introduces the following changes:

Updating instance metadata to IMDSv2

CDP now uses IMDSv2 for accessing EC2 instance metadata on all newly created Data Lakes, FreeIPA clusters, and Data Hubs. Previously created clusters using IMDSv1 can now be updated to IMDSv2. For more information, see Updating instance metadata to IMDSv2.
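
For scripts on cluster hosts that read EC2 instance metadata directly, the practical difference is that IMDSv2 requires a session token. The snippet below shows the standard token-based flow; this is generic EC2 behavior rather than CDP-specific code, and it only works when run on an EC2 instance.

# Standard IMDSv2 flow: fetch a session token, then present it when reading metadata.
# Generic EC2 behavior, not CDP-specific code; run on an EC2 instance.
import urllib.request

IMDS = "http://169.254.169.254"

# Step 1: request a short-lived session token.
token_request = urllib.request.Request(
    f"{IMDS}/latest/api/token",
    method="PUT",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
)
token = urllib.request.urlopen(token_request, timeout=5).read().decode()

# Step 2: read metadata with the token; unauthenticated IMDSv1-style calls fail
# once an instance is configured to require IMDSv2.
metadata_request = urllib.request.Request(
    f"{IMDS}/latest/meta-data/instance-id",
    headers={"X-aws-ec2-metadata-token": token},
)
print(urllib.request.urlopen(metadata_request, timeout=5).read().decode())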

Operational Database

Cloudera Operational Database (COD) version 1.42 supports HBase REST server scaling and introduces CDP CLI enhancements.

HBase REST server scaling for better performance [Technical Preview]

You can scale up the HBase REST server through the Apache HBase REST API for better connectivity to COD. You need a minimum of two Gateway nodes to use this functionality. You can specify the required number of Gateway nodes with the --num-gateway-nodes option of the create-database command in the CDP CLI.

This feature is under technical preview. To use this feature, you must have the COD_RESTWORKERS entitlement enabled in your CDP environment.

Following is a sample command:

cdp opdb create-database --environment-name env_name --database-name database_name --num-gateway-nodes integer

For more information, see Scaling the HBase REST server in COD.

Enhancements to the describe-database command

In CDP CLI, the output of the describe-database command shows the JDK version of the COD cluster if the cluster was created using a specific JDK version; otherwise, the output shows the JDK version as “Not Available”.

The following is a sample output of the describe-database command that shows the Java version used to create the cluster.

"dbEdgeNodeCount": 0,
"scaleType": "MICRO",
"type": "COD",
"computeNodesCount": 0,
"totalComputeNodesCount": 0,
"isJwtEnabled": true,
"cloudPlatform": "AWS",
"javaVersion": "11"

For more information, see the CDP CLI documentation.