CDP Public Cloud: May 2024 Release Summary
This release summary covers the major features introduced in the CDP Public Cloud Management Console, Data Hub, and data services.
Data Catalog
This release of the Data Catalog service introduces the following new features and additions:
Iceberg tables are now supported by the Data Catalog service
Iceberg table support includes the following:
- You can filter for them in the Search page.
- Iceberg tables can be viewed in the Asset Details page.
- Iceberg tables can be added to a dataset.
- All subcomponents of Data Catalog support JDK 17.
Note: Profilers do not support Iceberg tables in this release.
Data Engineering
The 1.21 release of the Cloudera Data Engineering service on CDP Public Cloud introduces the following change:
Airflow version upgrade to 2.7
CDP has upgraded the Airflow version to 2.7. For more information, see Apache Airflow 2.7 Release Notes and Compatibility for Cloudera Data Engineering and Runtime components.
The 1.21.0-h1 release of the Cloudera Data Engineering service on CDP Public Cloud introduces the following change:
Listing ACL-based users
Users governed by an Access Control List (ACL) can now be listed for CDE service versions lower than 1.20.3. Previously in CDE 1.21, users with clusters lower than 1.20.3 that used ACL-based access control could not interact with the ACL. The cluster version is now checked, and either the ACL-based or the RBAC-based UI is displayed accordingly.
DataFlow
This release (2.8.0-b274) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces the ability to change the flow definition version of running deployments, a new Overview page with tutorials and shortcuts, the ability to create NiFi 2.0 deployments (Technical Preview), filtering ReadyFlows by different categories, new GenAI ReadyFlows, and support for new Kubernetes versions.
Latest NiFi version
Flow Deployments and Test Sessions now support the latest Apache NiFi 1.25 release.
NiFi 2.0 Technical Preview
You can now select NiFi 2.0 (based on the upstream Apache NiFi M2 release with critical fixes from M3) when creating deployments, including the ability to configure your deployment to use custom Python-based processors.
Change flow definition version of existing deployments
You can now change the version of your flow definition for existing deployments. This eliminates the need to recreate deployments whenever a new version of your flow is available. Depending on your needs, you can choose from three different strategies when changing the flow definition version. Learn more about changing flow definition versions.
New Overview Page
When navigating to CDF, users now start on the new Overview page. It helps new users get started with guides and documentation, informs administrators about recent releases, and offers shortcuts to power users.
Custom NiFi Node sizing (requires entitlement)
When creating a deployment, CDF now supports specifying custom core/memory settings. This feature requires an entitlement. Reach out to your Cloudera team to request access.
ReadyFlow Gallery filtering
The ReadyFlow Gallery now supports filtering available ReadyFlows by four different categories: Use case category, Source, Destination and compatible NiFi version.
New ReadyFlows
- DB2 CDC to Iceberg (Technical Preview)
- DB2 CDC to Kudu
- MySQL CDC to Iceberg (Technical Preview)
- Oracle CDC to Iceberg (Technical Preview)
- PostgreSQL to Iceberg (Technical Preview)
- Slack to Pinecone (NiFi 2.0)
- SQL Server CDC to Iceberg (Technical Preview)
Support for new Kubernetes versions
CDF now supports Kubernetes 1.28 on EKS and AKS.
Include JVM Heap and Thread Dump in UDX
Users can trigger a Diagnostic Bundle collection that includes heap and thread dumps of all NiFi nodes of a flow deployment.
Other changes and improvements
- The Dashboard page has been renamed to Deployments and continues to be the single pane of glass for monitoring all existing deployments.
- Diagnostic Bundle collection now includes additional information and allows collection of heap dumps for faster troubleshooting.
- The Flow Details view of the Catalog page now supports text search and filtering by tags.
- When enabling CDF on Azure, CDF now provisions an Azure Database for PostgreSQL instance instead of a single server. This ensures supportability as single servers are phased out by Microsoft.
- CDF on AWS now supports RDS certificate rotation, ensuring compatibility with planned certificate rotation changes from AWS.
Removed NiFi components
In this release of CDF-PC, a number of deprecated NiFi processors and controller services have been removed from the product. For a list of removed components and suggested replacements, see Removed processors and Removed controller services, respectively.
Data Hub
This release of the Data Hub service introduces the following changes:
New default Azure VM instance types in cluster templates
In the built-in Data Hub templates, the following instance types are being replaced:
- Standard_D5_v2 is being replaced with Standard_D16s_v3
- Standard_D8_v3 is being replaced with Standard_D8s_v3
- Standard_D16_v3 is being replaced with Standard_D16s_v3
- Standard_E16_v3 is being replaced with Standard_E16s_v3
- Standard_E8_v3 is being replaced with Standard_E8s_v3
The following table provides more detail:
Template name | Group | Previous instance type | New instance type |
---|---|---|---|
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | manager | Standard_D16_v3 | Standard_D16s_v3 |
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | compute | Standard_D5_v2 | Standard_D16s_v3 |
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | worker | Standard_D16_v3 | Standard_D16s_v3 |
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | gateway | Standard_D16_v3 | Standard_D16s_v3 |
7.x - Data Engineering: HA: Apache Spark3, Apache Hive, Apache Oozie for Azure | master | Standard_D16_v3 | Standard_D16s_v3 |
7.x - Data Engineering Spark3 for Azure | master | Standard_D16_v3 | Standard_D16s_v3 |
7.x - Data Engineering Spark3 for Azure | compute | Standard_D5_v2 | Standard_D16s_v3 |
7.x - Data Engineering Spark3 for Azure | worker | Standard_D5_v2 | Standard_D16s_v3 |
7.x - Data Engineering Spark3 for Azure | gateway | Standard_D8_v3 | Standard_D16s_v3 |
7.x - Data Discovery and Exploration for Spark3 for Azure | master | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Data Discovery and Exploration for Spark3 for Azure | gateway | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Data Discovery and Exploration for Spark3 for Azure | leader | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Data Discovery and Exploration for Spark3 for Azure | worker | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Data Discovery and Exploration for Spark3 for Azure | yarnworker | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Data Mart: Apache Impala, Hue for Azure | master | Standard_E8_v3 | Standard_E8s_v3 |
7.x - Edge Flow Management Light Duty for Azure | management | Standard_D8_v3 | Standard_D8s_v3 |
7.x - COD Edge Node for Azure | leader | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streaming Analytics Heavy Duty for Azure | manager | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streaming Analytics Heavy Duty for Azure | master | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streaming Analytics Heavy Duty for Azure | worker | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streaming Analytics Light Duty for Azure | manager | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streaming Analytics Light Duty for Azure | master | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streaming Analytics Light Duty for Azure | worker | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Flow Management Light Duty for Azure | management | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Flow Management Light Duty for Azure | nifi_scaling | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Flow Management Light Duty for Azure | nifi | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Flow Management Heavy Duty for Azure | management | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Real-time Data Mart - Spark3 for Azure | master1 | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Real-time Data Mart - Spark3 for Azure | master2 | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Real-time Data Mart - Spark3 for Azure | master3 | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streams Messaging High Availability for Azure | manager | Standard_D16_v3 | Standard_D16s_v3 |
7.x - Streams Messaging High Availability for Azure | core_zookeeper | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streams Messaging High Availability for Azure | core_broker | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streams Messaging High Availability for Azure | srm | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streams Messaging Light Duty for Azure | broker | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streams Messaging Light Duty for Azure | kraft | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streams Messaging Heavy Duty for Azure | registry | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streams Messaging Heavy Duty for Azure | smm | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streams Messaging Heavy Duty for Azure | srm | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streams Messaging Heavy Duty for Azure | connect | Standard_D8_v3 | Standard_D8s_v3 |
7.x - Streams Messaging Heavy Duty for Azure | kraft | Standard_D8_v3 | Standard_D8s_v3 |
If you would like to create Data Hubs with the previously used instance types, you can do so by configuring a custom instance type.
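The replacements listed above amount to a simple one-to-one mapping. A minimal Python sketch (the function name `new_instance_type` is illustrative; the mapping is taken directly from the list above) for checking what a template group's instance type becomes:

```python
# Mapping of retired Azure VM instance types to their replacements,
# as listed in the built-in Data Hub templates above.
REPLACEMENTS = {
    "Standard_D5_v2": "Standard_D16s_v3",
    "Standard_D8_v3": "Standard_D8s_v3",
    "Standard_D16_v3": "Standard_D16s_v3",
    "Standard_E16_v3": "Standard_E16s_v3",
    "Standard_E8_v3": "Standard_E8s_v3",
}

def new_instance_type(current: str) -> str:
    """Return the replacement instance type, or the current type if unchanged."""
    return REPLACEMENTS.get(current, current)

print(new_instance_type("Standard_D5_v2"))  # prints Standard_D16s_v3
```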
Machine Learning
Cloudera Machine Learning version 2.0.45-b81 introduces the following changes and improvements:
- The NodeSelector label can now be added for inference services by specifying it in the instance_type field of deploy or update requests. This enables you to direct inference service pods to specific nodes.
- Enhancements to the Export API to support the Observability APIs.
- Cloudera AI Inference Service (Technical Preview): The AI Inference service is a production-grade serving environment for traditional machine learning, generative AI, and LLM models. It is designed to handle the challenges of production deployments, such as high availability, fault tolerance, and scalability. The service is now available for users to carry out inference on the following three categories of models:
- TRT-LLMs: LLMs that are optimized into TensorRT (TRT) engines and available in the NVIDIA GPU Cloud catalog, also known as the NGC catalog.
- LLMs available through Hugging Face Hub.
- Traditional machine learning models like classification, regression, and so on. Models need to be imported to the model registry to be served using the Cloudera AI Inference Service.
Cloudera Machine Learning version 2.0.45-b76 introduces the following changes and improvements:
- Model Registry API (Technical Preview): New API is available from the Model Registry service to import, get, update and delete models without relying on the CML Workspace service.
- Ephemeral storage limit: The default ephemeral storage limit for CML Projects has been increased from 10 GB to 30 GB.
Management Console
This release of the Management Console service introduces the following changes:
Updating instance metadata to IMDSv2
CDP now uses IMDSv2 for accessing EC2 instance metadata on all newly created Data Lakes, FreeIPA clusters, and Data Hubs. Previously created clusters using IMDSv1 can now be updated to IMDSv2. For more information, see Updating instance metadata to IMDSv2.
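CDP performs this switch for you; for context, IMDSv2 replaces the open request model of IMDSv1 with a two-step, session-token handshake. A minimal Python sketch of that handshake (the function name `imdsv2_fetch` is illustrative, and the HTTP call is injectable so the flow can be exercised off-instance):

```python
import urllib.request

IMDS = "http://169.254.169.254"  # EC2 link-local metadata endpoint

def imdsv2_fetch(path, token_ttl=21600, urlopen=urllib.request.urlopen):
    """Fetch an EC2 metadata value using the IMDSv2 token handshake."""
    # Step 1: obtain a session token via PUT with a TTL header (IMDSv2 requirement).
    req = urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(token_ttl)},
    )
    token = urlopen(req).read().decode()
    # Step 2: call the metadata endpoint, presenting the token in a header.
    req = urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    return urlopen(req).read().decode()
```

An IMDSv1 client that omits the token header fails against an IMDSv2-only instance, which is why previously created clusters need the update described above.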
Operational Database
Cloudera Operational Database (COD) 1.42 version supports HBase REST server scaling and CDP CLI enhancements.
HBase REST server scaling for better performance [Technical Preview]
You can scale up the HBase REST server using the Apache HBase REST API for better connectivity to COD. You need a minimum of two Gateway nodes to use this functionality. The required number of Gateway nodes can be specified using the --num-gateway-nodes option of the create-database command in CDP CLI.
This feature is in technical preview. To use it, you must have the COD_RESTWORKERS entitlement enabled in your CDP environment.
Following is a sample command:
cdp opdb create-database --environment-name env_name --database-name database_name --num-gateway-nodes integer
For more information, see Scaling the HBase REST server in COD.
Enhancements to the describe-database command
In CDP CLI, the output of the describe-database command shows the JDK version of the COD cluster if the cluster was created using a specific JDK version; otherwise, the output shows the JDK version as "Not Available".
The following is a sample output of the describe-database command that shows the Java version used to create the cluster.
"dbEdgeNodeCount": 0,
"scaleType": "MICRO",
"type": "COD",
"computeNodesCount": 0,
"totalComputeNodesCount": 0,
"isJwtEnabled": true,
"cloudPlatform": "AWS",
"javaVersion": "11"
For more information, see CDP CLI documentation.
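When consuming this output programmatically, the "Not Available" fallback needs handling. A minimal Python sketch (the helper name `jdk_version` is illustrative; the field name is taken from the sample output above):

```python
import json

# Excerpt of describe-database output, as in the sample above.
sample = json.loads("""
{
  "scaleType": "MICRO",
  "type": "COD",
  "cloudPlatform": "AWS",
  "javaVersion": "11"
}
""")

def jdk_version(db):
    """Return the cluster's JDK version, or None for "Not Available"."""
    # Clusters created without a specific JDK report the literal "Not Available".
    v = db.get("javaVersion", "Not Available")
    return None if v == "Not Available" else v

print(jdk_version(sample))  # prints 11
```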