Release Summaries

CDP Public Cloud: February 2024 Release Summary

This release of the Data Catalog service introduces a notable behavior change that requires action on your part.

When you upgrade your cluster from Cloudera Runtime 7.2.17 to 7.2.18, the cluster enters a failed state during the OS upgrade step, and the following message is displayed:

__NODE_FAILURE:

New node(s) could not be added to the cluster. Reason Please find more details on Cloudera Manager UI. Failed command(s): Start(id=1546339088): Failed to start role profc6cf3856-PROFILER_SCHEDULER_AGENT-484032cb8f17cacf9e684efe50 of service profiler_scheduler in cluster cdp-dc-profilers-258395ef._
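When scanning upgrade logs for this failure, it can help to pull the role, service, and cluster names out of the message. The following is a minimal sketch, assuming the message keeps the wording shown above; the pattern and the function names `parse_failed_role` and `is_profiler_failure` are illustrative and are not part of any Cloudera tool.

```python
import re
from typing import Dict, Optional

# Pattern inferred from the sample message above (an assumption, not a documented format).
FAILED_ROLE = re.compile(
    r"Failed to start role (?P<role>\S+) "
    r"of service (?P<service>\S+) "
    r"in cluster (?P<cluster>[\w-]+)"
)


def parse_failed_role(message: str) -> Optional[Dict[str, str]]:
    """Return the role, service, and cluster named in the message, or None if absent."""
    match = FAILED_ROLE.search(message)
    return match.groupdict() if match else None


def is_profiler_failure(message: str) -> bool:
    """True when the failed role belongs to the profiler_scheduler service."""
    info = parse_failed_role(message)
    return info is not None and info["service"] == "profiler_scheduler"


# Sample text copied from the failure message above.
SAMPLE = (
    "Failed command(s): Start(id=1546339088): Failed to start role "
    "profc6cf3856-PROFILER_SCHEDULER_AGENT-484032cb8f17cacf9e684efe50 "
    "of service profiler_scheduler in cluster cdp-dc-profilers-258395ef."
)

if __name__ == "__main__":
    print(parse_failed_role(SAMPLE))
    print(is_profiler_failure(SAMPLE))
```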

Impact on Data Catalog profilers:

If the Data Hub cluster is not created, the Data Catalog profilers are not created in Cloudera Runtime 7.2.18.

To work around this issue, use the following process to bring up the Data Catalog profilers in Cloudera Runtime 7.2.18.

First, delete your existing 7.2.17 profiler clusters. For more information, see Deleting profiler cluster.

Next, after you upgrade the Data Lake to 7.2.18, launch the Data Catalog profilers. For more information, see Launch profiler cluster.

Note: No loss of user data or profiler analysis results is expected. The only expected loss is the profiler's last runtime value and its run history. The run history is the record of profiler runs displayed on the history page, including whether each run completed successfully or failed.

This release (1.20.3) of the Cloudera Data Engineering (CDE) service on CDP Public Cloud introduces the following changes.

CDE Sessions is now GA as a default feature. A session is an interactive, short-lived development environment for running Spark commands, which helps you iterate on and build your Spark workloads. The Interaction tab was added so that you can run Java, Scala, and PySpark code in blocks to develop applications. Cloudera currently supports Sessions in the CDE CLI and UI. The Spark UI tab was also added to view active sessions. For more information, see Creating and Managing CDE Sessions and Managing Sessions in CDE using the CLI.

CDE now has a redesigned landing page that focuses on a simplified workflow: Develop, Deploy, and Monitor.

CDE supports in-place upgrades from CDE versions 1.19.2 and above on AWS and 1.19.4 and above on Azure. To guard against upgrade failures, you must manually pause, back up, and restore each Virtual Cluster. A procedure for handling upgrade failures is also available. The in-place upgrade also includes the following:

  • Upgrades of CDE core components include: EKS, AKS Services, and Application Services

  • Upgrades of dependencies include: Helm, K8s versions, YuniKorn

For more information, see Upgrading CDE and Handling upgrade failures in CDE.
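The version thresholds above can be checked programmatically before you plan an upgrade. The following is a minimal sketch, assuming versions are plain dotted strings such as "1.19.2"; the function names and the lookup table are illustrative and do not correspond to any CDE API.

```python
from typing import Dict, Tuple

# Minimum source versions for in-place upgrade, taken from the text above.
MIN_UPGRADE_VERSION: Dict[str, Tuple[int, ...]] = {
    "aws": (1, 19, 2),
    "azure": (1, 19, 4),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.19.2' into (1, 19, 2); anything after a '-' (a build suffix) is ignored."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def supports_in_place_upgrade(platform: str, version: str) -> bool:
    """Return True if the given CDE version meets the minimum for its cloud platform."""
    minimum = MIN_UPGRADE_VERSION.get(platform.lower())
    if minimum is None:
        raise ValueError(f"unknown platform: {platform!r}")
    # Tuples compare element by element, which matches dotted-version ordering.
    return parse_version(version) >= minimum


if __name__ == "__main__":
    for platform, version in [("aws", "1.19.2"), ("azure", "1.19.3"), ("azure", "1.19.4")]:
        print(platform, version, supports_in_place_upgrade(platform, version))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "1.19.10" sorting before "1.19.2".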

You can now use Git repositories to collaborate, manage project artifacts, and promote applications from lower to higher environments. Cloudera currently supports Git providers such as GitHub, GitLab, and Bitbucket. Repository files can be accessed when you create a Spark or Airflow job. You can then deploy the job and use CDE’s centralized monitoring and troubleshooting capabilities to tune and adjust your workloads. For more information, see Creating a Git repository in CDE (Technical Preview).

CDE supports third-party Python-based plugins and libraries for building custom Airflow pipelines using the CDE UI and API. For more information, see Using custom operators and libraries for Apache Airflow and Using custom operators and libraries for Apache Airflow using API.

New parameters were added for Airflow. For more information, see CDE CLI Airflow flag reference and Submitting an Airflow job using the CLI.

CDE supports Spark Structured Streaming for both Spark 2 and Spark 3. For more information, see Support for Spark Structured Streaming in Cloudera Data Engineering (Technical Preview).

You can now restrict or grant access to a virtual cluster for the users and groups that you specify. For more information, see Applying user and group access for virtual clusters.

New sliders for editing all-purpose nodes on AWS and Azure have been added so that you can control the size of your auto-scaling group. For more information, see Enabling a Cloudera Data Engineering service.

CDE now supports K8s 1.27. For more information, see Compatibility for Cloudera Data Engineering and Runtime components.

For more information, see Support lifecycle policy.

Airflow has been upgraded to version 2.6. For more information, see Compatibility for Cloudera Data Engineering and Runtime components.

Impala Virtual Warehouses (VWs) are now supported, and the CDWOperator is no longer needed to execute queries. For more information, see Automating data pipelines using Apache Airflow in Cloudera Data Engineering.

Version 2.0.43-b229 released on February 20, 2024 includes bug fixes only.

Version 2.0.43-b220 released on February 8, 2024 includes the following features and improvements:

  • AMPs - The AMPs page has been upgraded to render images, make the UI more reactive and improve the overall experience.
  • Azure - Added Azure Qatar Central region as a supported region.

This release introduces the following new features:

You can now use tags to filter your usage insights based on the user-level tags of clusters in your CDP environment. For more information, see CDP credit consumption and usage insights.

Cloudera Operational Database (COD) version 1.39 removes a CDP CLI command and adds support for GP3 volumes as attached storage.

COD has removed support for the disengage-auto-admin command, which allowed users to disable the autonomous functions of the database and use the underlying Data Hub cluster instead.

COD now supports GP3 (SSD) volume types for attached storage. GP3 volumes let you increase performance by provisioning IOPS and throughput independently, without increasing storage size. GP3 volumes deliver performance similar to that of comparable GP2 volumes at a lower cost. GP3 is now the default attached storage type for COD instances that previously used GP2 storage.
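To illustrate why GP3 decouples performance from volume size, the following minimal sketch compares baseline IOPS for the two volume types. The figures used (3 IOPS per GiB for GP2, bounded between 100 and 16,000, and a flat 3,000 IOPS baseline for GP3) are assumptions taken from published AWS EBS characteristics, not from this release note, and the function names are illustrative.

```python
# Assumed AWS EBS baseline figures; verify against current AWS documentation.
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000
GP3_BASELINE_IOPS = 3_000


def gp2_baseline_iops(size_gib: int) -> int:
    """GP2 baseline IOPS grow with volume size, within fixed lower and upper bounds."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp3_baseline_iops(size_gib: int) -> int:
    """GP3 baseline IOPS are the same regardless of volume size."""
    return GP3_BASELINE_IOPS


if __name__ == "__main__":
    for size in (100, 500, 1000, 2000):
        print(f"{size:>5} GiB  gp2={gp2_baseline_iops(size):>6}  gp3={gp3_baseline_iops(size):>6}")
```

Under these assumptions, a GP2 volume must be about 1,000 GiB before its baseline matches what a GP3 volume of any size provides, which is why smaller volumes benefit most from the change.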