CDP Public Cloud: October 2023 Release Summary

Data Warehouse

This release of the Cloudera Data Warehouse (CDW) service on CDP Public Cloud introduces these changes:

Automatically backing up and restoring CDW

In earlier releases, the manual backup and restore feature replaced the in-place upgrade of AWS environments. This release adds automation to the backup and restore procedures for AWS and Azure environments. You can still perform Hive and Impala in-place upgrades.

To move to a supported Kubernetes version, back up your existing AWS or Azure environment and start a new environment using the restore process. The backup and restore feature saves your environment parameters, so you can recreate the environment with the same settings, URL, and connection strings that you used previously.

CDP CLI Start/Stop Azure Kubernetes Service (AKS) (Preview)

Using the Beta version of the CDP CLI, you can now start and stop a Cloudera Data Warehouse (CDW) cluster on Azure using the following commands:

  • resumeCluster <cluster ID> starts the cluster.
  • suspendCluster <cluster ID> stops the cluster.

The following prerequisites are required to use resumeCluster and suspendCluster:

  • You obtained a cluster ID parameter to pass with the command.
  • The cluster you want to stop is running.
  • You stopped the Virtual Warehouses and Database Catalogs in the cluster you want to stop.
  • You stopped the cluster before attempting to start it.
  • The cluster you want to start or stop is not in an error state.
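The prerequisites above mix conditions for stopping and for starting a cluster. As a minimal sketch of how they separate, the following Python check models them against a hypothetical ClusterSnapshot record. The class, its fields, and the state names "Running", "Stopped", and "Error" are assumptions made for illustration and are not part of the CDP CLI or any CDP API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterSnapshot:
    """Hypothetical view of a CDW cluster; not a CDP API type."""
    cluster_id: str
    state: str  # assumed values: "Running", "Stopped", "Error"
    running_virtual_warehouses: int
    running_database_catalogs: int


def unmet_prerequisites(cluster: ClusterSnapshot, action: str) -> List[str]:
    """Return the prerequisites that block `action` ("suspend" or "resume")."""
    if action not in ("suspend", "resume"):
        raise ValueError(f"unknown action: {action}")
    problems = []
    # Prerequisites shared by both commands.
    if not cluster.cluster_id:
        problems.append("a cluster ID is required")
    if cluster.state == "Error":
        problems.append("cluster is in an error state")
    if action == "suspend":
        # Stopping requires a running cluster with nothing active inside it.
        if cluster.state != "Running":
            problems.append("cluster must be running before it can be stopped")
        if cluster.running_virtual_warehouses or cluster.running_database_catalogs:
            problems.append("stop all Virtual Warehouses and Database Catalogs first")
    else:
        # Starting requires a cluster that was stopped earlier.
        if cluster.state != "Stopped":
            problems.append("cluster must be stopped before it can be started")
    return problems
```

An empty list means that, under these assumptions, nothing in the prerequisite list blocks the chosen command.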

The Microsoft documentation describes how resumeCluster and suspendCluster start and stop the AKS cluster, including every node pool and the AKS control plane, on the cloud provider side. A stopped AKS cluster incurs zero cost. The Postgres database, which belongs to the environment, is not stopped.

The following limitations apply:

  • You cannot start or stop an AKS cluster from the UI.
  • You cannot manually trigger starting or stopping an AKS cluster from the cloud provider, whether through the Azure CLI or the Azure portal. Manually triggering start and stop on the cloud provider does not synchronize the state of the AKS instance with the Cloudera control plane. The AKS continues running and the CDW UI indicates a running state.

Impala Virtual Warehouse spills temporary data to S3 automatically

When you create an Impala Virtual Warehouse, Impala is now automatically configured to write temporary data to the S3 Data Lake bucket. No configuration is required to enable the spill to S3. However, you must configure an Impala policy for permission to the scratch location on the Data Lake bucket.

Kubernetes dashboard adds Azure support and has reached GA status

From AWS or Azure environments, you can use the K8S dashboard to view the state of resources, such as CPU and memory usage, see the status of pods, and download logs. The dashboard can provide insights into the performance and health of a CDW cluster.

DataFlow

This release (2.6.0-b311) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces NiFi 1.23, vertical scaling, several cost optimization features and the ability to collect Diagnostic Bundles through the UI.

Note: This release of CDF supports deployments running NiFi 1.18.0.2.3.7.0-100 or newer. If your DataFlow service has deployments on older NiFi versions, you can use Change NiFi Version to bring each one into compliance, or choose to update to the latest version as part of the DataFlow upgrade.

  • Added support for downloading NiFi application logs directly from CDF for deployments and test sessions.
  • Added support for suspending and resuming CDF Deployments. Suspending a Deployment temporarily terminates the NiFi cluster resources and billing activities while maintaining flow persistence.
  • Added support for resizing an existing CDF Deployment, which allows increasing and decreasing the memory and CPU allotted to an existing NiFi cluster.
  • Added support for selecting the Node Storage Profile when creating a CDF Deployment. This allows the use of economical or high performance persistent storage for the NiFi cluster.
  • New flow deployments on AWS now use GP3 volumes instead of GP2, providing a better cost/performance ratio.
  • Added active monitoring of NiFi cluster health for CDF Deployments, which raises and resolves an alert automatically when a NiFi cluster encounters and recovers from issues impacting its running nodes.
  • Added ability to create sensitive dynamic properties in Flow Designer.
  • Diagnostic Bundles can now be requested via the Unified Diagnostics UI in the Cloudera Manager. Bundles will be uploaded directly to the selected support case.
  • Added Workspace Resources page to workload instances. This provides authorized users with a view of deployments, draft flows, inbound connections, and custom NAR configurations in the workspace.
  • Introduced new deployment alerts for Kubernetes component failures like NiFi pods or supporting statefulsets.
  • Support for updated Kubernetes server versions: AKS 1.26 / EKS 1.25.
  • Improved role assignment processing in the DataFlow workload application so role updates are automatically updated within 5 minutes by regular queries to FreeIPA. The user no longer has to log out and log back in to refresh their roles.
  • Overhauled NiFi node offloading behavior to be more robust and fault resistant in a variety of cases, including when offloading is disrupted by Kubernetes events.
  • Completed significant reliability improvements for larger scale CDF clusters up to 50 nodes, including a variety of efficiency refinements and support for vertical scaling for essential cluster resources.
  • Renamed existing Suspend Flow action to Stop Flow. Renamed the resulting deployment state Suspended to Flow Stopped. A CDF Deployment in Flow Stopped state does not remove NiFi cluster resources and the deployment remains billable.
  • Released the latest version of NiFi 1.23, which includes the following improvements and CDF integrations: Kubernetes Leader Election and State Management, which eliminates the need for a ZooKeeper pod in each CDF Deployment cluster.
  • Automatically restarts the deployment under the hood if NiFi properties are changed.
  • Vertical auto scaling for Prometheus to automatically adjust memory to support a larger number of deployments and metrics.
  • Improved stability of helm upgrade during CDF upgrade process.
  • Retains more NiFi provenance data within the storage capacity currently allocated for that purpose.
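The list above distinguishes two actions that are easy to confuse: Stop Flow keeps the NiFi cluster resources and stays billable, while suspending a Deployment terminates the resources and the billing. The following Python sketch encodes that distinction. The enum, the "Suspended" label for a suspended Deployment, and the treatment of a running deployment as billable are assumptions for illustration, not CDF API names.

```python
from enum import Enum


class DeploymentState(Enum):
    RUNNING = "Running"
    FLOW_STOPPED = "Flow Stopped"  # result of the Stop Flow action
    SUSPENDED = "Suspended"        # assumed label for a suspended Deployment


# Whether NiFi cluster resources are retained in each state.
KEEPS_NIFI_RESOURCES = {
    DeploymentState.RUNNING: True,
    DeploymentState.FLOW_STOPPED: True,
    DeploymentState.SUSPENDED: False,
}


def is_billable(state: DeploymentState) -> bool:
    """In this sketch, a deployment is billable while its NiFi resources exist."""
    return KEEPS_NIFI_RESOURCES[state]
```

Under these assumptions, is_billable(DeploymentState.FLOW_STOPPED) is True and is_billable(DeploymentState.SUSPENDED) is False.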

Machine Learning

Version 2.0.41-b225 introduces the following new features and improvements:

  • Model Registry - Model Registry is now GA. Model Registry is the core enabler for MLOps, or DevOps for machine learning. For more information, see Using Model Registry.
  • Experiments - Experiments v2 is now GA. The Experiments feature now integrates with MLflow for managing the model lifecycle. For more information, see Experiments.
  • Service Accounts - Service Accounts, which allow automated processes to run with their own user account, are now GA. For more information, see Service Accounts.
  • Usage log tracking - The usage log records all workloads (sessions, jobs, models, applications, and distributed compute), enabling administrators to export and analyze workload statistics on demand.
  • Kubernetes - Kubernetes 1.25 is now supported for EKS.
  • Azure - New Azure instance types are supported: D4asv4, D16asv5 and D8asv5.
  • Azure - On new installations, the nfs-csi-driver is now enabled.
  • Azure - Cross-environment backup and restore of workspaces is now supported.
  • Applications - Users can now see pod logs for applications. In Application Details, go to the Container Logs tab, and the pod logs are shown. Application and pod logs can be downloaded from the respective pages.
  • Runtime Addons - CML now includes HadoopCLI Runtime Addon 7.2.15, and HadoopCLI 7.2.14 Runtime Addon is removed.

Management Console

This release of the Management Console service introduces the following changes:

Qatar Doha (me-central1) and KSA (me-central2) GCP region support

CDP adds support for launching environments, Data Lakes, and Data Hubs in the Qatar Doha (me-central1) and KSA (me-central2) GCP regions. See updated Supported GCP regions.

Elastic load balancer deletion protection on AWS

When Data Lakes and Data Hubs are created, CDP deploys load balancers on AWS for endpoint stability. With the release of this update, all newly created load balancers for Data Lakes and Data Hubs on AWS are configured with deletion protection enabled, provided that your cross-account policy has the required permissions.

Cloudera has updated the AWS cross-account policy definition with the additional elasticloadbalancing:ModifyLoadBalancerAttributes permission, which is required to set and remove the load balancer deletion protection flag. To use this feature, update your cross-account policy on AWS by adding the elasticloadbalancing:ModifyLoadBalancerAttributes permission. If you keep the old policy definition, the feature is not available (that is, deletion protection is not set).
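As an illustration of the policy update described above, the following Python sketch adds the permission to an IAM policy document held as a dictionary. Only the action name comes from this release note. The statement ID, the "*" resource scope, and the helper name are assumptions, so compare the result with the cross-account policy definition that Cloudera publishes before applying it.

```python
import copy
import json

REQUIRED_ACTION = "elasticloadbalancing:ModifyLoadBalancerAttributes"


def with_deletion_protection_permission(policy: dict) -> dict:
    """Return a copy of `policy` that allows REQUIRED_ACTION.

    If an Allow statement already lists the action, the copy is returned
    unchanged. Otherwise a new Allow statement is appended.
    """
    updated = copy.deepcopy(policy)
    statements = updated.get("Statement", [])
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]
    for stmt in statements:
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if stmt.get("Effect") == "Allow" and REQUIRED_ACTION in actions:
            return updated
    statements.append({
        "Sid": "AllowLoadBalancerDeletionProtection",  # hypothetical Sid
        "Effect": "Allow",
        "Action": [REQUIRED_ACTION],
        "Resource": "*",  # assumption; narrow this to match your own policy
    })
    updated["Statement"] = statements
    return updated


if __name__ == "__main__":
    existing = {"Version": "2012-10-17", "Statement": []}
    print(json.dumps(with_deletion_protection_permission(existing), indent=2))
```

The function never mutates its input, so the original policy document can be kept for comparison or rollback.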

Scaling to an Enterprise Data Lake

You can now scale a light or medium duty Data Lake to an Enterprise Data Lake. Scaling to an EDL is available only on Runtime 7.2.17 and above. For more information, see Data Lake scaling.
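The eligibility rule above can be sketched as a small Python check. The shape labels LIGHT_DUTY and MEDIUM_DUTY and the function names are hypothetical and chosen only for this example. The point of the sketch is that Runtime versions should be compared numerically, because a plain string comparison would rank "7.2.9" above "7.2.17".

```python
MIN_RUNTIME = (7, 2, 17)
SCALABLE_SHAPES = {"LIGHT_DUTY", "MEDIUM_DUTY"}  # hypothetical labels


def parse_runtime(version: str) -> tuple:
    """Turn "7.2.17" into (7, 2, 17) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def can_scale_to_enterprise(shape: str, runtime: str) -> bool:
    """True if a Data Lake of this shape and Runtime meets the stated rule."""
    return shape in SCALABLE_SHAPES and parse_runtime(runtime) >= MIN_RUNTIME
```

Under these assumptions, a medium duty Data Lake on Runtime 7.2.9 is rejected while the same Data Lake on 7.2.17 is accepted.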

Public Endpoint Access Gateway for GCP

You can enable Public Endpoint Access Gateway during GCP environment registration. For more information, see Public Endpoint Access Gateway.

Operational Database

Cloudera Operational Database (COD) version 1.36 introduces the following features:

COD has enabled the OPDB_USE_EPHEMERAL_STORAGE entitlement

COD supports a large ephemeral block cache while deploying on cloud storage. The entitlement OPDB_USE_EPHEMERAL_STORAGE is enabled by default while using a large ephemeral block cache on any cloud storage.

COD introduces a new storage type UI option while creating an operational database

On the COD UI, a new storage type option, Cloud With Ephemeral Storage, is added. This option is equivalent to using the --storage-type CLOUD_WITH_EPHEMERAL option on CDP CLI while creating an operational database.

For more information, see Creating a database using COD and CDP CLI Beta.
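For scripted database creation, the CLI equivalent of the new UI option could be assembled as in the following Python sketch. Only the storage type option and its CLOUD_WITH_EPHEMERAL value come from this release note. The create-database subcommand and the --environment-name and --database-name options are assumptions about the CDP CLI Beta, so verify them against the CLI help before use. The sketch only builds and prints the command; it does not run it.

```python
import shlex


def create_database_command(environment: str, database: str) -> list:
    """Build an argument list for creating a COD database with ephemeral storage."""
    return [
        "cdp", "opdb", "create-database",   # assumed subcommand
        "--environment-name", environment,  # assumed option name
        "--database-name", database,        # assumed option name
        "--storage-type", "CLOUD_WITH_EPHEMERAL",
    ]


if __name__ == "__main__":
    # my-env and my-db are placeholder names.
    print(shlex.join(create_database_command("my-env", "my-db")))
```

Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if the list is later passed to subprocess.run.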

COD has enabled the COD_EDGE_NODE entitlement

Earlier, you were required to have the COD_EDGE_NODE entitlement to create an edge node on your COD cluster. Now the entitlement is enabled by default.

COD has enabled the COD_STOREFILE_TRACKING entitlement

Earlier, you were required to have the COD_STOREFILE_TRACKING entitlement to use the Store File Tracking (SFT) on your COD cluster. Now the entitlement is enabled by default.