CDP Public Cloud: January 2023 Release Summary

Data Hub

This release of the Data Hub service introduces new features and behaviors:

Vertical scaling of Data Hub instances

If necessary, you can now select a larger or smaller instance type for a Data Hub cluster after it has been deployed. For more information see Vertically scaling instances types.

Management Console

This release of the Management Console service introduces new features and behaviors:

If necessary, you can now select a larger or smaller instance type for a Data Lake or FreeIPA after the environment has been deployed. For more information see Vertically scaling instances types - Data Lake and Vertically scale FreeIPA instances.

Operational Database

Cloudera Operational Database (COD) 1.27 version supports JWT authentication, provides Data Lake templates while creating a database, and a CLI option to enable HBase region canaries.

COD supports configuring JWT authentication for your HBase clients

COD now allows you to configure JWT (JSON Web Token)-based authentication for your HBase clients, which uses an unique identifier and is a standard way of securely transmitting signed information between two parties.

For more information, see Configuring JWT authentication for HBase client.

COD supports creating an operational database using a predefined Data Lake template

When you create an operational database, you can now define the structure of your database based on a predefined Data Lake template. The template defines the number of gateway, master, and worker nodes to be added while creating a database.

You can select a template and accordingly the nodes are added into the COD cluster after the database is successfully created.

To know more about this, see Creating a database using COD.

COD provides a CLI option to enable HBase region canaries

COD now provides a CLI option, –enable-region-canary to enable the HBase region canaries while creating an operational database.

Use the following command to enable the HBase region canaries.

cdp opdb create-database –environment-name ENVIRONMENT_NAME –database-name DATABASE_NAME –enable-region-canary

  • hbase_region_health_canary_enabled
  • hbase_region_health_canary_slow_run_alert_enabled
  • hbase_canary_alert_unhealthy_region_percent_threshold

For more information, see Enabling HBase region canary.

DataFlow

This release (2.3.0-h1-b1) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces a fix for loading custom NARs.

  • NIFI-10981 - Fixed an issue where NarAutoLoader started before the provider retrieved all NARs.

This release (2.3.0-h1-b8) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces fixes and improvements for Flow Designer, enablements, upgrades and general stability.

  • CDPDFX-6593 - Fixed issue where test session reports the wrong state after being started and stopped.
  • CDPDFX-6582 - Fixed issue where processor property configuration does not support special characters.
  • CDPDFX-6596 - Fixed errors which occurred in certain interdependent controller services configurations.
  • CDPDFX-6594 - Fixed issue where suspend updates flag is not cleared after an enable attempt fails, but retry enable succeeds. This results in newly created deployments not displaying properly in the deployment dashboard because heartbeats are being ignored.
  • CDPDFX-6578 - Fixed upgrade issue resulting in larger deployment sizes having insufficient java heap memory allocated to NiFi pods, resulting in Out of Memory issues.
  • CDPDFX-6561 - Fixed issue preventing retry upgrade when previous upgrade attempt failed after kubernetes upgrade to 1.23 but before liftie service finished upgrading deployments.
  • CDPDFX-6479 - Fixed redis pods being unable to recover after non-graceful shutdown
  • CDPDFX-6512 - Improved minifi logging stability by defining memory needs for toybox container

This release (2.3.0-h2-b1) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces fixes and improvements for etrics collection, enablements, upgrades and general stability.

  • CDPDFX-6458 - Fixed issue where NiFi flow deployents failed at host name verification due to a startup script returning the wrong host name.
  • CDPDFX-6671 - Reduced metric collection sample sizes in Cluster-level Prometheus, resulting in lower memory footprint and faster query times without regression or reduction of performance for existing environment metrics.
  • CDPDFX-6677 - Fixed issue where the import of very large flows failed with timeout.
  • CDPDFX-6683 - Fixed issue where a TimeoutException was thrown without its cause being logged.

Data Engineering

This release (1.18.1) of the Cloudera Data Engineering Service on CDP Public Cloud introduces new features and improvements that are described in this topic.

Upgrade to Airflow 2.3.4

CDE 1.18.1 now runs with Airflow 2.3.4. This upgrade includes several fixes to improve performance and stability.

Support for additional Grafana charts

Additional Grafana charts that specify metrics for Livy memory and API server have been added.

Support for Data Lake

CDE 1.18.1 now supports Data Lake 7.2.16. For more information, see Cloudera Data Engineering and Data Lake compatibility.

Data Catalog

This release of the Data Catalog service introduces the following new features. Data Catalog is a service within Cloudera Data Platform that enables you to understand, manage, secure, and govern data assets across enterprise data lakes. Data Catalog helps you understand data across multiple clusters and across multiple environments.

Data Catalog introduces the following additions:

Cloudera Data Catalog is now additionally supported in the EU and APAC regional Control Plane.

Associating Customer Sensitivity Profiler tags to Hive assets fails on the 7.2.16 Cloudera Runtime cluster. You must update the configurations. For more information, see Associating Custom Sensitivity Profiler tags to Hive assets.

Replication Manager

This release of the Replication Manager service introduces the following new features.

HBase replication policy enhancements

The following enhancements are available for HBase replication policies:

  • HBase data replication using HBase replication policies is supported when SFT setting is automatically set on SFT-enabled clusters.

  • You must choose the Select Destination > I want to force the setup of this HBase replication policy option in the HBase policy creation wizard to acknowledge that the first-time setup between the selected source and destination clusters should be initiated after the existing pairing of the source and/or target cluster gets cleared. For more information, see Creating HBase replication policy.

  • You can also use the following options to manage an HBase replication policy:

    • The Replication Policies > HBase replication policy > Actions > Suspend action suspends all the active HBase replication policies between the source and target clusters selected in the replication policy.
    • The Replication Policies > HBase replication policy > Actions > Activate action resumes data replication for all the HBase replication policies between the source and target clusters selected in the replication policy.
    • The Replication Policies > HBase replication policy > Actions > Retry Create action retries the first-time setup between the source and target clusters selected in the replication policy. This option is available only if the first-time setup configuration fails.
    • The Replication Policies > HBase replication policy > Actions > Retry Failed Snapshots action reruns the failed initial snapshots (and only the failed ones) in the replication policy if the Replication Manager failed to replicate the existing data of some tables.
    • The Replication Policies > HBase replication policy > Actions > View Command Details action opens the latest replication policy job and shows the last 15 steps of the log for the policy job run. The steps and substeps appear in a tree view. The failed steps are expanded by default.

For more information, see Managing HBase replication policy.

  • You can now monitor the first-time setup configuration steps and its progress on the Replication Policies > HBase replication policy > Job History tab.

Hive replication policy enhancements

The following enhancements are available for Hive replication policies:

  • You can use Hive replication policies to replicate Hive external tables and metadata from CDH 5.10 and higher clusters and CDP Private Cloud Base 7.1.1 and higher clusters to Data Hubs using Cloudera Manager 7.9.0 or higher.

For more information, see Creating Hive replication policy

  • The Replication Policies > Hive replication policy > Actions > View Command Details action opens the latest replication policy job and shows the last 15 steps of the log for the policy job run. The steps and substeps appear in a tree view. The failed steps are expanded by default.

For more information, see Managing Hive replication policy.