What's New in Streams Messaging

Learn about the new Streams Messaging features in Cloudera DataFlow for Data Hub 7.3.2.

Cloudera DataFlow for Data Hub 7.3.2 introduces new Streams Messaging features and includes all service packs and cumulative hotfixes from Cloudera Runtime 7.3.1.100 through 7.3.1.706. For a comprehensive record of all Streams Messaging updates in Cloudera Runtime 7.3.1.x, see New Features.

What's New in Apache Kafka

New features and functional updates for Kafka are introduced in Cloudera DataFlow for Data Hub 7.3.2, its service packs, and cumulative hotfixes.

7.3.2

Rebase on Kafka 3.9

Kafka shipped with this version of Cloudera Runtime is based on Apache Kafka 3.9.1 (previously 3.4.1). For more information, see the following resources:

KRaft is generally available and ZooKeeper is deprecated
KRaft (Kafka Raft) is generally available and is now the recommended metadata management mode for Kafka in Cloudera. Additionally, you can now migrate existing ZooKeeper-based Kafka clusters to KRaft.

With the general availability of KRaft, running Kafka in ZooKeeper mode is deprecated, both for new deployments and for existing clusters. Support for ZooKeeper-based Kafka clusters will be removed in a future release.

Cloudera recommends the following:

  • Deploy all new Kafka clusters in KRaft mode.

  • Migrate existing ZooKeeper-based clusters to KRaft following an upgrade to Cloudera Runtime 7.3.2.

    This is the only version where migration is possible. Neither previous nor future major, minor, or maintenance versions support migration.

Kafka protocol and metadata version is set automatically during upgrades
When upgrading Kafka, Cloudera Manager now automatically sets the inter.broker.protocol.version property for ZooKeeper-based clusters and the metadata.version property for KRaft-based clusters. You no longer need to manually set these properties to the current protocol or metadata version before an upgrade. This feature is only available when upgrading to Cloudera Runtime 7.3.2 or higher.

After the upgrade, clearing these properties remains a manual task. However, in Cloudera Runtime 7.3.2 and higher, both inter.broker.protocol.version and metadata.version are now available for direct configuration in Cloudera Manager > Kafka > Configuration. The label names of the properties are Kafka Inter-Broker Protocol Version and Kafka Metadata Version. This means you can set or clear these properties directly from the UI, without needing to use advanced configuration snippets.
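The pairing between metadata mode and version property described above can be summarized in a minimal sketch. The property names come from this release note; the function name and the mode strings are hypothetical and are not part of Cloudera Manager.

```python
# Illustrative only: which version property applies to which metadata mode.
# The mode strings ("zookeeper", "kraft") and this helper are hypothetical.
VERSION_PROPERTY_BY_MODE = {
    "zookeeper": "inter.broker.protocol.version",
    "kraft": "metadata.version",
}


def version_property_for(mode: str) -> str:
    """Return the version property that applies to the given metadata mode."""
    try:
        return VERSION_PROPERTY_BY_MODE[mode.lower()]
    except KeyError:
        raise ValueError(f"unknown metadata mode: {mode!r}") from None
```

For example, version_property_for("KRaft") returns metadata.version, which is the property to clear manually after the upgrade on a KRaft-based cluster.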

Connector-level offset flush control
A new connector-level property, cloudera.offset.flush.interval.ms, is added. Use this property to override the Kafka Connect role-level Offset Flush Interval (offset.flush.interval.ms) property. Overriding enables you to control the interval at which connector task offsets are committed on a per-connector basis.

Configure cloudera.offset.flush.interval.ms in connectors that need a different offset flush interval than the role default. This is commonly useful for connectors where the interval controls how often data is flushed to target systems, for example NiFiStatelessSink, HDFSSink, and S3Sink.
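As a rough illustration of the override behavior, the following sketch resolves the interval that would apply to a connector. Only the cloudera.offset.flush.interval.ms key is taken from this release note; the helper function, the connector name, and the numeric values are hypothetical examples.

```python
from typing import Mapping

# Connector-level key introduced in this release.
CONNECTOR_OVERRIDE_KEY = "cloudera.offset.flush.interval.ms"


def effective_flush_interval_ms(role_default_ms: int,
                                connector_config: Mapping[str, str]) -> int:
    """Return the connector-level value if set, otherwise the role-level default."""
    override = connector_config.get(CONNECTOR_OVERRIDE_KEY)
    if override is None:
        return role_default_ms
    return int(override)


# Hypothetical sink connector that flushes every 5 minutes instead of the role default.
example_sink = {
    "name": "example-s3-sink",
    "tasks.max": "1",
    CONNECTOR_OVERRIDE_KEY: "300000",
}

# 60000 stands in for the role-level Offset Flush Interval in this example.
interval = effective_flush_interval_ms(60000, example_sink)
```

Connectors that do not set the key keep using the role-level value.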

IPv6 support for Kafka

Starting with the 7.3.2 release, Kafka supports IPv6 with dual-stack functionality, allowing seamless communication over both IPv4 and IPv6 networks. This capability improves network scalability, future-proofs deployments, and enhances overall platform security.

Offline Log Directories chart
A new default chart, Offline Log Directories, is added for Kafka in Cloudera Manager. This chart can help you quickly identify and track storage issues on your brokers. It is available by default for the Kafka service as well as for individual Kafka Broker role instances.

The chart shows offline log directories and their mount paths for Kafka brokers. A non-zero value indicates an active error state for a specific log directory, while a value of 0 means the directory was in an error state during the selected timeframe but is now healthy. The chart only displays log directories that had errors during the selected timeframe.
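The reading rules for the chart values can be expressed as a small sketch. The function name, the mount paths, and the sample values are hypothetical; only the meaning of zero and non-zero values is taken from the description above.

```python
def describe_log_dir_state(value: float) -> str:
    """Interpret a single Offline Log Directories data point."""
    if value != 0:
        # Non-zero: the directory is currently in an error state.
        return "offline (active error)"
    # Zero: the directory appears on the chart because it had an error
    # during the selected timeframe, but it is healthy now.
    return "recovered (error earlier in the timeframe)"


# Hypothetical latest values per log directory mount path.
samples = {"/data/kafka1": 1, "/data/kafka2": 0}
report = {path: describe_log_dir_state(v) for path, v in samples.items()}
```

Directories that had no errors in the selected timeframe do not appear on the chart at all, so they have no entry to interpret.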

New actions for collecting Kafka diagnostic data

The following new service-specific actions are available for collecting Kafka diagnostic data in Cloudera Manager:

  • Collect Kafka Cluster Diagnostics - gathers detailed cluster-wide data, including topics, configurations, consumer groups, and more.

  • Describe Kafka Topics - provides detailed information about all Kafka topics.

These actions are available in the Actions dropdown on the Kafka service and Kafka Broker role instance pages. Diagnostic data is printed to stdout for immediate access and also saved as a compressed archive on the host where the action runs.

For more information, see Collecting Kafka diagnostic data using Cloudera Manager actions.

Debezium connectors upgraded from 1.9.8.Final to 3.3.1.Final

This release of Cloudera Runtime ships version 3.3.1.Final of the following Debezium connectors:

  • MySQL
  • PostgreSQL
  • Oracle
  • SQL Server
  • Db2

Existing connector instances are automatically upgraded to the new version as part of a cluster upgrade. However, you must update connector configurations before you can upgrade your cluster. Critical changes that affect all Debezium connectors are summarized below.

  • Property renaming (configuration namespace changes)

    New, more consistent namespaces for configuration properties are introduced. The old database.* prefixes have been removed. The connector configuration keys listed in the following table must be updated before an upgrade.

    Old Property Prefix (Debezium 1.9)    New Property Prefix (Debezium 3.3)
    database.server.name                  topic.prefix
    database.history.*                    schema.history.internal.*
    database.* (JDBC pass-through)        driver.*
    database.dbname (SQL Server)          database.names

  • Database driver version requirements are updated

    The recommended and supported JDBC driver versions used by the majority of connectors have changed. The following table lists the JDBC drivers you need to deploy on your cluster before an upgrade.

    Component     New Driver Version / Notes
    MySQL         9.1.0
    PostgreSQL    42.7.7
    Oracle        21.x, 23.x (use a Java 11+ Oracle JDBC driver, ojdbc11.jar)
    SQL Server    12.4.2.jre8
    Db2           11.5.0.0
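The property renames in the table above could be applied to an existing connector configuration along the lines of the following sketch. This is not a Cloudera or Debezium tool. The function name and its parameters are hypothetical, and because the release note does not enumerate which database.* keys are JDBC pass-through properties, the caller is assumed to supply that list. Keys that match none of the table rows are copied unchanged.

```python
from typing import Dict, Iterable, Mapping


def migrate_debezium_config(config: Mapping[str, str],
                            passthrough_keys: Iterable[str] = (),
                            sql_server: bool = False) -> Dict[str, str]:
    """Rename Debezium 1.9 configuration keys to their 3.3 equivalents."""
    passthrough = set(passthrough_keys)
    history_prefix = "database.history."
    migrated: Dict[str, str] = {}
    for key, value in config.items():
        if key == "database.server.name":
            new_key = "topic.prefix"
        elif key.startswith(history_prefix):
            new_key = "schema.history.internal." + key[len(history_prefix):]
        elif sql_server and key == "database.dbname":
            new_key = "database.names"
        elif key in passthrough and key.startswith("database."):
            # Caller-declared JDBC pass-through property.
            new_key = "driver." + key[len("database."):]
        else:
            new_key = key
        migrated[new_key] = value
    return migrated


# Hypothetical SQL Server connector configuration written for Debezium 1.9.
old_config = {
    "database.server.name": "inventory",
    "database.history.kafka.topic": "schema-changes.inventory",
    "database.dbname": "testDB",
    "database.hostname": "db.example.com",
    "database.encrypt": "false",  # treated as pass-through in this example
}
new_config = migrate_debezium_config(
    old_config, passthrough_keys=["database.encrypt"], sql_server=True)
```

Review the result against the connector documentation before an upgrade, since the sketch only covers the four renames listed in the table.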

For more information, see Getting started with upgrades for Cloudera on cloud.

What's New in Schema Registry

New features and functional updates for Schema Registry are introduced in Cloudera DataFlow for Data Hub 7.3.2, its service packs, and cumulative hotfixes.

7.3.2

There are no new features in this release.

What's New in Streams Messaging Manager

New features and functional updates for Streams Messaging Manager are introduced in Cloudera DataFlow for Data Hub 7.3.2, its service packs, and cumulative hotfixes.

7.3.2

There are no new features in this release.

What's New in Streams Replication Manager

New features and functional updates for Streams Replication Manager are introduced in Cloudera DataFlow for Data Hub 7.3.2, its service packs, and cumulative hotfixes.

7.3.2

Reverse Checkpointing

Streams Replication Manager now supports reverse checkpointing. This feature enables the tracking and replication of consumer offsets from a target cluster back to a source cluster. By tracking offsets in the reverse direction, you ensure that the progress made by consumer groups on a backup cluster is preserved and translated back to the primary cluster during a failback scenario.

Reverse checkpointing minimizes message duplication upon failback by mapping the offsets from the replica topic back to the equivalent offsets in the source topic. To enable this feature, you must configure the following in Cloudera Manager:

  • Set the cloudera.reverse.checkpointing.enabled property to true.
  • Enable bidirectional replication in the Streams Replication Manager's Replication Configs property.

In addition to service configurations, you must use the srm-control tool to explicitly allowlist topics for reverse checkpointing using the reverse-checkpointed-topics command. Consumer group replication must also be enabled in both directions.
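To illustrate what mapping offsets from the replica topic back to the source topic means, the following sketch translates a consumer offset using a list of known offset pairs. It is a conceptual example only and does not reproduce the algorithm used by Streams Replication Manager. The function name and the sample data are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def translate_replica_to_source(syncs: List[Tuple[int, int]],
                                replica_offset: int) -> Optional[int]:
    """Map a replica-topic offset back to a source-topic offset.

    syncs holds (source_offset, replica_offset) pairs sorted by replica offset.
    The latest pair at or before the consumer's position is used, so the result
    never skips messages but may cause some to be read again.
    """
    replica_points = [replica for _, replica in syncs]
    idx = bisect_right(replica_points, replica_offset) - 1
    if idx < 0:
        return None  # No known pair at or before this position.
    return syncs[idx][0]


# Hypothetical pairs: source offset 150 corresponds to replica offset 50, and so on.
syncs = [(100, 0), (150, 50), (220, 120)]
resume_at = translate_replica_to_source(syncs, 75)
```

In this example, a consumer group that reached offset 75 on the replica topic would resume at offset 150 on the source topic. Rounding down to the nearest known pair is why duplication is minimized rather than eliminated.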

Single REST server for all replication flows

Streams Replication Manager now uses a single REST server with a single port to handle inter-worker communication for all replication flows. Previously, a dedicated REST server was started for each replication flow. The new implementation exposes only the endpoints required for inter-worker coordination and task configuration updates. These endpoints are restricted to inter-worker communication and cannot be accessed externally. The legacy per-flow REST server implementation is deprecated in 7.3.2 and will be removed in a future release. Cloudera recommends that you migrate your Streams Replication Manager clusters to the new implementation.

Suppressing internal metrics topics

You can now configure the Streams Replication Manager Service to suppress the eager creation of srm-metrics topics for all possible replication flows. This prevents the creation of unused topics. To enable this behavior, set the metrics.topic.creation.for.possible.flows.enabled property to false.
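The difference between all possible replication flows and the flows that are actually enabled grows quickly with the number of clusters, as the following sketch shows. The cluster aliases and the enabled flow are hypothetical, and the sketch makes no assumption about how many srm-metrics topics are created per flow.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Flow = Tuple[str, str]  # (source alias, target alias)


def possible_flows(clusters: Sequence[str]) -> List[Flow]:
    """Every ordered source -> target pair between distinct clusters."""
    return list(permutations(clusters, 2))


clusters = ["primary", "secondary", "backup"]  # hypothetical cluster aliases
enabled = [("primary", "secondary")]           # the only flow in use

all_flows = possible_flows(clusters)
unused = [flow for flow in all_flows if flow not in enabled]
```

With three clusters there are six possible flows, so five of them are unused when only one flow is enabled. Setting metrics.topic.creation.for.possible.flows.enabled to false avoids creating topics for such unused flows.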

Configurable timeout for Streams Application Kafka Connection Health Test

A new Cloudera Manager configuration option, SRM Service Streams Application Connection Test Timeout (streams.replication.manager.service.streams.application.connection.test.timeout), is now available for the Streams Replication Manager Service. It sets the timeout, in seconds, for the Streams Application Kafka Connection Health Test, which periodically checks connectivity to the target Kafka cluster. The default is 1 second.

What's New in Cruise Control

New features and functional updates for Cruise Control are introduced in Cloudera DataFlow for Data Hub 7.3.2, its service packs, and cumulative hotfixes.

7.3.2

New configuration parameter for controlling IP stack preference

A new cc.additional.java.options configuration parameter is available on the Cruise Control configuration page in Cloudera Manager. The default value sets the IP protocol to IPv4.
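The exact default value is not reproduced here. Assuming the parameter carries the standard JVM networking flag -Djava.net.preferIPv4Stack, a sketch like the following could be used to check which preference a given options string expresses. The helper function and the sample option strings are hypothetical.

```python
import shlex

IPV4_FLAG_PREFIX = "-Djava.net.preferIPv4Stack="


def prefers_ipv4_stack(java_options: str) -> bool:
    """Return True if the options enable the JVM IPv4-only stack preference."""
    preference = False  # The JVM treats the flag as false when it is absent.
    for token in shlex.split(java_options):
        if token.startswith(IPV4_FLAG_PREFIX):
            # If the flag is repeated, the last occurrence is used.
            preference = token[len(IPV4_FLAG_PREFIX):].lower() == "true"
    return preference
```

For example, prefers_ipv4_stack("-Xmx1g -Djava.net.preferIPv4Stack=true") returns True, while an empty string returns False.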

New intra.broker.goals configuration for Cruise Control

Cloudera Manager introduces a new intra.broker.goals configuration for Cruise Control. The default value includes com.linkedin.kafka.cruisecontrol.analyzer.goals.IntraBrokerDiskCapacityGoal and com.linkedin.kafka.cruisecontrol.analyzer.goals.IntraBrokerDiskUsageDistributionGoal.

This affects the existing Default Goals (default.goals) configuration, which must be a subset of Supported Goals and Supported Intra Broker Goals.

Additionally, if you previously defined intra.broker.goals in an advanced configuration snippet, this is no longer necessary.
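The subset requirement on Default Goals can be checked with a sketch like the following. It assumes that a default goal is valid if it appears in either the Supported Goals list or the Supported Intra Broker Goals list. The function name and the example goal lists are hypothetical.

```python
from typing import Iterable, List

GOALS_PACKAGE = "com.linkedin.kafka.cruisecontrol.analyzer.goals."


def unsupported_default_goals(default_goals: Iterable[str],
                              supported_goals: Iterable[str],
                              supported_intra_broker_goals: Iterable[str]) -> List[str]:
    """Return the default goals that appear in neither supported list."""
    allowed = set(supported_goals) | set(supported_intra_broker_goals)
    return [goal for goal in default_goals if goal not in allowed]


# Hypothetical configuration values.
supported = [GOALS_PACKAGE + "RackAwareGoal"]
intra_broker = [
    GOALS_PACKAGE + "IntraBrokerDiskCapacityGoal",
    GOALS_PACKAGE + "IntraBrokerDiskUsageDistributionGoal",
]
defaults = [
    GOALS_PACKAGE + "RackAwareGoal",
    GOALS_PACKAGE + "IntraBrokerDiskCapacityGoal",
]
problems = unsupported_default_goals(defaults, supported, intra_broker)
```

An empty result means every default goal is covered by one of the two supported lists.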