What's new in Streams Messaging

Learn about the new Streams Messaging features in Cloudera DataFlow for Data Hub 7.2.18.

Kafka

Rebase on Kafka 3.4.1
Kafka shipped with this version of Cloudera Runtime is based on Apache Kafka 3.4.1. For more information, see the following upstream resources:
  • Apache Kafka Notable Changes
  • Apache Kafka Release Notes
Kafka log directory monitoring improvements
A new Cloudera Manager chart, trigger, and action are added for the Kafka service. These assist you in monitoring the log directory space of the Kafka Brokers and enable you to prevent Kafka disks from filling up.

The chart is called Log Directory Free Capacity. It shows the free capacity of each Kafka Broker log directory.

The trigger is called Broker Log Directory Free Capacity Check. It is triggered if the free capacity of any log directory falls below 10%. The trigger is automatically created for all newly deployed Kafka services, but must be created with the Create Kafka Log Directory Free Capacity Check action for existing services following an upgrade.

The chart and trigger are available on the Kafka service > Status page. The action is available in Kafka service > Actions.

Kafka is safely stopped during operating system upgrades
During OS upgrades, Cloudera Manager now ensures that Kafka brokers are safely stopped. Specifically, Cloudera Manager now performs a rolling restart check before stopping a broker. This ensures that the Kafka service stays healthy during the upgrade. The level of health guarantee that Cloudera Manager ensures is determined by the restart check type set in the Cluster Health Guarantee During Rolling Restart Kafka property. Cloudera recommends that you set this property to all partitions stay healthy to avoid service outages. For more information, see Rolling restart checks.
useSubjectCredsOnly set to true by default in Kafka Connect
In previous versions, the javax.security.auth.useSubjectCredsOnly JVM property was set to false in Kafka Connect. Because of this, connectors running with an invalid or no JAAS configuration could use the credentials of other connectors to establish connections. Starting with this release, useSubjectCredsOnly is set to true by default. As a result, connectors are required to use their own credentials.

This default change is true for newly provisioned clusters. On upgraded clusters, useSubjectCredsOnly remains set to false to ensure backwards compatibility. If you are migrating connectors from a cluster running a previous version of Runtime to a new cluster running 7.2.18 or later, you must ensure that credentials are added to the connector configuration when migrated. Otherwise, migrated connectors may not work on the new cluster.
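As a hypothetical sketch of what adding credentials to a migrated connector can look like, the following configuration fragment carries its own SASL/PLAIN credentials through standard Kafka Connect client override properties (the connector name and class are placeholders, and client overrides must be permitted by the cluster's connector client configuration override policy):

```json
{
  "name": "[***CONNECTOR NAME***]",
  "connector.class": "[***CONNECTOR CLASS***]",
  "producer.override.security.protocol": "SASL_SSL",
  "producer.override.sasl.mechanism": "PLAIN",
  "producer.override.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"[***USER***]\" password=\"[***PASSWORD***]\";"
}
```

With useSubjectCredsOnly set to true, a connector without its own credentials such as these fails to authenticate instead of silently reusing another connector's credentials.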

In addition to the default value change, a new Kafka Connect property is introduced in Cloudera Manager that you can use to set useSubjectCredsOnly. The property is called Add Use Subject Credentials Only JVM Option With True Value. Setting this property to false does not explicitly set useSubjectCredsOnly to false. Instead, it sets useSubjectCredsOnly to the cluster default value.

Kafka Connect metrics reporter security configurable in Cloudera Manager
New, dedicated Cloudera Manager properties are introduced for the security configuration of the Kafka Connect metrics reporter. As a result, you are no longer required to use advanced security snippets if you want to secure the metrics reporter and its endpoint. The new properties introduced are as follows:
  • Secure Jetty Metrics Port
  • Enable Basic Authentication for Metrics Reporter
  • Jetty Metrics User Name
  • Jetty Metrics Password
A dedicated property to enable TLS/SSL for the metrics reporter is not available. Instead, you must select Enable TLS/SSL for Kafka Connect, which enables TLS/SSL for the Kafka Connect role, including the metrics reporter. For more information regarding these properties, see Cloudera Manager Configuration Properties Reference.

As a result of these changes, the setup steps required to configure Prometheus as the metrics store for SMM are changed. For updated deployment instructions, see Setting up Prometheus for Streams Messaging Manager.
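As an illustrative sketch only (the job name, metrics path, and file paths are assumptions; see the linked instructions for the authoritative setup), a Prometheus scrape job for a metrics endpoint secured with TLS/SSL and Basic Authentication combines the https scheme with credentials matching the Jetty Metrics User Name and Jetty Metrics Password properties:

```yaml
scrape_configs:
  - job_name: "kafka_connect"                  # assumed job name
    metrics_path: "/api/prometheus-metrics"    # path is an assumption
    scheme: https
    basic_auth:
      username: "[***JETTY METRICS USER NAME***]"
      password: "[***JETTY METRICS PASSWORD***]"
    tls_config:
      ca_file: "[***PATH TO CA CERTIFICATE***]"
    static_configs:
      - targets: ["[***CONNECT HOST***]:[***SECURE JETTY METRICS PORT***]"]
```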

Kafka load balancer is automatically configured with the LDAP handler if LDAP authentication is configured
When a load balancer and LDAP authentication are configured for Kafka, the PLAIN mechanism is automatically added to the enabled authentication mechanisms of the load balancer listener. Additionally, the load balancer is automatically configured to use LdapPlainServerCallbackHandler as the callback handler.
Kafka Connect now supports Kerberos auth-to-local (ATL) rules with SPNEGO authentication
Kafka Connect now uses the cluster-wide Kerberos auth-to-local (ATL) rules by default. A new configuration property called Kafka Connect SPNEGO Auth To Local Rules is introduced. This property is used to manually specify the ATL rules. During an upgrade, the property is set to DEFAULT to ensure backward compatibility. Following an upgrade, if you want to use the cluster-wide rules, clear the existing value from the Kafka Connect SPNEGO Auth To Local Rules property.
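For illustration, a value for the Kafka Connect SPNEGO Auth To Local Rules property might look like the following hypothetical rule set, shown comma separated in the usual Kerberos auth-to-local list convention. It maps principals of the EXAMPLE.COM realm to their short names and falls back to the default mapping:

```
RULE:[1:$1@$0](.*@EXAMPLE\.COM)s/@.*//,DEFAULT
```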

Schema Registry

AvroConverter support for KConnect logical types
AvroConverter now converts between Connect and Avro temporal and decimal types.
Support for alternative jersey connectors in SchemaRegistryClient
The connector.provider.class property can now be configured in the Schema Registry client. If it is configured, schema.registry.client.retry.policy should also be set to a value different from the default.
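A sketch of a client configuration combining the two properties follows. The Apache connector class is one commonly used alternative Jersey connector and is an assumption here, as is the retry policy placeholder:

```properties
# Alternative Jersey connector (assumed choice); requires the matching Jersey module on the classpath
connector.provider.class=org.glassfish.jersey.apache.connector.ApacheConnectorProvider
# Must be set to a non-default value when connector.provider.class is configured
schema.registry.client.retry.policy=[***NON-DEFAULT RETRY POLICY***]
```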

This change also fixes an issue with some third-party load balancers where the client is expected to follow redirects and authenticate while doing so.

Remove modules section from registry.yaml
In previous versions, the registry.yaml configuration file contained a modules section. This section was used to list pluggable modules that extended Schema Registry’s functionality. However, modules were never fully supported and have been removed in a previous release. The modules section in registry.yaml was kept for backwards compatibility. Starting with this version, the modules section is removed by default from registry.yaml.
Upgraded Avro version to 1.11.1
Avro is upgraded from version 1.9.1 to 1.11.1.
New fingerprint version added to Schema Registry with a configuration option
A new fingerprint version, V2, is available in Schema Registry. This version contains the schema parts that were missing from the previous version. Newly created 7.2.18 clusters use the V2 fingerprint version. Upgraded clusters continue to use the V1 fingerprint version, but the schema.registry.fingerprint.version property can be used to change the fingerprint version in Schema Registry. Cloudera recommends changing the fingerprint version to V2 after upgrading to 7.2.18.
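On an upgraded cluster, the change could look like the following illustrative key-value pair; the accepted value format for the property should be verified against the property's documentation:

```properties
# Illustrative only; confirm the accepted value format for your Runtime version
schema.registry.fingerprint.version=V2
```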
Support for additional JVM options
Additional JVM options can be passed to Schema Registry using the schema.registry.additional.java.options property in Cloudera Manager.
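As a hypothetical example, the property accepts standard JVM flags (the values here are illustrative, not recommendations):

```properties
schema.registry.additional.java.options=-Xms1g -Xmx2g -XX:+UseG1GC
```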

Streams Messaging Manager

UI updates
The style of the SMM UI is updated. This update includes various changes to the colors, fonts, and overall style of the UI. Additionally, the following functional changes and improvements are made:
Cruise Control UI
A new page is added to Streams Messaging Manager to monitor the Kafka cluster state and rebalancing process with Cruise Control. The Cruise Control User Interface (UI) enables you to review and configure the rebalancing of Kafka clusters through dashboards and a rebalancing wizard. The available goals and anomaly detectors are based on the Cloudera Manager configurations of Cruise Control. You can access Cruise Control from SMM through the navigation sidebar.

For more information about Cruise Control in SMM, see Monitoring and managing Kafka cluster rebalancing.

Data Explorer
  • When you view Avro data in the Data Explorer, logicalTypes are converted by default. That is, instead of showing the underlying type (for example, byte), the Data Explorer displays properly deserialized values.
  • Avro messages are now pretty printed when you open them using the Show More option.
Kafka Connect
  • Hovering over the status icons of connector tasks now displays the status text instead of the name of the icon.
  • The Add missing configurations option now populates missing properties with default values.
  • Adding flow.snapshot into a key field of a password type property clears password placeholders.
  • The value field of the flow.snapshot property is now always a text area. Previously, if the property was added manually, the value field was a text field instead of an area.
  • Text found in the Help tooltip of connector property values now displays properly. Long strings no longer overflow the tooltip. Additionally, property descriptions are truncated. Clicking more... displays a pop-up containing the full description.
  • Validating a connector configuration when the Kafka service is stopped returns a Connection refused error instead of validation passing.
  • Deploying a new connector that has the same name as an existing connector no longer updates the existing connector. Instead, the connector deployment fails with Connector [***NAME***] already exists.
  • A loading animation is displayed when loading connector templates.
  • The horizontal divider found in the context menu of properties is no longer displayed if the String, Boolean, Number, and Password options are not available for the property.

For more information regarding the various new features and options related to Kafka Connect, see Managing and monitoring Kafka Connect using Streams Messaging Manager.

Other
All charts present on the UI receive a visual update. Additionally, more detail is presented about the data they display.
Changes in Prometheus setup and configuration
Kafka Connect is now capable of securing its metric reporter with TLS/SSL and Basic Authentication. As a result, the setup steps required to configure Prometheus as the metrics store for SMM are changed. For updated deployment instructions, see Setting up Prometheus for Streams Messaging Manager.
SMM internal Kafka topics are created with a replication factor of 3
From now on, the __smm* internal SMM topics are created with a replication factor of 3. This change only applies to newly deployed clusters. The replication factor is not updated during an upgrade. Cloudera recommends that you increase the replication factor of these topics to 3 with kafka-reassign-partitions following an upgrade.
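The procedure can be sketched as follows; the topic name, partition entries, and broker IDs are placeholders for your own cluster's values:

```shell
# reassignment.json: assign three replicas to each partition of an internal SMM topic
cat > reassignment.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "[***__smm TOPIC NAME***]", "partition": 0, "replicas": [1, 2, 3] }
  ]
}
EOF

# Apply the reassignment, then verify that it completed
kafka-reassign-partitions --bootstrap-server [***HOST***]:[***PORT***] \
  --reassignment-json-file reassignment.json --execute
kafka-reassign-partitions --bootstrap-server [***HOST***]:[***PORT***] \
  --reassignment-json-file reassignment.json --verify
```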
Remove keystore from SMM Schema Registry client configuration if Kerberos is enabled for Schema Registry
SMM uses a Schema Registry client to fetch schemas from Schema Registry. This client has Kerberos authentication properties as well as keystore properties for mTLS. By default, the Schema Registry server does not allow mTLS authentication. However, if mTLS is enabled on the Schema Registry server, mTLS authentication takes precedence over Kerberos. As a result, the mTLS principal (from the keystore), rather than the Kerberos principal, is used for authorization with Ranger. This might result in authorization failures if the mTLS principal is not added to Ranger with access to the Schema Registry resources.

From now on, the Schema Registry client used by SMM does not have keystore properties for mTLS when Kerberos is enabled. As a result, even if mTLS is enabled for the Schema Registry server, the Kerberos principal is used for authentication and authorization with Ranger.

Dedicated endpoint for connector creation
A dedicated endpoint for connector creation is introduced. The endpoint is POST /api/v1/admin/kafka-connect/connectors. Requests made to this endpoint fail with the following error message if a connector with the name specified in the request already exists.
{
  "error_code": 409,
  "message": "Connector [***NAME***] already exists"
}

In previous versions, PUT /api/v1/admin/kafka-connect/connectors/{connector} was the only endpoint you could use to create connectors. However, this endpoint is also used to update connectors. If you specify the name of an existing connector in the request, the endpoint updates the existing connector. As a result, Cloudera recommends that you use the new endpoint for connector creation going forward. The SMM UI is also updated and uses the new endpoint for connector creation.
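For illustration, creating a connector through the new endpoint with curl might look like the following. The host, port, and connector configuration are placeholders, authentication options depend on your cluster, and the request body shown follows the standard Kafka Connect create-connector format, which is an assumption here:

```shell
curl -X POST "https://[***SMM HOST***]:[***PORT***]/api/v1/admin/kafka-connect/connectors" \
  -H "Content-Type: application/json" \
  -d '{"name": "[***NAME***]", "config": {"connector.class": "[***CONNECTOR CLASS***]", "tasks.max": "1"}}'
```

If a connector named [***NAME***] already exists, the request fails with the 409 error shown above instead of modifying the existing connector.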

Jersey client timeout now configurable
SMM uses internal Jersey clients to make requests to Kafka Connect and Cruise Control. The connection and read timeouts for these clients were previously hard-coded to 30 seconds and could not be configured. This release introduces new properties that enable you to configure the connection and read timeouts of these clients. The default timeout remains 30 seconds. The properties introduced are as follows.
  • Kafka Connect Client Connect timeout
  • Kafka Connect Client Read timeout
  • Cruise Control Client Connect timeout
  • Cruise Control Client Read timeout

Streams Replication Manager

The --to option in srm-control now creates the file if it does not exist
From now on, srm-control creates the file specified with the --to option if the file does not exist.
Verification of internal metrics topics can now be disabled
A new property, Verify Partition Count Of The Metrics Topic, is introduced for the SRM service. This property controls whether SRM verifies the partition count of the srm-metrics-[***SOURCE CLUSTER ALIAS***].internal topics (raw metric topics) when SRM is started. The property is selected by default, and Cloudera recommends that you keep it selected. During certain upgrades, the property is automatically set to false for the duration of the upgrade to avoid upgrade issues.
Prefixless replication with the IdentityReplicationPolicy
Full support, including replication monitoring, is introduced for the IdentityReplicationPolicy. Unlike the DefaultReplicationPolicy, this policy does not rename remote (replicated) topics on target clusters. That is, the topics that you replicate will have the same name in both source and target clusters. This replication policy is recommended for deployments where SRM is used to aggregate data from multiple streaming pipelines. Alternatively, this replication policy can also be used if the deployment requires MirrorMaker1 (MM1) compatible replication.

Prefixless replication is enabled in Cloudera Manager with the Enable Prefixless Replication property. This property configures SRM to use the IdentityReplicationPolicy and enables internal topic based remote topic discovery, which is required for replication monitoring.
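For reference, the policy that the Enable Prefixless Replication property activates is the identity replication policy that ships with MirrorMaker 2. A sketch of the equivalent raw configuration is shown below; use the Cloudera Manager property rather than setting this class directly, as the property also enables the remote topic discovery required for monitoring:

```properties
# Equivalent of Enable Prefixless Replication (illustrative; prefer the Cloudera Manager property)
replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy
```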

Limitations:
  • Replication loop detection is not supported. As a result, you must ensure that topics are not replicated in a loop between your source and target clusters.
  • The /v2/topic-metrics/{target}/{downstreamTopic}/{metric} endpoint of SRM Service v2 API does not work properly with prefixless replication. Use the /v2/topic-metrics/{source}/{target}/{upstreamTopic}/{metric} endpoint instead.
  • The replication metric graphs shown on the Topic Details page of the SMM UI do not work with prefixless replication.
Internal topic based remote topic discovery
From now on, SRM uses an internal Kafka topic to keep track of remote (replicated) topics. Previously, SRM relied on the naming conventions (prefixes) used by the DefaultReplicationPolicy to discover and track remote topics.

This feature enables SRM to provide better monitoring insights on replications. Additionally, if the feature is enabled, SRM is capable of providing replication monitoring even if a replication policy other than the DefaultReplicationPolicy is in use. Most notably, this enables replication monitoring when SRM is configured for prefixless replication with the IdentityReplicationPolicy.

This feature is enabled in Cloudera Manager by selecting the Remote Topics Discovery With Internal Topic property. The property is selected by default on newly deployed clusters, but must be enabled manually for existing clusters after an upgrade. Cloudera recommends that you enable this feature no matter what replication policy you are using.

For more information, see Streams Replication Manager remote topic discovery.

Configurations to customize replication-records-lag metric calculation
Three new properties are introduced that enable you to control how SRM calculates the replication-records-lag metric. This metric provides information regarding the replication lag based on offsets. The metric is available both on the cluster and the topic level. The following new properties are introduced because the calculation of the metric with default configurations might add latency to replications and impact SRM performance. While these properties are configured in Cloudera Manager, they do not have dedicated configuration entries. Instead, you add them to Streams Replication Manager's Replication Configs to configure them.
Table 1. Properties for customizing replication-records-lag calculation

  • replication.records.lag.calc.enabled (default: true)
    Controls whether the replication-records-lag metric is calculated. This metric provides information regarding the replication lag based on offsets. The metric is available both on the cluster and the topic level. The calculation of this metric might add latency to replications and impact SRM performance. If you are experiencing performance issues, you can try setting this property to false to disable the calculation of replication-records-lag. Alternatively, you can try fine-tuning how SRM calculates replication-records-lag with the replication.records.lag.calc.period.ms and replication.records.lag.end.offset.timeout.ms properties.
  • replication.records.lag.calc.period.ms (default: 0)
    Controls how frequently SRM calculates the replication-records-lag metric. The default value of 0 means that the metric is calculated continuously. Cloudera recommends configuring this property to 15000 ms (15 seconds) or higher if you are experiencing issues related to the calculation of replication-records-lag. A calculation frequency of 15 seconds or more results in the metric being available for consumption without any significant impact on SRM performance.
  • replication.records.lag.end.offset.timeout.ms (default: 6000)
    Specifies the Kafka end offset timeout value used for calculating the replication-records-lag metric. Setting this property to a value lower than the default 6000 ms (6 seconds) might reduce the latency of calculating replication-records-lag; however, the calculation might fail. A value higher than the default can help avoid metric calculation failures, but might increase replication latency and lower SRM performance.
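Putting the three properties together, an entry added to Streams Replication Manager's Replication Configs that follows the recommendations above might look like the following sketch (the 15000 ms period reflects the recommendation for deployments experiencing performance issues; tune the values for your own cluster):

```properties
replication.records.lag.calc.enabled=true
replication.records.lag.calc.period.ms=15000
replication.records.lag.end.offset.timeout.ms=6000
```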

Cruise Control

Cruise Control is added to Streams Messaging Manager UI

A new page is added to Streams Messaging Manager to monitor the Kafka cluster state and rebalancing process with Cruise Control. The Cruise Control User Interface (UI) enables you to review and configure the rebalancing of Kafka clusters through dashboards and a rebalancing wizard. The available goals and anomaly detectors are based on the Cloudera Manager configurations of Cruise Control. You can access Cruise Control from SMM through the navigation sidebar.

For more information about Cruise Control in SMM, see Monitoring and managing Kafka cluster rebalancing.