What's New in Apache Kafka

Learn about the new features of Apache Kafka in Cloudera Runtime 7.2.16.

Rebase on Kafka 3.1.2

Kafka shipped with this version of Cloudera Runtime is based on Apache Kafka 3.1.2. For more information, see the following upstream resources:

  • Apache Kafka Notable Changes
  • Apache Kafka Release Notes

Multi-level rack awareness

The rack awareness capabilities of Kafka have been improved to support multi-level cluster topologies. As a result, brokers can now be configured to run in a multi-level rack-aware mode. If this mode is enabled, the brokers provide multi-level rack awareness guarantees. These guarantees ensure that topic partition replicas are spread evenly across all levels of the physical infrastructure. For example, in a two-level hierarchy with Data Centers on the top level and racks on the second level, brokers will evenly spread replicas among both available DCs and racks.
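
For illustration, in the two-level example above a broker's position could be encoded as a slash-separated path in its rack property. This is a sketch: the DC and rack names are hypothetical, and enabling multi-level mode itself is done through the Kafka service configuration.

```properties
# Broker-side rack setting (hypothetical names).
# Each path segment is one level of the hierarchy:
# first segment = data center, second segment = rack.
broker.rack=/DC1/rack1
```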

The new mode is compatible with follower fetching. If multi-level mode is enabled, a compatible replica selector class is automatically installed. This implementation enables consumers, if configured, to fetch Kafka messages from the replica that is closest to them in the multi-level hierarchy.
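
On the consumer side, follower fetching only requires the client to advertise its own location through the standard client.rack property, using the same path format (the value below is hypothetical):

```properties
# Consumer-side location (hypothetical value).
# The automatically installed replica selector compares this path with
# the brokers' rack paths level by level and picks the closest replica.
client.rack=/DC1/rack1
```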

Additionally, when Cruise Control is deployed on the cluster, the standard rack-aware goals in Cruise Control’s configuration are replaced with a multi-level rack-aware goal. This goal ensures that Cruise Control optimizations do not violate the multi-level rack awareness guarantees. This goal is currently downstream only, available exclusively in the Cloudera distribution of Cruise Control. For more information, see the following resources:

Expose log directory total and usable space through the Kafka API

KAFKA-13958 is backported in Kafka shipped with this version of Runtime. As a result, the Kafka API now exposes metrics regarding the total and usable disk space of log directories. The information on log directory space is collected by SMM and is exposed on the SMM UI. Specifically, you can now view the current log size of topics as well as the total log size and remaining storage space of brokers. For more information on how you can monitor log size metrics on the SMM UI, see Monitoring log size information.
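
With KAFKA-13958 in place, the new fields are also visible to ordinary Admin API clients. The following sketch reads the per-directory totals for one broker; the bootstrap address and broker ID are hypothetical, and a running cluster plus the kafka-clients library are required:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.LogDirDescription;

public class LogDirSpace {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical bootstrap address; replace with a real broker.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1.example.com:9092");
        try (Admin admin = Admin.create(props)) {
            int brokerId = 1; // hypothetical broker ID
            Map<String, LogDirDescription> dirs =
                    admin.describeLogDirs(List.of(brokerId))
                         .descriptions().get(brokerId).get();
            // totalBytes()/usableBytes() are OptionalLong values exposed by
            // KAFKA-13958; they are empty if the broker predates the fix.
            dirs.forEach((path, desc) -> System.out.printf(
                    "%s: total=%s, usable=%s%n",
                    path, desc.totalBytes(), desc.usableBytes()));
        }
    }
}
```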

Kafka now accepts OAuth tokens that do not contain a “sub” claim

KAFKA-13730 is backported in Kafka shipped with this version of Runtime. As a result, Kafka now accepts OAuth tokens that do not contain the “sub” claim. If you are using OAuth tokens that do not contain a “sub” claim, the JWT Principal Claim Name For OAuth2 Kafka service property must be configured. This property specifies the claim that contains the client’s principal. For more information on OAuth2 authentication in Kafka, see OAuth2 authentication.
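
On the client side, the corresponding upstream setting is sasl.oauthbearer.sub.claim.name. The following fragment is a sketch; the claim name and the rest of the values are placeholders:

```properties
# Kafka client SASL/OAUTHBEARER settings (sketch; values are placeholders).
security.protocol=SASL_SSL
sasl.mechanism=OAUTHBEARER
# If the token carries the principal in a claim other than "sub"
# (for example "client_id"), name that claim here. On the broker side,
# the equivalent is the JWT Principal Claim Name For OAuth2 service property.
sasl.oauthbearer.sub.claim.name=client_id
```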

New Kafka Connect connectors

The following new Kafka Connect connectors are introduced:

  • HDFS Stateless Sink
  • InfluxDB Sink
  • Debezium Db2 Source [Technical Preview]

For more information, see Connectors.

Syslog TCP Source connector 2.0.0

The Syslog TCP Source Kafka Connect connector is updated to version 2.0.0. The following notable changes and improvements are made:

  • Three new properties are added:
    • Max Batch Size

      This property controls the maximum number of messages in a single batch. This is a required property. Its default value is 1.

    • Authorized Issuer DN Pattern and Authorized Subject DN Pattern
      These properties allow you to enable authorization for incoming TLS connections. Both properties accept a regular expression as their value. The configured regular expressions are matched against the Distinguished Names of incoming TLS connections. If a Distinguished Name does not match the pattern, the following message is logged and the connection’s messages are not forwarded to Kafka.
      Error: authorization failure
      Both properties are optional and are set to .* by default.
  • The Max Number of TCP Connections property is replaced by the Max Number of Worker Threads property.

    Like Max Number of TCP Connections, Max Number of Worker Threads is used to control the number of TCP connections. However, instead of specifying the exact number of allowed connections, you specify how many worker threads are reserved for TCP connections. Note that a single worker thread is capable of handling multiple connections. This is a required property. Its default value is 2.

  • Existing version 1.0.0 connectors continue to function; however, they cannot be upgraded. If you want to use the new version of the connector, you must deploy a new instance of the connector.
  • Deploying a new version 1.0.0 instance of the connector is no longer possible.
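
To illustrate how the Authorized Issuer DN Pattern and Authorized Subject DN Pattern properties above behave, here is a standalone sketch of regex-based DN matching. The class name, pattern, and Distinguished Names are hypothetical, and the connector's actual matching logic may differ in details:

```java
import java.util.regex.Pattern;

public class DnPatternCheck {
    // Hypothetical pattern: only accept subjects under O=Example Corp.
    // The connector default for both DN pattern properties is ".*",
    // which accepts every DN.
    static final Pattern SUBJECT_DN_PATTERN =
            Pattern.compile(".*,O=Example Corp,C=US");

    static boolean isAuthorized(String subjectDn) {
        // The configured regex is applied against the full DN of the
        // incoming TLS connection; a non-matching DN is rejected.
        return SUBJECT_DN_PATTERN.matcher(subjectDn).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAuthorized("CN=syslog-client,O=Example Corp,C=US")); // true
        System.out.println(isAuthorized("CN=intruder,O=Other Org,C=US"));         // false
    }
}
```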

AvroConverter support for KConnect logical types

The AvroConverter now converts between Connect and Avro temporal and decimal types.

Connect internal topic Ranger policy

A new Ranger policy, connect internal - topic, is generated by default on fresh installations. This policy grants the Kafka and SMM service principals access to the Kafka Connect internal topics (connect-configs, connect-offsets, connect-status) and to the secret management storage topic (connect-secrets).

Connector configurations must by default override the sasl.jaas.config property of the Kafka clients used by the connector

The Require Connectors To Override Kafka Client JAAS Configuration Kafka Connect property is now selected by default. This means that, by default, connector configurations must contain a sasl.jaas.config entry with an appropriate JAAS configuration that can be used to establish a connection with the Kafka service.
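
For example, a connector configuration would now carry its own JAAS entry, along the lines of the following sketch (connector name, class, topic, and credentials are all placeholders):

```json
{
  "name": "example-sink",
  "connector.class": "com.example.ExampleSinkConnector",
  "topics": "example-topic",
  "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"connect-user\" password=\"example-secret\";"
}
```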

Connect JAAS enforcement now applies to non-override type Kafka client configs

When the Require Connectors To Override Kafka Client JAAS Configuration property is selected, the consumer.sasl. and producer.sasl. configurations are no longer emitted into the Connect worker configuration. Additionally, the keytab name is randomized, and ${cm-agent:keytab} references in connector configurations will stop working.