What's New in Apache Kafka

Learn about the new features of Apache Kafka in Cloudera Runtime 7.2.18.

Rebase on Kafka 3.4.1

Kafka shipped with this version of Cloudera Runtime is based on Apache Kafka 3.4.1. For more information, see the following upstream resources:

  • Apache Kafka Notable Changes
  • Apache Kafka Release Notes

Kafka log directory monitoring improvements

A new Cloudera Manager chart, trigger, and action are added for the Kafka service. These help you monitor the free space of Kafka broker log directories and prevent Kafka disks from filling up.

The chart is called Log Directory Free Capacity. It shows the free capacity of each Kafka broker log directory.

The trigger is called Broker Log Directory Free Capacity Check. It fires if the free capacity of any log directory falls below 10%. The trigger is created automatically for all newly deployed Kafka services. For existing services, you must create it after an upgrade with the Create Kafka Log Directory Free Capacity Check action.

The chart and trigger are available on the Kafka service > Status page. The action is available in Kafka service > Actions.

Kafka is safely stopped during operating system upgrades

During OS upgrades, Cloudera Manager now ensures that Kafka brokers are stopped safely. Specifically, Cloudera Manager performs a rolling restart check before stopping a broker, which ensures that the Kafka service stays healthy during the upgrade. The level of health guarantee is determined by the restart check type set in the Cluster Health Guarantee During Rolling Restart Kafka property. Cloudera recommends that you set this property to all partitions stay healthy to avoid service outages. For more information, see Rolling restart checks.

useSubjectCredsOnly set to true by default in Kafka Connect

In previous versions, the javax.security.auth.useSubjectCredsOnly JVM property was set to false in Kafka Connect. As a result, connectors running with an invalid JAAS configuration, or with no JAAS configuration at all, could use the credentials of other connectors to establish connections. Starting with this release, useSubjectCredsOnly is set to true by default, so connectors are required to use their own credentials.

The new default applies to newly provisioned clusters. On upgraded clusters, useSubjectCredsOnly remains set to false to ensure backward compatibility. If you are migrating connectors from a cluster running a previous version of Runtime to a new cluster running 7.2.18 or later, ensure that credentials are added to the connector configuration during migration. Otherwise, the migrated connectors may not work on the new cluster.

In addition to the default value change, a new Kafka Connect property is introduced in Cloudera Manager that you can use to set useSubjectCredsOnly. The property is called Add Use Subject Credentials Only JVM Option With True Value. Setting this property to false does not explicitly set useSubjectCredsOnly to false. Instead, it sets useSubjectCredsOnly to the cluster default value.
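
Because connectors must now supply their own credentials, the connector configuration itself is the natural place for them. The following is a minimal sketch, assuming that connector client configuration overrides are allowed on the cluster; the connector class, principal, and keytab path are placeholders.

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Sketch only: a connector configuration that carries its own Kafka
     * credentials through standard Kafka Connect client override properties.
     * Assumes connector client configuration overrides are allowed on the
     * cluster; connector class, principal, and keytab path are placeholders.
     */
    public class ConnectorCredentialsExample {
        public static void main(String[] args) {
            Map<String, String> config = new HashMap<>();
            config.put("connector.class",
                "org.apache.kafka.connect.mirror.MirrorSourceConnector"); // placeholder
            // Per-connector JAAS configuration. With useSubjectCredsOnly=true,
            // the connector can no longer fall back on another connector's
            // credentials, so it must carry its own.
            config.put("producer.override.sasl.jaas.config",
                "com.sun.security.auth.module.Krb5LoginModule required "
                    + "useKeyTab=true storeKey=true "
                    + "keyTab=\"/path/to/connector.keytab\" "       // placeholder
                    + "principal=\"connector-user@EXAMPLE.COM\";"); // placeholder
            config.forEach((k, v) -> System.out.println(k + "=" + v));
        }
    }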

Kafka Connect metrics reporter security configurable in Cloudera Manager

New, dedicated Cloudera Manager properties are introduced for the security configuration of the Kafka Connect metrics reporter. As a result, you are no longer required to use advanced security snippets if you want to secure the metrics reporter and its endpoint. The new properties introduced are as follows:
  • Secure Jetty Metrics Port
  • Enable Basic Authentication for Metrics Reporter
  • Jetty Metrics User Name
  • Jetty Metrics Password
A dedicated property to enable TLS/SSL for the metrics reporter is not available. Instead, you must select Enable TLS/SSL for Kafka Connect, which enables TLS/SSL for the Kafka Connect role, including the metrics reporter. For more information about these properties, see Cloudera Manager Configuration Properties Reference.

As a result of these changes, the setup steps required to configure Prometheus as the metrics store for SMM are changed. For updated deployment instructions, see Setting up Prometheus for Streams Messaging Manager.
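
As a rough sketch, the following Java 11+ snippet reads the secured metrics endpoint once basic authentication is enabled. The host, port, endpoint path, and credentials are placeholders; substitute the values you set in Secure Jetty Metrics Port, Jetty Metrics User Name, and Jetty Metrics Password.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Base64;

    /**
     * Sketch only (Java 11+): reading the Kafka Connect metrics endpoint
     * with basic authentication enabled. Host, port, path, and credentials
     * are placeholders.
     */
    public class MetricsProbe {
        public static void main(String[] args) throws Exception {
            String credentials = Base64.getEncoder()
                .encodeToString("metrics-user:metrics-password".getBytes());
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://connect-host.example.com:28087/metrics")) // placeholders
                .header("Authorization", "Basic " + credentials)
                .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }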

Kafka load balancer is automatically configured with the LDAP handler if LDAP authentication is configured

When a load balancer and LDAP authentication are configured for Kafka, the PLAIN mechanism is automatically added to the enabled authentication mechanisms of the load balancer listener. Additionally, the load balancer is automatically configured to use LdapPlainServerCallbackHandler as the callback handler.
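
No manual configuration is needed, but for illustration, the resulting settings resemble the following sketch, expressed as standard Kafka broker listener configuration. The listener name lb is an assumption for this example, and the handler's package prefix is omitted because only the class name is documented here.

    import java.util.Properties;

    /**
     * Sketch only: illustrative listener settings for this feature. The
     * listener name "lb" is an assumption for this example.
     */
    public class LoadBalancerListenerConfig {
        public static void main(String[] args) {
            Properties props = new Properties();
            // PLAIN is appended to the mechanisms enabled on the
            // load balancer listener.
            props.setProperty("listener.name.lb.sasl.enabled.mechanisms", "PLAIN");
            // PLAIN callbacks are routed to the LDAP-backed handler;
            // package prefix omitted, use the class shipped with Runtime.
            props.setProperty(
                "listener.name.lb.plain.sasl.server.callback.handler.class",
                "LdapPlainServerCallbackHandler");
            props.forEach((k, v) -> System.out.println(k + "=" + v));
        }
    }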

Kafka Connect now supports Kerberos auth-to-local (ATL) rules with SPNEGO authentication

Kafka Connect now uses the cluster-wide Kerberos auth-to-local (ATL) rules by default. A new configuration property called Kafka Connect SPNEGO Auth To Local Rules is introduced. This property is used to manually specify the ATL rules. During an upgrade, the property is set to DEFAULT to ensure backward compatibility. Following an upgrade, if you want to use the cluster-wide rules, clear the existing value from the Kafka Connect SPNEGO Auth To Local Rules property.
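
To see how ATL rules behave, the following sketch uses the KerberosName class from the hadoop-auth library, which implements the same rule syntax; using it here is an illustration, not a documented part of Kafka Connect. The rule and principal are examples only.

    import org.apache.hadoop.security.authentication.util.KerberosName;

    /**
     * Sketch only: demonstrates ATL rule syntax using the KerberosName
     * class from the hadoop-auth library, which implements the same rules.
     * The rule and principal are examples.
     */
    public class AtlRuleDemo {
        public static void main(String[] args) throws Exception {
            // Strip the realm from principals in EXAMPLE.COM; otherwise
            // fall through to the default mapping.
            KerberosName.setRules("RULE:[1:$1@$0](.*@EXAMPLE\\.COM)s/@.*//\nDEFAULT");
            // Prints "alice".
            System.out.println(new KerberosName("alice@EXAMPLE.COM").getShortName());
        }
    }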

Debezium connector version update

All Debezium connectors shipped with Cloudera Runtime are upgraded to version 1.9.7. Existing instances of the connectors are automatically upgraded to the new version during cluster upgrade. Deploying the previously shipped version of the connectors is not possible. For more information, see Kafka Connectors in Runtime or the Debezium documentation.

Persistent MQTT sessions support for the MQTT Source connector

Version 1.1.0 of the MQTT Source connector is released. The connector now supports MQTT persistent sessions, which enable it to resume a previous session with an MQTT broker after an interruption. Enabling this feature can ensure that no messages are lost if the connector is momentarily stopped or if the network connection is interrupted.

To support persistent sessions, the following new properties are introduced:

  • MQTT Client ID

    This property specifies the MQTT client ID that the connector uses.

  • MQTT Clean Session

    This property controls whether the connector should start clean or persistent sessions. Set this property to false to enable persistent sessions.

Existing connectors continue to function; however, upgrading them is not possible. If you want to use the new version of the connector, you must deploy a new instance of it. For more information, see MQTT Source connector and MQTT Source properties reference.
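
A minimal configuration sketch for persistent sessions follows. Both property keys below are hypothetical stand-ins for the MQTT Client ID and MQTT Clean Session properties; check the MQTT Source properties reference for the exact names.

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Sketch only: an MQTT Source configuration fragment with persistent
     * sessions enabled. Both property keys are hypothetical stand-ins for
     * the MQTT Client ID and MQTT Clean Session properties.
     */
    public class MqttPersistentSessionConfig {
        public static void main(String[] args) {
            Map<String, String> config = new HashMap<>();
            // A stable client ID lets the MQTT broker associate the
            // connector with its previous session after an interruption.
            config.put("mqtt.client.id", "kafka-connect-mqtt-source-1"); // hypothetical key
            // false = persistent session; the broker retains undelivered
            // messages while the connector is down.
            config.put("mqtt.clean.session", "false"); // hypothetical key
            config.forEach((k, v) -> System.out.println(k + "=" + v));
        }
    }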

Parquet support for the S3 Sink connector

Version 2.0.0 of the S3 Sink connector is released. The connector now supports Parquet as an output file data format. The following property changes are made to support Parquet:
  • A new property, Parquet Compression Type, is added.

This property specifies the compression type used for writing Parquet files. Accepted values are UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, and ZSTD.

  • The Output File Data Format property now accepts Parquet as a value.
Existing connectors continue to function; however, upgrading them is not possible. If you want to use the new version of the connector, you must deploy a new instance of it.
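
Because a new instance must be deployed, one option is the Kafka Connect REST API, sketched below for Java 11+. The host, port, connector name, and property keys are hypothetical, and the configuration is abbreviated; a real deployment also needs the connector class, topics, and S3 connection settings.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    /**
     * Sketch only (Java 11+): deploying a new S3 Sink instance with Parquet
     * output through the Kafka Connect REST API. Host, port, connector
     * name, and property keys are hypothetical; the configuration is
     * abbreviated and a real deployment also needs the connector class,
     * topics, and S3 connection settings.
     */
    public class DeployS3SinkParquet {
        public static void main(String[] args) throws Exception {
            String config = "{"
                + "\"output.file.data.format\": \"Parquet\"," // hypothetical key
                + "\"parquet.compression.type\": \"SNAPPY\""  // hypothetical key
                + "}";
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(
                    "http://connect-host.example.com:28083/connectors/s3-sink-parquet/config"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(config))
                .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }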

For more information, see S3 Sink connector and S3 Sink properties reference.

Support schema ID encoding in the payload or message header in Stateless NiFi connectors

The Kafka Connect connectors powered by Stateless NiFi that support record processing are updated to support content-encoded schema references for Avro messages. These connectors now properly support integration with Schema Registry and SMM.

This improvement introduces the following changes in the affected connectors.
A new value, HWX Content-Encoded Schema Reference, is introduced for the Schema Access Strategy property
If this value is set, the schema is read from Schema Registry, and the connector expects the Avro messages to contain a content-encoded schema reference, that is, a schema reference encoded in the message content. The new value is introduced for the following connectors:
  • ADLS Sink
  • HDFS Sink
  • HTTP Sink
  • InfluxDB Sink
  • JDBC Sink
  • JDBC Source
  • Kudu Sink
  • S3 Sink
The Schema Write Strategy property is removed from the following connectors
  • ADLS Sink
  • HDFS Sink
  • S3 Sink
  • InfluxDB Sink
A new property, Avro Schema Write Strategy, is introduced
This property specifies whether and how the record schema is attached to the output data file when the format of the output is Avro. The property supports the following values:
  • Do Not Write Schema: neither the schema nor a reference to the schema is attached to the output Avro messages.
  • Embed Avro Schema: the schema is embedded in every output Avro message.
  • HWX Content-Encoded Schema Reference: a reference to the schema (identified by Schema Name) within Schema Registry is encoded in the content of the outgoing Avro messages.

This property is introduced for the following connectors:

  • ADLS Sink
  • HDFS Sink
  • S3 Sink
  • SFTP Source
  • Syslog TCP Source
  • Syslog UDP Source
The minor or major version of all affected connectors is updated
Existing connectors continue to function; however, upgrading them is not possible. If you want to use the new version of a connector, you must deploy a new instance of it.
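
To illustrate both strategies together, the following sketch configures a hypothetical sink instance that reads content-encoded Avro messages and embeds the full schema in its Avro output. Both keys are hypothetical stand-ins for the Schema Access Strategy and Avro Schema Write Strategy properties; check each connector's properties reference for the exact names.

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Sketch only: schema settings on a hypothetical sink instance that
     * reads content-encoded Avro messages and embeds the full schema in
     * its Avro output. Both keys are hypothetical stand-ins for the
     * Schema Access Strategy and Avro Schema Write Strategy properties.
     */
    public class SchemaStrategyConfig {
        public static void main(String[] args) {
            Map<String, String> config = new HashMap<>();
            // Read side: resolve the schema from Schema Registry using the
            // reference encoded in the content of each incoming message.
            config.put("schema.access.strategy",
                "HWX Content-Encoded Schema Reference"); // hypothetical key
            // Write side: embed the schema in every output Avro message.
            config.put("avro.schema.write.strategy",
                "Embed Avro Schema"); // hypothetical key
            config.forEach((k, v) -> System.out.println(k + "=" + v));
        }
    }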

For more information, see the documentation for each connector in Kafka Connectors in Runtime and Streams Messaging Reference.