CDH 6 includes Apache Kafka as part of the core package. The documentation includes improved content on how to set up, install, and administer your Kafka ecosystem. For more information, see the Cloudera Enterprise 6.0.x Apache Kafka Guide. We look forward to your feedback on both the existing and new documentation.
What's New in CDK Powered By Apache Kafka?
This section lists new features in CDK Powered By Apache Kafka. The following links provide detailed information for each release:
- New Features in CDK 4.0.0 Powered By Apache Kafka
- New Features in CDK 3.1.0 Powered By Apache Kafka
- New Features in CDK 3.0.0 Powered By Apache Kafka
- New Features in CDK 2.2.0 Powered By Apache Kafka
- New Features in CDK 2.1.0 Powered By Apache Kafka
- New Features in Cloudera Distribution CDK 2.0.0 Powered By Apache Kafka
- New Features in CDK 1.4.0 Powered By Apache Kafka
- New Features in CDK 1.3.0 Powered By Apache Kafka
- New Features in CDK 1.1.0 Powered By Apache Kafka
New Features in CDK 4.0.0 Powered By Apache Kafka
- Rebase on Kafka 2.1.0
CDK 4.0.0 Powered By Apache Kafka is a major release based on Apache Kafka 2.1.0. For upstream release notes, see the Apache Kafka version 1.0.2, 1.1.0, 1.1.1, 2.0.0, 2.0.1, and 2.1.0 release notes.
- JBOD Support
As of CDK 4.0.0, Cloudera officially supports Kafka clusters with nodes using JBOD configurations. JBOD support introduces a new command-line tool and extends an existing one:
- A new tool, kafka-log-dirs, lets users query log directory information, such as which partitions are stored in each log directory on a broker.
- The kafka-reassign-partitions tool can now reassign partitions between log directories. Users can move partitions to a different log directory on the same broker as well as to log directories on other brokers.
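The file passed to kafka-reassign-partitions through --reassignment-json-file lists, for each partition, its replica brokers and, for moves between log directories, a parallel log_dirs array with one entry per replica. The Python sketch below generates such a file. The field layout follows the upstream Apache Kafka format as we understand it; treat the "any" placeholder semantics as an assumption to verify against your version, and note that the topic name, broker IDs, and paths are hypothetical.

```python
import json


def build_reassignment(moves):
    """Return a reassignment plan as JSON text.

    `moves` maps (topic, partition) to a list of (broker_id, log_dir)
    pairs, one per replica. A log_dir of "any" is assumed to leave the
    choice of directory to the broker.
    """
    partitions = []
    for (topic, partition), placements in sorted(moves.items()):
        partitions.append({
            "topic": topic,
            "partition": partition,
            # Broker IDs of the replicas, in preference order.
            "replicas": [broker for broker, _ in placements],
            # Target log directory for each replica, in the same order.
            "log_dirs": [log_dir for _, log_dir in placements],
        })
    return json.dumps({"version": 1, "partitions": partitions}, indent=2)


# Hypothetical example: move broker 1's replica of orders-0 to /data2/kafka-logs.
print(build_reassignment({("orders", 0): [(1, "/data2/kafka-logs"), (2, "any")]}))
```

Keeping replicas and log_dirs as parallel lists built from the same placement pairs guarantees that both arrays have the same length and order.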
- Kafka Streams
Starting with CDK 4.0.0, Cloudera officially supports Kafka Streams. For information about how to use Kafka Streams, see the Apache Kafka website:
- Read the Kafka Streams Introduction for an overview of the feature and an introductory video.
- Get familiar with Kafka Streams Core Concepts.
- Understand Kafka Streams Architecture.
- Access the Quick Start documentation to run a demonstration Kafka Streams Application.
- Use the Tutorial to write your first Kafka Streams Application.
- Exactly Once Semantics
Starting with CDK 4.0.0, Cloudera officially supports the idempotent and transactional capabilities of the producer. This feature ensures that messages are delivered exactly once to a particular topic partition during the lifetime of a single producer.
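The per-partition guarantee rests on deduplication: an idempotent producer tags its writes with a producer ID and a sequence number, and the broker discards a write whose sequence it has already appended (for example, a retry after a lost acknowledgement). The Python toy model below illustrates that idea only. It tracks single records rather than record batches and is not the broker's implementation. In the Java client, these capabilities are enabled through the enable.idempotence and transactional.id producer settings.

```python
class PartitionLog:
    """Toy model of broker-side deduplication for one partition.

    The broker remembers the last sequence number appended for each
    producer ID and accepts a write only if it carries the next sequence.
    """

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> last appended sequence number

    def append(self, producer_id, seq, value):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return "duplicate"  # a retry of a write that was already appended
        if seq != last + 1:
            raise ValueError(f"out-of-order sequence {seq}, expected {last + 1}")
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return "appended"


log = PartitionLog()
log.append(producer_id=42, seq=0, value="order-1")
log.append(producer_id=42, seq=0, value="order-1")  # retried after a lost ack
print(log.records)  # ['order-1']
```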
New Features in CDK 3.1.0 Powered By Apache Kafka
- Rebase on Kafka 1.0.1
CDK 3.1.0 Powered By Apache Kafka is a minor release based on Apache Kafka 1.0.1.
For upstream release notes, see Apache Kafka version 1.0.0 and 1.0.1 release notes.
- Kafka uses HA-capable Sentry client
With this feature, Kafka fails over automatically to another Sentry host if the primary Sentry host goes down or becomes unavailable.
- Wildcard usage for Kafka-Sentry components
You can specify an asterisk (*) in a Kafka-Sentry command for the TOPIC component of a privilege to refer to any topic. Supported with CDH 5.14.2.
You can also use an asterisk (*) in a Kafka-Sentry command for the CONSUMERGROUPS component of a privilege to refer to any consumer group. This is useful with Spark Streaming, where a generated group.id may be needed. Supported with CDH 5.14.2.
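To make the wildcard semantics concrete, the Python sketch below matches a privilege against an access request, treating "*" as "any value" for a component. It is a hypothetical toy matcher written for illustration; it is not Sentry's implementation or its privilege syntax, and the component names and values are assumptions.

```python
def privilege_matches(privilege, request):
    """Return True if `privilege` grants `request`.

    Both arguments are dicts mapping a component name (for example
    "host", "topic", "consumergroup", "action") to a value. A "*" in the
    privilege matches any value, but the request must still name that
    component, so a consumer-group privilege never grants topic access.
    """
    for component, granted in privilege.items():
        if component not in request:
            return False
        if granted != "*" and granted != request[component]:
            return False
    return True


# A wildcard consumer-group privilege covers a generated group.id.
any_group = {"host": "*", "consumergroup": "*", "action": "read"}
print(privilege_matches(any_group, {"host": "h1", "consumergroup": "generated-123", "action": "read"}))
```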
- Health Tests in Cloudera Manager
Two new Kafka Broker Health Tests have been added to Cloudera Manager:
- Kafka Broker Swap Memory Usage
- Kafka Broker Unexpected Exits
These health tests are available when Kafka is managed by Cloudera Manager version 5.14 and later. For details, see Kafka Broker Health Tests.
New Features in CDK 3.0.0 Powered By Apache Kafka
- Rebase on Kafka 0.11.0.0
CDK 3.0.0 Powered By Apache Kafka is a major release based on Apache Kafka 0.11.0.0. For upstream release notes, see Apache Kafka version 0.11.0.0 release notes.
- Health test for offline and lagging partitions
New health tests set the controller broker's health to BAD if the broker hosts at least one offline partition and the leader broker's health to CONCERNING if it hosts any lagging partitions. Supported with Cloudera Manager 5.14.0.
New Features in CDK 2.2.0 Powered By Apache Kafka
- Rebase on Kafka 0.10.2
CDK 2.2.0 Powered By Apache Kafka is rebased on Apache Kafka 0.10.2. For upstream release notes, see the Apache Kafka version 0.10.2 release notes.
New Features in CDK 2.1.0 Powered By Apache Kafka
- Rebase on Kafka 0.10
Cloudera Distribution of Apache Kafka 2.1.0 is rebased on Apache Kafka 0.10. For upstream release notes, see Apache Kafka version 0.10 release notes.
- Sentry Authorization
Apache Sentry includes a Kafka binding that you can use to enable authorization in Kafka with Sentry. See Configuring Kafka to Use Sentry Authorization.
New Features in Cloudera Distribution CDK 2.0.0 Powered By Apache Kafka
- Rebase on Kafka 0.9
CDK 2.0.0 Powered By Apache Kafka is rebased on Apache Kafka 0.9. For upstream release notes, see the Apache Kafka version 0.9 release notes.
- Kerberos
CDK 2.0.0 Powered By Apache Kafka supports Kerberos authentication of connections from clients and other brokers, as well as of connections to ZooKeeper.
- SSL
CDK 2.0.0 Powered By Apache Kafka supports wire encryption of communications from clients and other brokers using SSL.
- New Consumer API
CDK 2.0.0 Powered By Apache Kafka includes a new Java API for consumers.
- MirrorMaker
MirrorMaker is enhanced to help prevent data loss and improve reliability of cross-data center replication.
- Quotas
You can use per-user quotas to throttle producer and consumer throughput in a multitenant cluster. See Quotas.
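One way to see how throttling brings a client back under its quota: if a client sent data at an average rate O over a window of W seconds, delaying it by X seconds lowers its average to O·W/(W+X). Setting that equal to the quota T and solving gives X = W·(O−T)/T. The Python function below computes this delay. It is a simplified model for intuition; the broker's actual rate measurement and throttle computation may differ.

```python
def throttle_delay(observed_rate, quota, window_seconds):
    """Seconds to hold back a client so its average rate drops to `quota`.

    Over window_seconds + delay, the client's average rate becomes
    observed_rate * window_seconds / (window_seconds + delay).
    Setting that equal to `quota` and solving for `delay` gives the
    expression returned below.
    """
    if observed_rate <= quota:
        return 0.0  # already within quota: no throttling needed
    return window_seconds * (observed_rate - quota) / quota


# A client at 2 MB/s against a 1 MB/s quota over a 10 s window.
print(throttle_delay(2_000_000, 1_000_000, 10))  # 10.0
```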
New Features in CDK 1.4.0 Powered By Apache Kafka
- CDK 1.4.0 Powered By Apache Kafka is distributed as a package as well as a parcel. See CDK Powered By Apache Kafka® Version and Packaging Information.
- RHEL 7.1
Kafka 1.3.2 supports RHEL 7.1. See Supported Operating Systems.
New Features in CDK 1.3.0 Powered By Apache Kafka
- Metrics Reporter
Cloudera Manager now displays Kafka metrics. Use the values to identify current performance issues and plan enhancements to handle anticipated changes in workload. See Viewing Apache Kafka Metrics.
- MirrorMaker configuration
Cloudera Manager allows you to configure the Kafka MirrorMaker cross-cluster replication service. You can add a MirrorMaker role and use it to replicate to a machine in another cluster. See Kafka MirrorMaker.
New Features in CDK 1.1.0 Powered By Apache Kafka
- New producer
The producer added in CDK 1.1.0 Powered By Apache Kafka combines features of the existing synchronous and asynchronous producers. Send requests are batched, allowing the new producer to perform as well as the asynchronous producer under load. Every send request returns a response object that can be used to retrieve status and exceptions.
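The batching behavior described above can be sketched as a toy model in Python: send() returns immediately with a future, records accumulate per partition, and a whole batch is "sent" at once, after which each record's future resolves. This is an illustration only. The real Java client returns a Future of record metadata and decides when to send a batch using settings such as batch.size and linger.ms; here a simple record count stands in for those.

```python
from concurrent.futures import Future


class BatchingProducer:
    """Toy batching producer: one simulated request per batch."""

    def __init__(self, batch_size):
        self.batch_size = batch_size  # records per batch (simplification)
        self.pending = {}             # partition -> list of (value, Future)
        self.sent_batches = []        # (partition, values) for each batch sent
        self.next_offset = {}         # partition -> next offset to assign

    def send(self, partition, value):
        """Queue a record and return a Future that resolves to its offset."""
        future = Future()
        batch = self.pending.setdefault(partition, [])
        batch.append((value, future))
        if len(batch) >= self.batch_size:
            self._send_batch(partition)
        return future

    def flush(self):
        """Send every partially filled batch."""
        for partition in list(self.pending):
            self._send_batch(partition)

    def _send_batch(self, partition):
        batch = self.pending.pop(partition, [])
        if not batch:
            return
        self.sent_batches.append((partition, [value for value, _ in batch]))
        offset = self.next_offset.get(partition, 0)
        for _, future in batch:
            future.set_result(offset)  # each record learns its assigned offset
            offset += 1
        self.next_offset[partition] = offset


producer = BatchingProducer(batch_size=2)
first = producer.send(0, "a")   # queued; first.done() is False
second = producer.send(0, "b")  # fills the batch, so both futures resolve
print(first.result(), second.result())  # 0 1
```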
- Ability to delete topics
You can now delete topics using the kafka-topics --delete command.
- Offset management
In previous versions, consumers that wanted to keep track of which messages were consumed did so by updating the offset of the last consumed message in ZooKeeper. With this new feature, Kafka itself tracks the offsets. Using offset management can significantly improve consumer performance.
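With Kafka-managed offsets, committed positions are stored by Kafka (in the internal __consumer_offsets topic) per consumer group, topic, and partition, so a restarted consumer resumes where its group left off. The Python toy model below shows that bookkeeping. OffsetStore and consume are hypothetical names; by Kafka convention, the committed value is the offset of the next record to read.

```python
class OffsetStore:
    """Toy store of committed offsets keyed by (group, topic, partition)."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, topic, partition, offset):
        self._committed[(group, topic, partition)] = offset

    def fetch(self, group, topic, partition, default=0):
        return self._committed.get((group, topic, partition), default)


def consume(log, store, group, topic, partition, max_records):
    """Read up to max_records from `log`, starting at the group's committed
    offset, then commit the offset of the next record to read."""
    start = store.fetch(group, topic, partition)
    batch = log[start:start + max_records]
    store.commit(group, topic, partition, start + len(batch))
    return batch


store = OffsetStore()
log = ["m0", "m1", "m2"]
print(consume(log, store, "billing", "orders", 0, 2))  # ['m0', 'm1']
print(consume(log, store, "billing", "orders", 0, 2))  # ['m2']
```

Because each group has its own key, a second consumer group reading the same partition starts from the beginning independently.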
- Automatic leader rebalancing
Each partition starts with a randomly selected leader replica that handles requests for that partition. When a cluster first starts, leaders are evenly balanced among hosts. When a broker restarts, the leaders it hosted move to other brokers, which leaves the distribution unbalanced. With this feature enabled, leadership returns to the original replicas after the broker restarts.
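In Apache Kafka, the first replica in a partition's assignment is its preferred replica, and automatic rebalancing (the broker setting auto.leader.rebalance.enable) moves leadership back to it. The Python toy model below shows the effect: a failure shifts leaders onto the surviving brokers, and rebalancing restores the original spread. It simplifies real behavior by ignoring in-sync replica tracking; the partition names and broker IDs are hypothetical.

```python
def elect_leaders(assignments, live_brokers):
    """Failover: each partition is led by its first live replica."""
    leaders = {}
    for partition, replicas in assignments.items():
        live = [broker for broker in replicas if broker in live_brokers]
        leaders[partition] = live[0] if live else None
    return leaders


def rebalance_to_preferred(assignments, leaders, live_brokers):
    """Move leadership back to the preferred (first-listed) replica
    wherever that broker is alive and not already the leader."""
    rebalanced = dict(leaders)
    for partition, replicas in assignments.items():
        preferred = replicas[0]
        if preferred in live_brokers and leaders[partition] != preferred:
            rebalanced[partition] = preferred
    return rebalanced


assignments = {"t-0": [1, 2], "t-1": [2, 1]}
after_restart = elect_leaders(assignments, live_brokers={2})  # broker 1 down
print(after_restart)  # {'t-0': 2, 't-1': 2}
print(rebalance_to_preferred(assignments, after_restart, {1, 2}))  # {'t-0': 1, 't-1': 2}
```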
- Connection quotas
Kafka administrators can limit the number of connections allowed from a single IP address. By default, this limit is 10 connections per IP address. This prevents misconfigured or malicious clients from destabilizing a Kafka broker by opening a large number of connections and using all available file handles.
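A per-IP connection quota amounts to a counter per client address that the broker checks before accepting a new connection. The Python toy model below illustrates this, using the default of 10 stated above; in Apache Kafka the limit is set with the broker property max.connections.per.ip. The class name and addresses are hypothetical.

```python
from collections import Counter


class ConnectionLimiter:
    """Toy per-IP connection quota: refuse a new connection once an
    address already holds max_per_ip open connections."""

    def __init__(self, max_per_ip=10):
        self.max_per_ip = max_per_ip
        self.open = Counter()  # ip -> number of open connections

    def accept(self, ip):
        if self.open[ip] >= self.max_per_ip:
            return False  # over quota: the connection is refused
        self.open[ip] += 1
        return True

    def close(self, ip):
        if self.open[ip] > 0:
            self.open[ip] -= 1


limiter = ConnectionLimiter()
results = [limiter.accept("10.0.0.5") for _ in range(11)]
print(results.count(True), results[-1])  # 10 False
```

Counting per address, rather than globally, means one misbehaving client exhausts only its own allowance while other clients can still connect.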