Installing or Upgrading CDK Powered By Apache Kafka®
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
Kafka is distributed as a parcel, separate from the CDH parcel, and also as a package. The installation steps differ depending on whether you install from a parcel or a package.
General Information Regarding Installation and Upgrade
Cloudera Manager 5.4 and higher includes the Kafka service. To install, download Kafka using Cloudera Manager, distribute Kafka to the cluster, activate the new parcel, and add the service to the cluster. For a list of available parcels and packages, see CDK Powered By Apache Kafka® Version and Packaging Information.
Rolling Upgrade to CDK 3.0.x Powered By Apache Kafka®
Before upgrading from CDK 2.x.x to CDK 3.0.x, set inter.broker.protocol.version and log.message.format.version to the current Kafka version, and then unset them after the upgrade. This precaution matters because newer broker versions can write log entries in a format that older brokers cannot read. If you need to roll back to the older version and have not set inter.broker.protocol.version and log.message.format.version, data loss can occur. Use the following version values:
- To upgrade from CDK 2.0.x Powered By Apache Kafka, use 0.9.0
- To upgrade from CDK 2.1.x Powered By Apache Kafka, use 0.10.0
- To upgrade from CDK 2.2.x Powered By Apache Kafka, use 0.10.2
To upgrade the Kafka brokers to CDK 3.0.x Powered By Apache Kafka:
- Update the server.properties file on all brokers to set inter.broker.protocol.version=<current_Kafka_version> and log.message.format.version=<current_Kafka_version>, as follows:
- From the Clusters menu, select the Kafka cluster.
- Click the Configuration tab.
- Use the Search field to find the Kafka Broker Advanced Configuration Snippet (Safety Valve) configuration property.
- Add the following properties to the Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties:
To upgrade from CDK 2.0.x to CDK 3.0.x, enter:
inter.broker.protocol.version=0.9.0
log.message.format.version=0.9.0
To upgrade from CDK 2.1.x to CDK 3.0.x, enter:
inter.broker.protocol.version=0.10.0
log.message.format.version=0.10.0
To upgrade from CDK 2.2.x to CDK 3.0.x, enter:
inter.broker.protocol.version=0.10.2
log.message.format.version=0.10.2
Make sure you enter all three parts of the version number (for example, 0.10.0, not 0.10). Otherwise, the following error occurs:
2017-12-14 14:25:47,818 FATAL kafka.Kafka$: java.lang.IllegalArgumentException: Version `0.10` is not a valid version
at kafka.api.ApiVersion$$anonfun$apply$1.apply(ApiVersion.scala:72)
at kafka.api.ApiVersion$$anonfun$apply$1.apply(ApiVersion.scala:72)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
- Save your changes.
- Download, distribute, and activate the new parcel. Do not restart the Kafka service; select Activate Only and click OK.
- Perform a rolling restart. Select Rolling Restart or Restart, depending on how much downtime you can afford.
- Upgrade all CDK 2.x.x clients to CDK 3.0.x.
- After the whole cluster has restarted successfully, remove the settings above and restart the cluster again.
Graceful Shutdown of Kafka Brokers
Cloudera Manager provides the following properties for controlling how Kafka brokers shut down:

Property | Description | Default Value |
---|---|---|
Enable Controlled Shutdown | Enables controlled shutdown of the broker. If enabled, the broker moves all leaders on it to other brokers before shutting itself down. This reduces the unavailability window during shutdown. | Enabled |
Graceful Shutdown Timeout | The timeout in milliseconds to wait for a graceful shutdown to complete. | 30000 milliseconds (30 seconds) |
To configure these properties, select the Kafka service in Cloudera Manager, click the Configuration tab, and search for "shutdown". If controlled shutdown takes a long time to complete, consider increasing the value of Graceful Shutdown Timeout. When this timeout is reached, Cloudera Manager issues a forced shutdown, which interrupts the controlled shutdown and can cause subsequent restarts to take longer than expected.
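In stock Apache Kafka, controlled shutdown is governed by broker settings in server.properties. The following is a minimal sketch of those settings; the values shown are the Apache Kafka defaults, and in a Cloudera Manager deployment you would normally use the properties in the table above rather than setting these directly:
# Move partition leadership to other brokers before this broker shuts down
controlled.shutdown.enable=true
# How many times to retry a controlled shutdown before shutting down forcefully
controlled.shutdown.max.retries=3
# Pause between retries, in milliseconds
controlled.shutdown.retry.backoff.ms=5000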
Disks and Filesystem
Cloudera recommends that you use multiple drives to get good throughput. To ensure good latency, do not share the drives used for Kafka data with application logs or other OS filesystem activity. You can either use RAID to combine these drives into a single volume, or format and mount each drive as its own directory. Because Kafka replicates partitions across brokers, it already provides at the application level the redundancy that RAID would otherwise provide. This choice has several tradeoffs.
If you configure multiple data directories, partitions are assigned round-robin to data directories. Each partition is stored entirely in one of the data directories. This can lead to load imbalance between disks if data is not well balanced among partitions.
RAID can potentially do a better job of balancing load between disks because it balances load at a lower level. The primary downside of RAID is that it is usually a big performance hit for write throughput, and it reduces the available disk space.
Another potential benefit of RAID is the ability to tolerate disk failures. However, rebuilding the RAID array is so I/O intensive that it can effectively disable the server, so this does not provide much improvement in availability.
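If you choose the one-directory-per-drive layout, list each mount point in the broker's log.dirs property. A minimal sketch, with placeholder mount points:
# One data directory per physical drive; new partitions are assigned
# round-robin across these directories
log.dirs=/data/1/kafka,/data/2/kafka,/data/3/kafka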
Installing or Upgrading Kafka from a Parcel
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
- In Cloudera Manager, open the Parcels page.
- If you do not see Kafka in the list of parcels, you can add the parcel to the list.
- Find the parcel for the version of Kafka you want to use on the Cloudera Distribution of Apache Kafka Versions page.
- Copy the parcel repository link.
- On the Cloudera Manager Parcels page, click Configuration.
- In the field Remote Parcel Repository URLs, click + next to an existing parcel URL to add a new field.
- Paste the parcel repository link.
- Save your changes.
- On the Cloudera Manager Parcels page, download the Kafka parcel, distribute the parcel to the hosts in your cluster, and then activate the parcel. See Managing Parcels. After you activate the Kafka parcel, Cloudera Manager prompts you to restart the cluster. You do not need to restart the cluster after installing Kafka. Click Close to ignore this prompt.
- Add the Kafka service to your cluster. See Adding a Service.
Installing or Upgrading Kafka from a Package
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
You install the Kafka package from the command line.
- Navigate to your distribution's repository directory, for example /etc/yum.repos.d on RHEL-compatible systems.
- Use wget to download the Kafka repository. See CDK Powered By Apache Kafka® Version and Packaging Information.
- Install Kafka using the appropriate commands for your operating system.
Kafka installation commands by operating system:
RHEL-compatible:
$ sudo yum clean all
$ sudo yum install kafka
$ sudo yum install kafka-server
SLES:
$ sudo zypper clean --all
$ sudo zypper install kafka
$ sudo zypper install kafka-server
Ubuntu or Debian:
$ sudo apt-get update
$ sudo apt-get install kafka
$ sudo apt-get install kafka-server
- Edit /etc/kafka/conf/server.properties to ensure that broker.id is unique for each node and broker in the Kafka cluster, and that zookeeper.connect points to the same ZooKeeper ensemble for all nodes and brokers (see the sketch after these steps).
- Start the Kafka server with the following command:
$ sudo service kafka-server start
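For reference, here is a minimal server.properties sketch for the edit described above; the broker ID and ZooKeeper hostnames are placeholders, and each broker needs its own unique ID:
# Must be unique for every broker in the cluster
broker.id=0
# Must point to the same ZooKeeper ensemble on every broker
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181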
To verify that all nodes are correctly registered to the same ZooKeeper, connect to ZooKeeper using zookeeper-client:
$ zookeeper-client
$ ls /brokers/ids
You should see all of the IDs for the brokers you have registered in your Kafka cluster.
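For example, in a three-broker cluster whose brokers are configured with broker.id values 0, 1, and 2 (an assumed configuration), the output is:
[0, 1, 2]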
To discover to which node a particular ID is assigned, use the following command:
$ get /brokers/ids/<ID>
This command returns the hostname of the node assigned the ID you specify.
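The value returned is a small JSON record describing the broker's registration. An illustrative example follows; the hostname, port, and exact set of fields are placeholders and vary by Kafka version and listener configuration:
{"jmx_port":-1,"timestamp":"1513888995536","endpoints":["PLAINTEXT://kafka01.example.com:9092"],"host":"kafka01.example.com","version":4,"port":9092}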