Scaling KRaft controllers

Scale the number of KRaft controllers in a Cloudera Base on premises cluster by adding or removing KRaft Controller role instances.

You can scale the number of KRaft controllers provisioned in a Cloudera Base on premises cluster. Scaling involves adding or deleting KRaft Controller role instances in the cluster and running the Add KRaft Controller to Quorum or Remove KRaft Controller from Quorum action. Kafka in Cloudera Base on premises deployments uses Dynamic KRaft Quorums.

Scaling up KRaft controllers

Scale KRaft controllers up by adding new KRaft Controller role instances to your Kafka service and adding the controllers to the KRaft Quorum using the Add KRaft Controller to Quorum action.

  • Ensure that the cluster, its hosts, and all its services are healthy.

  • Ensure that the majority of the KRaft Quorum is healthy and operational.

  • Cloudera recommends deploying KRaft Controller roles on hosts that do not have Kafka Broker roles. This avoids both controllers and brokers going down during a host failure. Controllers and brokers also have different system requirements. For more information on how to add a new host, see Adding a Host to a Cluster.

  • To withstand N concurrent failures, you must scale to a total of 2N + 1 controllers so that the KRaft Quorum remains functional. For example, a cluster of five controllers (2 × 2 + 1) can tolerate up to two concurrent failures without impacting availability.

  • If you are scaling up from a single controller, you can scale to three or more controllers, but you cannot later scale back down to a single controller.
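
The 2N + 1 sizing rule above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of Cloudera Manager or Kafka; the function names tolerated_failures and controllers_needed are hypothetical and exist only for this illustration.

```python
def tolerated_failures(controllers: int) -> int:
    """Return how many concurrent controller failures a quorum of this size survives."""
    if controllers < 1:
        raise ValueError("a quorum needs at least one controller")
    # A quorum stays functional while a strict majority of its members is up.
    majority = controllers // 2 + 1
    return controllers - majority


def controllers_needed(failures: int) -> int:
    """Return the total controller count required to withstand N concurrent failures."""
    if failures < 0:
        raise ValueError("failures must not be negative")
    return 2 * failures + 1


if __name__ == "__main__":
    for size in (1, 3, 5, 7):
        print(f"{size} controller(s) tolerate {tolerated_failures(size)} failure(s)")
```

Note that an even-sized quorum gains nothing over the next smaller odd size: by this calculation, four controllers tolerate one failure, the same as three.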

  1. In Cloudera Manager, select the Kafka service.
  2. Go to Instances.
  3. Click Add Role Instances.
  4. Click Select hosts found under KRaft Controller.
  5. In the host selection dialog, select one or more hosts and click OK.
    You can select the checkbox next to the Hostname column header to select all available hosts.
  6. Click Continue.
  7. Optional: Review and configure the properties available on the Review Changes page based on your cluster and requirements.
  8. Click Finish.
  9. Select the newly added role instances.
  10. Click Actions for Selected > Start.
  11. Review the list of instances that will be started and click Start.
  12. For each newly added KRaft Controller role, run the Add KRaft Controller to Quorum role level action.
    1. Select a newly added KRaft Controller role instance.
    2. Click Actions > Add KRaft Controller to Quorum.
    3. Wait until the command completes and click Close.
    4. Repeat these steps for each newly added KRaft Controller role.
  13. Restart the Kafka service to clear configuration staleness warnings.
The newly added KRaft Controller roles are part of the KRaft Quorum and the Kafka service reflects the updated configuration.

Scaling down KRaft controllers

Scale KRaft controllers down by stopping KRaft Controller role instances in your Kafka service and then removing them from the KRaft Quorum using the Remove KRaft Controller from Quorum action. You can delete the role instances after the downscale process is finished.

  • Ensure that the cluster, its hosts, and all its services are healthy.

  • Ensure that the majority of the KRaft Quorum is healthy and operational.

  • When scaling down from a total of 2N + 1 controllers, you can remove a maximum of N controllers so that a majority of the quorum is preserved. For example, you can reduce a cluster from five controllers to three by removing two roles, but you cannot reduce a three-controller cluster further because that would violate the minimum supported count.

  • You must always scale to an odd number of controllers and maintain a minimum of three roles, as scaling to fewer than three controllers is not supported.
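
The scale-down constraints above (odd target count, minimum of three, at most N removals from 2N + 1 controllers) can be expressed as a small pre-check. This is a sketch under those stated rules only; validate_scale_down and MIN_CONTROLLERS are hypothetical names and nothing here calls Cloudera Manager.

```python
MIN_CONTROLLERS = 3  # scaling to fewer than three controllers is not supported


def validate_scale_down(current: int, target: int) -> int:
    """Return the number of roles to remove, or raise ValueError if the plan breaks a rule."""
    if target >= current:
        raise ValueError("target must be smaller than the current controller count")
    if target < MIN_CONTROLLERS:
        raise ValueError(f"target must be at least {MIN_CONTROLLERS} controllers")
    if target % 2 == 0:
        raise ValueError("target must be an odd number of controllers")
    removed = current - target
    # With 2N + 1 controllers, at most N can be removed while keeping a majority.
    max_removable = (current - 1) // 2
    if removed > max_removable:
        raise ValueError(
            f"removing {removed} roles exceeds the maximum of {max_removable}"
        )
    return removed


if __name__ == "__main__":
    print(validate_scale_down(5, 3))  # five controllers down to three
```

Under these rules, going from seven controllers to three in a single operation is rejected because it removes four roles, one more than the maximum of three; going from seven to five is accepted.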

  1. In Cloudera Manager, select the Kafka service.
  2. Go to Instances.
  3. Select the KRaft Controller roles you want to remove.
  4. Click Actions for Selected > Stop.
  5. Review the list of instances that will be stopped and click Stop.
  6. Wait until the stop process is finished and click Close.
  7. For each stopped KRaft Controller role, execute the Remove KRaft Controller from Quorum role level action.
    1. Select a stopped KRaft Controller role instance.
    2. Click Actions for Selected > Remove KRaft Controller from Quorum.
    3. Wait until the command completes and click Close.
    4. Repeat these steps for each KRaft Controller role you want to remove.
  8. Restart the Kafka service to clear configuration staleness warnings.
    While the remaining controllers function correctly without this restart, Cloudera Manager displays configuration staleness warnings until the service is restarted. Restarting the service clears these warnings and completes the scaling operation.
The KRaft Controller roles are no longer part of the KRaft Quorum and the Kafka service reflects the updated configuration.
After the controllers are removed from the quorum, you can fully remove the role instances from the cluster by deleting them.
  1. Go to Kafka > Instances.
  2. Select the instances that you want to delete.
  3. Click Actions for Selected > Delete.

Troubleshooting KRaft controller scaling

Troubleshoot issues when scaling KRaft controllers by checking quorum state and verifying that controllers have joined the quorum.

Checking KRaft Quorum state

If an add or remove command fails, start the investigation by checking the KRaft Quorum state with the following command:

kafka-metadata-quorum --bootstrap-controller example.com:9192 describe --status
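
When checking quorum state repeatedly, it can help to turn the command output into a dictionary for comparison. The sketch below assumes the output consists of "Key: value" lines; the SAMPLE text is illustrative only, was not captured from a real cluster, and its field names and value formats are assumptions that you should verify against the output of your Kafka version. The function name parse_quorum_status is hypothetical.

```python
def parse_quorum_status(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines into a dictionary, skipping lines without a key."""
    status: dict[str, str] = {}
    for line in text.splitlines():
        # Split at the first colon only, so values that contain colons stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            status[key.strip()] = value.strip()
    return status


# Illustrative sample only (assumed field names and formats, not real output).
SAMPLE = """\
ClusterId:              example-cluster-id
LeaderId:               1
LeaderEpoch:            5
CurrentVoters:          [1,2,3]
CurrentObservers:       [4]
"""

status = parse_quorum_status(SAMPLE)

if __name__ == "__main__":
    for key, value in status.items():
        print(f"{key} = {value}")
```

In practice, you would pass the text captured from the command above instead of SAMPLE and compare the voter list before and after an add or remove action.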

Verifying that a controller joined the quorum

There are two ways to verify that a KRaft Controller successfully joined the quorum:

  • Command success: If the Add KRaft Controller to Quorum command completes successfully, the controller has been added to the quorum.
  • Health test: Cloudera Manager provides a health test that displays Concerning Health status if a KRaft Controller role is not part of the quorum. If the controller role shows healthy status, it is part of the quorum.

Handling dangling KRaft Controller roles

In some cases, a KRaft Controller role might not be a member of the KRaft Quorum even though it never explicitly left it. For example, this can happen when the metadata log directory was corrupted, removed, or recreated.

If there are dangling KRaft Controller roles present, meaning that a KRaft Controller role is active but not part of the Quorum, Cloudera Manager will change the health status of the role to Concerning Health.

To resolve the issue, re-add the KRaft Controller role to the Quorum by running the Remove KRaft Controller from Quorum action followed by the Add KRaft Controller to Quorum action.