Setting capacity estimations and goals
Cruise Control rebalancing works using capacity estimations and goals. You need to configure the capacity estimates based on your resources, and set the goals for Cruise Control to achieve the Kafka partition rebalancing that meets your requirements.
When configuring Cruise Control, you need to make sure that the Kafka topics and partitions, the capacity estimates, and the proper goals are provided so the rebalancing process works as expected.
- Go to your cluster in Cloudera Manager.
- Select Cloudera Manager from the services.
- Select Cruise Control from the list of Services.
- Click Configuration.
- Select Main from the Filters.
Configuring capacity estimations
The values for capacity estimation need to be provided based on your available CPU and network resources. Besides the capacity estimation, you also need to provide information about the broker and partition metrics. You can set the capacity estimations and Kafka properties in Cloudera Manager.
Capacity | Description |
---|---|
capacity.default.cpu | 100 by default |
capacity.default.network-in | Given by the internet provider |
capacity.default.network-out | Given by the internet provider |
The optimizers in Cruise Control use the network incoming and outgoing capacities to define a boundary for optimization. The capacity estimates are generated and read by Cruise Control. A capacity.json file is generated when Cruise Control is started. When a new broker is added, Cruise Control uses the default broker capacity values. However, if disk-related goals are used, Cruise Control must be restarted to load the actual disk capacity metrics of the new broker.
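As a rough illustration of what a capacity file can contain, the sketch below builds a default broker entry in the layout used by the upstream Cruise Control JSON capacity file (a brokerCapacities list in which broker ID -1 acts as the default). Treat it as an assumption that the file generated in your deployment follows exactly this layout. The function name and all numeric values are hypothetical, and the DISK value is only a placeholder.

```python
import json


def default_capacity_entry(cpu, network_in, network_out, disk):
    """Build a default-broker entry (broker ID -1) for a capacity file.

    The layout mirrors the upstream Cruise Control capacity file format.
    All values passed in are illustrative assumptions, not recommendations.
    """
    return {
        "brokerCapacities": [
            {
                "brokerId": "-1",  # -1 marks the default applied to brokers without an entry
                "capacity": {
                    "DISK": str(disk),           # MB (placeholder value)
                    "CPU": str(cpu),             # percentage, 100 by default
                    "NW_IN": str(network_in),    # KB/s
                    "NW_OUT": str(network_out),  # KB/s
                },
                "doc": "Default capacity for brokers without an explicit entry.",
            }
        ]
    }


capacity = default_capacity_entry(cpu=100, network_in=10000, network_out=10000, disk=100000)
print(json.dumps(capacity, indent=2))
```

The network values correspond to capacity.default.network-in and capacity.default.network-out, so they should reflect the bandwidth actually available to your brokers.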
The following table lists all the configurations that are needed to configure Cruise Control specifically to your environment:
Configuration | Description |
---|---|
num.metric.fetchers | Number of parallel threads for fetching metrics from the Cloudera Manager database |
partition.metric.sample.store.topic | Topic for storing Cruise Control partition metrics |
broker.metric.sample.store.topic | Topic for storing Cruise Control broker metrics |
partition.metrics.window.ms | Time window size for partition metrics, in milliseconds |
broker.metrics.window.ms | Time window size for broker metrics, in milliseconds |
num.partition.metrics.windows | Number of stored partition windows |
num.broker.metrics.windows | Number of stored broker windows |
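The window size and window count together determine how much metric history is available for a rebalance proposal. The following sketch works through that arithmetic, assuming windows are contiguous and equally sized. The sample values are hypothetical and not recommended defaults.

```python
def history_covered_ms(window_ms, num_windows):
    """Return the total time span covered by the stored metric windows."""
    return window_ms * num_windows


# Hypothetical values for partition.metrics.window.ms and num.partition.metrics.windows
partition_window_ms = 300_000  # 5-minute windows
num_partition_windows = 5

total_ms = history_covered_ms(partition_window_ms, num_partition_windows)
print(f"Partition metric history: {total_ms / 60_000:.0f} minutes")
```

With these sample values the stored partition metrics span 25 minutes. Larger windows smooth out short spikes, while more windows extend the history at the cost of more stored samples.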
Configuring goals
After setting the capacity estimates, you can specify which goals need to be used for the rebalancing process in Cloudera Manager. The provided goals are used for the optimization proposal of your Kafka cluster.
Example of Cruise Control goal configuration
By default, Cruise Control is configured with a set of Default, Supported, Hard, Self-healing and Anomaly detection goals in Cloudera Manager. The default configurations can be changed based on what you would like to achieve with the rebalancing.
The following example configures Cruise Control to achieve these outcomes:
- Find dead or failed brokers and create an anomaly to remove load from them (self.healing.broker.failure.enabled)
- Move load back to the brokers when the brokers are available again (self.healing.goal.violation.enabled and added goals)
- Prevent too frequent rebalances to reduce cluster costs (incremented thresholds, reduced self.healing.goals set)
- Keep the cluster always balanced from the point of view of replicas and leader replicas
- Avoid enabling every type of self-healing method when it is not required (only two types of self-healing are enabled)
self.healing.goal.violation.enabled=true
self.healing.broker.failure.enabled=true
self.healing.exclude.recently.removed.brokers=false
anomaly.notifier.class=com.linkedin.kafka.cruisecontrol.detector.notifier.SelfHealingNotifier
replica.count.balance.threshold=1.25
leader.replica.count.balance.threshold=1.25
com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal
com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal
com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal
com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal
com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal
com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal
Other configurations can remain as set by default.
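To make the effect of replica.count.balance.threshold more concrete, the sketch below flags brokers whose replica count exceeds the average multiplied by the threshold. This is a simplified reading of the setting: the actual goal implementation also applies a lower bound and internal margins, which are omitted here. The function name and broker counts are hypothetical.

```python
def brokers_over_limit(replica_counts, threshold):
    """Return IDs of brokers whose replica count exceeds average * threshold.

    Simplified sketch: only the upper bound is checked.
    """
    average = sum(replica_counts.values()) / len(replica_counts)
    limit = average * threshold
    return sorted(broker for broker, count in replica_counts.items() if count > limit)


# Hypothetical replica counts per broker ID; the average is 100 in both cases
balanced = {0: 120, 1: 95, 2: 85}
skewed = {0: 140, 1: 80, 2: 80}

print(brokers_over_limit(balanced, 1.25))  # no broker above 125 replicas
print(brokers_over_limit(skewed, 1.25))    # broker 0 holds 140 replicas
```

Raising the threshold tolerates more imbalance before a goal violation is raised, which is how incremented thresholds reduce the frequency of rebalances.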
Multi-level rack-aware distribution goal
You can use the MultiLevelRackAwareDistributionGoal to ensure rack awareness on a higher level than the standard rack-aware goals provide for Kafka clusters using Cruise Control.
The MultiLevelRackAwareDistributionGoal behaves differently than the default RackAwareGoal or RackAwareDistributionGoal in Cruise Control. The standard goals have lighter requirements on rack awareness, and always optimize based on the current state of the cluster, with priority on bringing all replicas back online.
This means that in case a network partition failure occurs, and a data center goes offline, a Cruise Control rebalance operation using a standard rack-aware goal ignores the data center that is not working, and moves replicas around as if there were one fewer data center in the cluster. For example, if a Kafka cluster has three data centers and one goes offline, the standard goals are not aware of the existence of the third data center, and act as if only two data centers are used in the cluster.
MultiLevelRackAwareDistributionGoal acts differently in the following aspects:
- Handles rack IDs as multi-level rack IDs, respecting the hierarchy of racks when distributing replicas
- Keeps track of the whole state of the cluster by caching previous states to make sure that all racks are visible
- Prioritizes multi-level rack awareness guarantees over bringing all replicas back online
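The idea of a rack hierarchy can be sketched as follows. The example assumes rack IDs are written as slash-separated paths such as /dc1/rack1, where the first level is the data center. That format, the helper names, and the broker layout are assumptions made for illustration, and the code is not the goal's actual algorithm.

```python
from collections import Counter


def rack_levels(rack_id):
    """Split a multi-level rack ID such as '/dc1/rack1' into its levels."""
    return [part for part in rack_id.split("/") if part]


def replicas_per_top_level(replica_brokers, broker_racks):
    """Count how many replicas of one partition sit in each top-level rack."""
    return Counter(rack_levels(broker_racks[broker])[0] for broker in replica_brokers)


# Hypothetical broker-to-rack mapping across three data centers
broker_racks = {0: "/dc1/rack1", 1: "/dc1/rack2", 2: "/dc2/rack1", 3: "/dc3/rack1"}

spread = replicas_per_top_level([0, 2, 3], broker_racks)
print(dict(spread))
```

A placement with one replica per data center, as in this sample, is the kind of distribution a hierarchy-aware goal tries to preserve, whereas placing replicas on brokers 0 and 1 would concentrate two of them in dc1.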
In the same failure situation, where one data center is offline out of three, the multi-level rack-aware goal is still aware of the existence of the third data center. This means that the offline replicas are not moved from the third data center if the migration violates the multi-level rack awareness guarantees. The goal allows optimizations to pass even in the presence of offline replicas, which can be configured with the cloudera.multi.level.rack.awareness.ensure.no.offline.replicas property. If cloudera.multi.level.rack.awareness.ensure.no.offline.replicas is set to true, the goal causes the rebalance operation to fail if the replicas would stay offline after the optimizations are implemented.
kafka_assigner parameter is set to true in the corresponding request (for example, with the rebalance request as shown in the Cruise Control documentation).
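For reference, the sketch below shows how the kafka_assigner parameter can be set on a request to the Cruise Control rebalance endpoint. It only builds the URL and does not send anything. The host name and port are placeholders, the helper name is hypothetical, and dryrun is kept enabled so that a request built this way would only compute a proposal. The rebalance endpoint itself expects an HTTP POST.

```python
from urllib.parse import urlencode


def rebalance_url(base_url, dryrun=True, kafka_assigner=True):
    """Build a rebalance request URL with the kafka_assigner parameter set."""
    params = {
        "dryrun": str(dryrun).lower(),
        "kafka_assigner": str(kafka_assigner).lower(),
    }
    return f"{base_url}/kafkacruisecontrol/rebalance?{urlencode(params)}"


# Placeholder host and port; replace with your Cruise Control server address
url = rebalance_url("http://cruise-control.example.com:8899")
print(url)
```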