How Cruise Control self-healing works
The anomaly detector is responsible for the self-healing feature of Cruise Control. When self-healing is enabled, a detected anomaly can trigger an automatic attempt to fix it. The anomaly types that can be detected are broker failures, goal violations, disk failures, slow brokers, and topic replication factor anomalies:
- Broker failures
  - The anomaly is detected when a non-empty broker crashes or when a broker is removed from the cluster. This results in offline replicas and under-replicated partitions.
- Goal violations
  - The anomaly is detected when an optimization goal is violated.
- Disk failures
  - The anomaly is detected when one of the non-empty disks dies.
- Slow broker
  - The anomaly is detected when a broker is identified as slow based on the configured thresholds.
- Topic replication factor
  - The anomaly is detected when a topic partition does not have the desired replication factor.
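
The decision described above, where a detected anomaly leads to a fix attempt only if self-healing is enabled, can be sketched roughly as follows. This is an illustrative model, not Cruise Control code: the `AnomalyType` enum, the `Anomaly` class, the `should_attempt_fix` function, and the idea of a per-type on/off mapping are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class AnomalyType(Enum):
    """Anomaly types from the list above (illustrative names, not Cruise Control identifiers)."""
    BROKER_FAILURE = "broker failure"
    GOAL_VIOLATION = "goal violation"
    DISK_FAILURE = "disk failure"
    SLOW_BROKER = "slow broker"
    TOPIC_REPLICATION_FACTOR = "topic replication factor"


@dataclass(frozen=True)
class Anomaly:
    """A detected anomaly (hypothetical structure for this sketch)."""
    anomaly_type: AnomalyType
    description: str


def should_attempt_fix(anomaly, self_healing_enabled):
    """Return True if an automatic fix should be attempted for this anomaly.

    self_healing_enabled maps AnomalyType -> bool. This per-type toggle is an
    assumption of the sketch; types missing from the mapping count as disabled.
    """
    return self_healing_enabled.get(anomaly.anomaly_type, False)


# Hypothetical settings: heal broker and disk failures, only report the rest.
settings = {
    AnomalyType.BROKER_FAILURE: True,
    AnomalyType.DISK_FAILURE: True,
}

detected = [
    Anomaly(AnomalyType.BROKER_FAILURE, "broker 3 is offline"),
    Anomaly(AnomalyType.GOAL_VIOLATION, "an optimization goal is violated"),
]

decisions = {a.anomaly_type.name: should_attempt_fix(a, settings) for a in detected}
print(decisions)
```

In this sketch the broker failure would trigger a fix attempt, while the goal violation would only be reported, because no toggle is set for it.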