Handling disk failures
An overview of how to handle disk failures.
Cloudera Manager has built-in monitoring features that automatically trigger alerts when disk failures are detected. When a log directory fails, Kafka also detects the failure and takes the partitions stored in that directory offline. The cause of a disk failure can be analyzed with the kafka-log-dirs tool, or by reviewing the error messages of KafkaStorageException entries in the Kafka broker log file. To access the log file, go to .
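As an illustration of how the kafka-log-dirs output might be analyzed, the following minimal Python sketch extracts the log directories that report an error. It assumes the tool was run with its describe option and printed a single-line JSON report with a "brokers" list, where each broker has a "logDirs" list containing "logDir" and "error" fields. The function name failed_log_dirs, the sample paths, and the sample error string are hypothetical, so verify the actual output format on your cluster before relying on it.

```python
import json


def failed_log_dirs(tool_output):
    """Return (broker_id, log_dir, error) for each log directory reporting an error.

    tool_output is the text printed by kafka-log-dirs. Informational lines
    are skipped and the first line that starts with "{" is parsed as the
    JSON report (assumed layout, see the lead-in above).
    """
    json_line = next(
        (line for line in tool_output.splitlines() if line.lstrip().startswith("{")),
        None,
    )
    if json_line is None:
        raise ValueError("no JSON report found in the tool output")

    report = json.loads(json_line)
    failures = []
    for broker in report.get("brokers", []):
        for log_dir in broker.get("logDirs", []):
            error = log_dir.get("error")
            if error:  # a null error is treated as a healthy directory
                failures.append((broker["broker"], log_dir["logDir"], error))
    return failures


# Hypothetical sample output: one healthy and one failed directory on broker 1.
SAMPLE_OUTPUT = """Querying brokers for log directories information
Received log directory information from brokers 1
{"version":1,"brokers":[{"broker":1,"logDirs":[{"logDir":"/data/1/kafka","error":null,"partitions":[{"partition":"orders-0","size":1024,"offsetLag":0,"isFuture":false}]},{"logDir":"/data/2/kafka","error":"org.apache.kafka.common.errors.KafkaStorageException","partitions":[]}]}]}
"""
```

Calling failed_log_dirs(SAMPLE_OUTPUT) on the sample returns one entry for /data/2/kafka on broker 1, which is the directory to investigate further in the broker log.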
In case of a disk failure, a Kafka administrator can carry out either of the following actions. The action taken depends on the failure type and system environment:
- Replace the faulty disk with a new one.
- Remove the disk and redistribute data across remaining disks to restore the desired replication factor.
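For the second option, redistribution is typically expressed as a partition reassignment plan. The sketch below only builds such a plan as JSON, assuming the reassignment file layout used by the kafka-reassign-partitions tool ("version", "partitions", and per-partition "topic", "partition", "replicas", and optional "log_dirs" fields). The helper name build_reassignment, the topic name, broker IDs, and directory paths are hypothetical. This is not a complete removal procedure, so follow the product's disk removal instructions for the actual steps.

```python
import json


def build_reassignment(moves):
    """Build a reassignment plan as a JSON string.

    moves is an iterable of (topic, partition, replicas, log_dirs) tuples.
    replicas is a list of broker IDs and log_dirs holds one entry per
    replica: an absolute directory path, or "any" to let the broker choose.
    """
    partitions = []
    for topic, partition, replicas, log_dirs in moves:
        if len(replicas) != len(log_dirs):
            raise ValueError("each replica needs a matching log_dirs entry")
        partitions.append(
            {
                "topic": topic,
                "partition": partition,
                "replicas": list(replicas),
                "log_dirs": list(log_dirs),
            }
        )
    return json.dumps({"version": 1, "partitions": partitions}, indent=2)


# Hypothetical example: keep partition orders-0 on brokers 1 and 2, but pin
# the replica on broker 1 to a remaining healthy directory.
EXAMPLE_PLAN = build_reassignment([("orders", 0, [1, 2], ["/data/1/kafka", "any"])])
```

The resulting JSON can be saved to a file and reviewed before it is passed to the reassignment tool.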