Handling disk failures
An overview on how to handle disk failures.
Cloudera Manager has built in monitoring functionalities that automatically trigger alerts
when disk failures are detected. When a log directory fails, Kafka also detects the failure
and takes the partitions stored in that directory offline. The
cause of disk failures can be analyzed with the help of the
kafka-log-dirs
tool, or by reviewing the error messages of KafkaStorageException
entries in
the Kafka broker log file. To access the log file go to
.In case of a disk failure, a Kafka administrator can carry out either of the following
actions. The action taken depends on the failure type and system environment:
- Replace the faulty disk with a new one.
- Remove the disk and redistribute data across remaining disks to restore the desired replication factor.