JBOD Operational Procedures

Monitoring

Cloudera recommends that administrators continuously monitor the following on a cluster:
Replication Status
Monitor replication status using Cloudera Manager Health Tests. Cloudera Manager automatically and continuously monitors both the OfflineLogDirectoryCount and OfflineReplicaCount metrics. Alerts are raised when failures are detected. For more information, see Cloudera Manager Health Tests.
Disk Capacity
Monitor the free space on mounted disks and the number of open file descriptors. For more information, see Useful Shell Command Reference. If a disk is running low on space, reassign partitions or move log files to another disk. For more information, see kafka-reassign-partitions.
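
The free-space check described above can also be scripted. The following is a minimal Python sketch, not a Cloudera-provided tool. The function name low_space_dirs, the 15% threshold, and the example path are assumptions chosen for illustration; substitute the data directories configured for the broker.

```python
import shutil


def low_space_dirs(log_dirs, min_free_fraction=0.15):
    """Return (path, free_fraction) for each directory below the threshold."""
    flagged = []
    for path in log_dirs:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        if usage.total == 0:
            continue  # skip pseudo-filesystems that report no capacity
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            flagged.append((path, free_fraction))
    return flagged


# Hypothetical path; replace with the broker's data directories.
for path, fraction in low_space_dirs(["/"]):
    print(f"{path}: only {fraction:.1%} free")
```

A check like this would typically run on each broker host on a schedule. It complements, rather than replaces, the Cloudera Manager health tests.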

Handling Disk Failures

Cloudera Manager has built-in monitoring features that automatically trigger alerts when disk failures are detected. When a log directory fails, Kafka also detects the failure and takes the partitions stored in that directory offline. To analyze the cause of a disk failure, use the kafka-log-dirs tool or review the error messages of KafkaStorageException entries in the Kafka broker log file.
To view the Kafka broker log file, complete the following steps:
  1. In Cloudera Manager, go to the Kafka service, select Instances and select the broker.
  2. Go to Log Files > Role Log File.
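
If the role log file is downloaded or read directly on the host, the KafkaStorageException entries can be filtered out with a short script. This is a minimal Python sketch. The function name storage_exception_lines and the sample log lines are invented for illustration and do not reproduce real broker output.

```python
def storage_exception_lines(lines):
    """Yield (line_number, text) for lines that mention KafkaStorageException."""
    for number, text in enumerate(lines, start=1):
        if "KafkaStorageException" in text:
            yield number, text.rstrip("\n")


# Invented sample lines; in practice, iterate over the open broker log file.
sample = [
    "INFO example startup message\n",
    "ERROR org.apache.kafka.common.errors.KafkaStorageException: example message\n",
    "INFO example shutdown message\n",
]

for number, text in storage_exception_lines(sample):
    print(f"line {number}: {text}")
```

Because the function accepts any iterable of lines, it also works on an open file object without loading the whole log into memory.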
In case of a disk failure, a Kafka administrator can carry out either of the following actions. The action taken depends on the failure type and system environment:
  • Replace the faulty disk with a new one.
  • Remove the disk and redistribute data across remaining disks to restore the desired replication factor.

Disk Replacement

To replace a disk, complete the following steps:
  1. Stop the broker that has a faulty disk.
    1. In Cloudera Manager, go to the Kafka service, select Instances and select the broker.
    2. Go to Actions > Gracefully stop this Kafka Broker.
  2. Replace the disk.
  3. Mount the disk.
  4. Set up the directory structure on the new disk the same way as it was set up on the previous disk.
  5. Start the broker.
    1. In Cloudera Manager, go to the Kafka service, select Instances and select the broker.
    2. Go to Actions > Start this Kafka Broker.

    The Kafka broker re-creates topic partitions in the same directory by replicating data from other brokers.

Disk Removal

To remove a disk from the configuration, complete the following steps:
  1. Stop the broker that has a faulty disk.
    1. In Cloudera Manager, go to the Kafka service, select Instances and select the broker.
    2. Go to Actions > Gracefully stop this Kafka Broker.
  2. Remove the log directories on the faulty disk from the broker.
    1. Go to Configuration and find the Data Directories property.
    2. Remove the affected log directories with the Remove button.
    3. Enter a Reason for change, and then click Save Changes to commit the changes.
  3. Start the broker.
    1. In Cloudera Manager, go to the Kafka service, select Instances and select the broker.
    2. Go to Actions > Start this Kafka Broker.

    The Kafka broker redistributes data across the cluster.

Reassigning Replicas Between Log Directories

Reassigning replicas between log directories is useful when multiple disks are available but one or more of them is nearing capacity. Moving a replica from a nearly full disk to one with free space helps prevent the service from going down because a disk has reached capacity. To balance storage load, the Kafka administrator has to continuously monitor the system and reassign replicas between log directories on the same broker or across different brokers. These actions can be carried out with the kafka-reassign-partitions tool.

For more information on tool usage, see the documentation for the kafka-reassign-partitions tool.
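
The tool reads the reassignment from a JSON file. The following Python sketch builds such a file with a log_dirs entry for each replica. It assumes the JSON layout documented for the Apache Kafka tool (a version field plus a partitions list with topic, partition, replicas, and log_dirs); verify this against the Kafka version in use. The function name build_reassignment, the topic name, the broker IDs, and the directory paths are hypothetical.

```python
import json


def build_reassignment(moves):
    """Build a reassignment document from (topic, partition, replicas, log_dirs) tuples."""
    partitions = []
    for topic, partition, replicas, log_dirs in moves:
        # Each replica needs exactly one matching log directory entry.
        if len(replicas) != len(log_dirs):
            raise ValueError("need one log_dirs entry per replica")
        partitions.append({
            "topic": topic,
            "partition": partition,
            "replicas": list(replicas),
            "log_dirs": list(log_dirs),
        })
    return {"version": 1, "partitions": partitions}


# Hypothetical move: place the replica on broker 1 in /data/2/kafka and
# let broker 2 pick a directory itself ("any").
plan = build_reassignment([("events", 0, [1, 2], ["/data/2/kafka", "any"])])
print(json.dumps(plan, indent=2))
```

The printed JSON can be saved to a file and passed to the tool. The length check catches the easiest mistake to make by hand, a log_dirs list that does not line up with the replica list.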

Retrieving Log Directory Replica Assignment Information

To optimize replica assignment across log directories, you need the list of partitions in each log directory and the size of each partition. This information can be retrieved with the kafka-log-dirs tool.

For more information on tool usage, see the documentation for the kafka-log-dirs tool.
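
The JSON that the tool prints can be summarized per log directory to see which disks are fullest. This is a minimal Python sketch. It assumes the JSON line has already been separated from the tool's other output and follows the layout used by the Apache Kafka tool (brokers, logDirs, logDir, error, partitions, size). The function name summarize_log_dirs and the sample data are invented for illustration.

```python
import json
from collections import defaultdict


def summarize_log_dirs(describe_json):
    """Return ({(broker, log_dir): total_bytes}, [(broker, log_dir) with an error])."""
    doc = json.loads(describe_json)
    sizes = defaultdict(int)
    failed = []
    for broker in doc.get("brokers", []):
        for log_dir in broker.get("logDirs", []):
            key = (broker["broker"], log_dir["logDir"])
            if log_dir.get("error"):
                failed.append(key)  # directory reported with an error
            sizes[key] += sum(p.get("size", 0) for p in log_dir.get("partitions", []))
    return dict(sizes), failed


# Invented sample data in the assumed layout.
sample_json = json.dumps({
    "version": 1,
    "brokers": [{
        "broker": 1,
        "logDirs": [
            {"logDir": "/data/1/kafka", "error": None, "partitions": [
                {"partition": "events-0", "size": 1024, "offsetLag": 0, "isFuture": False},
                {"partition": "events-1", "size": 2048, "offsetLag": 0, "isFuture": False},
            ]},
            {"logDir": "/data/2/kafka", "error": "example error", "partitions": []},
        ],
    }],
})

sizes, failed = summarize_log_dirs(sample_json)
for (broker, path), total in sorted(sizes.items()):
    print(f"broker {broker} {path}: {total} bytes")
print("directories with errors:", failed)
```

Sorting the totals by size gives a quick view of which directory to move replicas away from and which one has room to receive them.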