5.1.1. Blocks health

This service-level alert is triggered if the number of corrupt or missing blocks exceeds the configured critical threshold. This alert uses the check_hdfs_blocks plugin.

Potential causes
  • Some DataNodes are down, and the only replicas of the missing blocks are on those DataNodes

  • The corrupt/missing blocks are from files with a replication factor of 1. New replicas cannot be created because the only replica of the block is missing

Possible remedies
  • For critical data, use a replication factor of 3

  • Bring up the failed DataNodes with missing or corrupt blocks

  • Identify the files associated with the missing or corrupt blocks by running the Hadoop fsck command

  • Delete the corrupt files and recover them from backup, if a backup exists
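The fsck step above can be sketched in Python as follows. This is a minimal illustration, not part of the alert plugin: the helper names (parse_corrupt_file_blocks, list_corrupt_files) and the sample text are hypothetical, and the parser assumes that `hdfs fsck <path> -list-corruptfileblocks` prints one line per corrupt block in the form "blk_<id> <file path>". Check that assumption against the output of your Hadoop version before relying on it.

```python
import subprocess


def parse_corrupt_file_blocks(fsck_output):
    """Extract (block_id, file_path) pairs from fsck output.

    Assumption: each corrupt block is reported on its own line as
    'blk_<id><whitespace><path>'. All other lines are ignored.
    """
    pairs = []
    for line in fsck_output.splitlines():
        line = line.strip()
        if not line.startswith("blk_"):
            continue
        # Split only once so that paths containing spaces stay intact.
        parts = line.split(None, 1)
        if len(parts) == 2:
            pairs.append((parts[0], parts[1]))
    return pairs


def list_corrupt_files(path="/"):
    """Run fsck on a live cluster and return the parsed pairs.

    Requires the `hdfs` client on PATH; not called automatically.
    """
    result = subprocess.run(
        ["hdfs", "fsck", path, "-list-corruptfileblocks"],
        capture_output=True,
        text=True,
        check=False,  # inspect the output regardless of the exit status
    )
    return parse_corrupt_file_blocks(result.stdout)


# Hypothetical sample text, used only to exercise the parser offline.
SAMPLE_OUTPUT = """\
The list of corrupt files under path '/' are:
blk_1073741825\t/data/logs/part-00000
blk_1073741901\t/data/tmp/report 2014.csv
The filesystem under path '/' has 2 CORRUPT files
"""

if __name__ == "__main__":
    for block_id, file_path in parse_corrupt_file_blocks(SAMPLE_OUTPUT):
        print(block_id, file_path)
```

Once the affected files are known, the remaining remedies map onto standard HDFS commands: `hdfs dfs -setrep -w 3 <path>` raises the replication factor of critical data, and `hdfs fsck <path> -delete` removes corrupt files. Run the delete only after confirming that a backup exists.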

