Scaling Namespaces and Optimizing Data Storage

How the DataNode recovers failed erasure-coded blocks

The NameNode tracks missing blocks in each EC stripe and assigns the task of recovering them to DataNodes. Separately, when a client reads data and a block is missing, the client issues additional read requests to fetch the parity blocks and decodes the missing data itself.
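The client-side degraded read can be illustrated with a minimal sketch. It assumes a stripe protected by a single XOR parity block (as in the XOR-2-1-1024k policy) instead of the Reed-Solomon codecs used by policies such as RS-6-3-1024k. The function names are hypothetical and are not part of the HDFS client API, and blocks are in-memory byte strings rather than network reads.

```python
from __future__ import annotations
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_parity(data_blocks: list[bytes]) -> bytes:
    """One XOR parity block per stripe (tolerates a single lost block)."""
    return xor_blocks(data_blocks)


def degraded_read(stripe: list[bytes | None], parity: bytes, index: int) -> bytes:
    """Read data block `index`; if it is missing (None), fetch the parity
    block plus every surviving data block and decode the missing one."""
    if stripe[index] is not None:
        return stripe[index]  # normal read: no extra requests, no decoding
    survivors = [b for i, b in enumerate(stripe) if i != index]
    if any(b is None for b in survivors):
        raise ValueError("single XOR parity cannot rebuild two missing blocks")
    # Extra read requests: the remaining data blocks and the parity block.
    return xor_blocks(survivors + [parity])


if __name__ == "__main__":
    data = [b"HDFS", b"EC!!"]
    parity = encode_parity(data)
    print(degraded_read([data[0], None], parity, 1))  # b'EC!!'
```

The healthy path returns the block directly; only a missing block triggers the extra reads and the decode, which is why a degraded read costs more I/O than a normal one.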

The NameNode passes the recovery task to a DataNode in a heartbeat response, similar to how replicated blocks are recovered after a failure. The recovery task consists of the following three phases:

  1. Reading the data from source nodes: Input data is read in parallel from the source nodes. Based on the EC policy, the DataNode schedules read requests to the source nodes and reads only the minimum number of input blocks needed for reconstruction.
  2. Decoding the data and generating output: The missing data and parity blocks are decoded from the input data. All missing blocks are decoded together.
  3. Transferring the generated data blocks to target nodes: After the completion of decoding, the recovered blocks are transferred to target DataNodes.
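The three phases can be sketched as a single function. This is an illustration under stated assumptions, not the DataNode's actual reconstruction code: the codec is a Reed-Solomon-style code built from Lagrange interpolation over the prime field GF(257), chosen for brevity, whereas production Reed-Solomon coders typically work over GF(2^8). The names read_block, reconstruct, and the DataNode identifiers are hypothetical, and the "transfer" step simply returns the rebuilt blocks keyed by target. Because any k of the k+m blocks determine the stripe, only k sources are read.

```python
from __future__ import annotations
from concurrent.futures import ThreadPoolExecutor

P = 257  # prime field size; every byte value 0..255 is a valid symbol


def lagrange_eval(xs: list[int], ys: list[int], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total


def encode(data_blocks: list[bytes], m: int) -> list[list[int]]:
    """Systematic code: block i holds the polynomial's value at point i.
    Data blocks sit at points 0..k-1, the m parity blocks at k..k+m-1.
    Parity symbols may equal 256, so blocks are lists of ints, not bytes."""
    k = len(data_blocks)
    xs = list(range(k))
    columns = [list(col) for col in zip(*data_blocks)]
    parity = [[lagrange_eval(xs, col, x) for col in columns] for x in range(k, k + m)]
    return [list(b) for b in data_blocks] + parity


def read_block(stripe: list[list[int] | None], index: int) -> list[int]:
    """Hypothetical stand-in for a network read from the node holding `index`."""
    return list(stripe[index])


def reconstruct(stripe: list[list[int] | None], k: int,
                targets: dict[int, str]) -> dict[str, tuple[int, list[int]]]:
    """Rebuild the missing blocks in `targets` (block index -> target DataNode)."""
    # Phase 1: pick only k surviving blocks and read them in parallel.
    sources = [i for i, b in enumerate(stripe) if b is not None][:k]
    if len(sources) < k:
        raise ValueError("fewer than k surviving blocks; stripe is unrecoverable")
    with ThreadPoolExecutor() as pool:
        inputs = list(pool.map(lambda i: read_block(stripe, i), sources))
    # Phase 2: decode every missing block in one pass over the input columns.
    decoded: dict[int, list[int]] = {idx: [] for idx in targets}
    for column in zip(*inputs):
        for idx in targets:
            decoded[idx].append(lagrange_eval(sources, list(column), idx))
    # Phase 3: "transfer" each rebuilt block to its target DataNode.
    return {node: (idx, decoded[idx]) for idx, node in targets.items()}


if __name__ == "__main__":
    data = [b"HDFS", b"-EC-", b"demo"]
    full = encode(data, m=2)        # 3 data blocks + 2 parity blocks
    stripe = list(full)
    stripe[1] = stripe[3] = None    # lose one data block and one parity block
    out = reconstruct(stripe, k=3, targets={1: "dn7", 3: "dn8"})
    print(bytes(out["dn7"][1]))     # b'-EC-'
```

Every block in the stripe is a value of the same interpolated polynomial, so the lost data block and the lost parity block are produced in the same loop over the k input blocks, which mirrors the idea in phase 2 that all missing blocks are decoded together.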