Configuring Fault Tolerance
Also available as:
PDF
loading table of contents...

Prepare to Back Up the HDFS Metadata

Regardless of the solution, a full, up-to-date continuous backup of the namespace is not possible. Some of the most recent data is always lost. HDFS is not an Online Transaction Processing (OLTP) system. Most data can be easily recreated if you re-run Extract, Transform, Load (ETL) or processing jobs.

  • Normal NameNode failures are handled by the Standby NameNode. Doing so creates a safety-net for the very unlikely case where both master NameNodes fail.

  • In the case of both NameNode failures, you can start the NameNode service with the most recent image of the namespace.

  • Name Nodes maintain the namespace as follows:

    • Standby NameNodes keep a namespace image in memory based on edits available in a storage ensemble in Journal Nodes.

    • Standby NameNodes make a namespace checkpoint and saves an fsimage_* to disk.

    • Standby NameNodes transfer the fsimage to the primary NameNodes using HTTP.

Both NameNodes write fsimages to disk in the following sequence:

  • NameNodes write the namespace to a file fsimage.ckpt_* on disk.

  • NameNodes creates an fsimage_*.md5 file.

  • NameNodes moves the file fsimage.ckpt_* to fsimage_.*.

The process by which both NameNodes write fsimages to disk ensures that:

  • The most recent namespace image on disk in an fsimage_* file is on the standby NameNode.

  • Any fsimage_* file on disk is finalized and does not receive updates.