Introduction to HDFS High Availability
This section provides an overview of the HDFS high availability (HA) feature and how to configure and manage an HA HDFS cluster.
This document assumes that the reader has a general understanding of components and node types in an HDFS cluster. For details, see the Apache HDFS Architecture Guide.
Continue reading:
Background
In a standard configuration, the NameNode is a single point of failure (SPOF) in an HDFS cluster. Each cluster has a single NameNode, and if that machine or process became unavailable, the cluster as a whole is unavailable until the NameNode is either restarted or brought up on a new host. The Secondary NameNode does not provide failover capability.
- In the case of an unplanned event such as a host crash, the cluster is unavailable until an operator restarts the NameNode.
- Planned maintenance events such as software or hardware upgrades on the NameNode machine result in periods of cluster downtime.
HDFS HA addresses the above problems by providing the option of running two NameNodes in the same cluster, in an Active/Passive configuration. These are referred to as the Active NameNode and the Standby NameNode. Unlike the Secondary NameNode, the Standby NameNode is hot standby, allowing a fast failover to a new NameNode in the case that a machine crashes, or a graceful administrator-initiated failover for the purpose of planned maintenance. You cannot have more than two NameNodes.
Implementation
Cloudera Manager 5 and CDH 5 support Quorum-based Storage as the only HA implementation.
Quorum-based Storage
Quorum-based Storage refers to the HA implementation that uses a Quorum Journal Manager (QJM).
In order for the Standby NameNode to keep its state synchronized with the Active NameNode in this implementation, both nodes communicate with a group of separate daemons called JournalNodes. When any namespace modification is performed by the Active NameNode, it durably logs a record of the modification to a majority of these JournalNodes. The Standby NameNode is capable of reading the edits from the JournalNodes, and is constantly watching them for changes to the edit log. As the Standby Node sees the edits, it applies them to its own namespace. In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.
In order to provide a fast failover, it is also necessary that the Standby NameNode has up-to-date information regarding the location of blocks in the cluster. In order to achieve this, the DataNodes are configured with the location of both NameNodes, and they send block location information and heartbeats to both.