5. Configure and Deploy NameNode Automatic Failover

The preceding sections describe how to configure manual failover. In that mode, the system will not automatically trigger a failover from the active to the standby NameNode, even if the active node has failed. This section describes how to configure and deploy automatic failover.

Automatic failover adds the following components to an HDFS deployment:

  • A ZooKeeper quorum

  • The ZKFailoverController process (abbreviated as ZKFC)
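As a hedged sketch, these two components are usually wired together through two Hadoop configuration properties. `dfs.ha.automatic-failover.enabled` in hdfs-site.xml turns automatic failover on. `ha.zookeeper.quorum` in core-site.xml lists the ZooKeeper servers that the ZKFCs connect to. The Python below only renders those properties in Hadoop's `<configuration>`/`<property>` XML layout. The helper `hadoop_config` and the `zk1`/`zk2`/`zk3.example.com` hostnames are hypothetical placeholders, and a real deployment may need additional settings.

```python
import xml.etree.ElementTree as ET


def hadoop_config(props):
    """Hypothetical helper: render a dict of Hadoop properties as <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# hdfs-site.xml: enable ZKFC-driven automatic failover.
hdfs_site = hadoop_config({"dfs.ha.automatic-failover.enabled": "true"})

# core-site.xml: point every ZKFC at the ZooKeeper quorum.
# The hostnames below are placeholders, not real servers.
core_site = hadoop_config({
    "ha.zookeeper.quorum": "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
})

print(hdfs_site)
print(core_site)
```

Generating the XML is only a convenience for the sketch. In practice these properties are typically edited directly in the two site files on every node.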

The ZKFailoverController (ZKFC) is a ZooKeeper client that monitors and manages the state of its local NameNode. Each server that runs a NameNode also runs a ZKFC. The ZKFC is responsible for:

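  • Health monitoring: The ZKFC periodically pings its local NameNode with a health-check command. As long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers it healthy; if the NameNode has crashed, frozen, or otherwise become unhealthy, the ZKFC marks it as unhealthy.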

  • ZooKeeper session management: When the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, the ZKFC also holds a special "lock" znode. This lock is an ephemeral znode, so if the session expires, ZooKeeper deletes the lock automatically.

  • ZooKeeper-based election: If the local NameNode is healthy and no other node currently holds the lock znode, the ZKFC attempts to acquire the lock. If it succeeds, it has won the election and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active NameNode is fenced, if necessary, and then the local NameNode transitions to the active state.
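The three responsibilities above can be illustrated with a small in-memory simulation. This is a minimal sketch under stated assumptions, not the Hadoop ZKFC implementation (which is written in Java and uses a real ZooKeeper client). `FakeZooKeeper`, `NameNode`, `ZKFC`, and `LOCK_PATH` are hypothetical names, and fencing is reduced to marking the old NameNode's state as `"fenced"` rather than running a real fencing method.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical lock znode path; the real path is chosen by Hadoop, not by this sketch.
LOCK_PATH = "/hypothetical-ha/lock"


class FakeZooKeeper:
    """In-memory stand-in for a ZooKeeper ensemble (illustration only).

    Models only what the ZKFC relies on here: client sessions, and
    ephemeral znodes that vanish when their owning session expires.
    """

    def __init__(self) -> None:
        self._next_session = 1
        self.live_sessions: Set[int] = set()
        self.ephemeral: Dict[str, int] = {}  # znode path -> owning session id

    def open_session(self) -> int:
        sid = self._next_session
        self._next_session += 1
        self.live_sessions.add(sid)
        return sid

    def expire_session(self, sid: int) -> None:
        # Expiry deletes every ephemeral znode owned by the session.
        self.live_sessions.discard(sid)
        self.ephemeral = {p: s for p, s in self.ephemeral.items() if s != sid}

    def create_ephemeral(self, path: str, sid: int) -> bool:
        # Fails if the znode already exists, mirroring a "node exists" error.
        if sid not in self.live_sessions or path in self.ephemeral:
            return False
        self.ephemeral[path] = sid
        return True


@dataclass
class NameNode:
    name: str
    healthy: bool = True
    state: str = "standby"  # "standby", "active" or "fenced"


@dataclass(eq=False)
class ZKFC:
    zk: FakeZooKeeper
    nn: NameNode
    peers: List["ZKFC"] = field(default_factory=list, repr=False)
    session: Optional[int] = None

    def tick(self) -> None:
        # Health monitoring: check the local NameNode.
        if not self.nn.healthy:
            # Unhealthy: drop the session, which also releases the lock if held.
            if self.session is not None:
                self.zk.expire_session(self.session)
                self.session = None
            return
        # Session management: hold a live session while the NameNode is healthy.
        if self.session not in self.zk.live_sessions:
            self.session = self.zk.open_session()
        # Election: whoever creates the lock znode first wins.
        if self.zk.create_ephemeral(LOCK_PATH, self.session):
            self.failover()

    def failover(self) -> None:
        # Fence any peer NameNode that still thinks it is active, then promote.
        for peer in self.peers:
            if peer.nn.state == "active":
                peer.nn.state = "fenced"
        self.nn.state = "active"


zk = FakeZooKeeper()
nn1, nn2 = NameNode("nn1"), NameNode("nn2")
zkfc1, zkfc2 = ZKFC(zk, nn1), ZKFC(zk, nn2)
zkfc1.peers, zkfc2.peers = [zkfc2], [zkfc1]

zkfc1.tick()
zkfc2.tick()
print(nn1.state, nn2.state)  # active standby

nn1.healthy = False  # the active NameNode fails its health check
zkfc1.tick()
zkfc2.tick()
print(nn1.state, nn2.state)  # fenced active
```

In this sketch, the failed NameNode still reports itself as `"active"` after its ZKFC releases the lock. The new winner must therefore fence it before promoting its own NameNode, which is why fencing precedes the transition to active.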