Configuring Fault Tolerance

Operating a NameNode HA cluster

Use the dfsadmin command to operate the HA cluster. It can be run against both the active and the standby NameNode.

  • While operating an HA cluster, the Active NameNode cannot commit a transaction if it cannot write successfully to a quorum of the JournalNodes.
  • When restarting an HA cluster, you can skip the steps for initializing the JournalNodes and the second NameNode (NN2).
  • Start the services in the following order:
    1. JournalNodes
    2. NameNodes

      Verify that the ZKFailoverController (ZKFC) process is running on each NameNode host so that one of the NameNodes can transition to the active state.

    3. DataNodes
  • In a NameNode HA cluster, the following dfsadmin command options will run only on the active NameNode:
    -allowSnapshot <snapshotDir> 
    -disallowSnapshot <snapshotDir>

    The following dfsadmin command options will run on both the active and standby NameNodes:

    -safemode enter

    The -refresh <host:ipc_port> <key> arg1..argn command is sent to the host specified in its arguments.

    The -fetchImage <local directory> command attempts to identify the active NameNode through an RPC call, and then fetches the fsimage from that NameNode. The fsimage is therefore usually retrieved from the active NameNode, but this is not guaranteed, because a failover can happen between the two operations.

    The following dfsadmin command options are sent to the DataNodes:

    -shutdownDatanode <datanode_host:ipc_port> upgrade
    -getDatanodeInfo <datanode_host:ipc_port>
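
The routing rules above can be summarized as a small lookup table. The following is a minimal Python sketch, not part of the HDFS tooling: the names Target, DFSADMIN_ROUTING, dfsadmin_target, and dfsadmin_argv are hypothetical helpers introduced here for illustration. The table covers only the options listed in this section, and the helper rejects any other option rather than guessing where it runs.

```python
from enum import Enum
from typing import List


class Target(Enum):
    """Where a dfsadmin option is executed in a NameNode HA cluster."""
    ACTIVE_ONLY = "active NameNode only"
    BOTH_NAMENODES = "active and standby NameNodes"
    HOST_IN_ARGS = "host named in the command arguments"
    ACTIVE_BEST_EFFORT = "active NameNode (not guaranteed across a failover)"
    DATANODE = "DataNode named in the command arguments"


# Routing exactly as described in this section. Options that the section
# does not mention are deliberately left out.
DFSADMIN_ROUTING = {
    "-allowSnapshot": Target.ACTIVE_ONLY,
    "-disallowSnapshot": Target.ACTIVE_ONLY,
    "-safemode": Target.BOTH_NAMENODES,
    "-refresh": Target.HOST_IN_ARGS,
    "-fetchImage": Target.ACTIVE_BEST_EFFORT,
    "-shutdownDatanode": Target.DATANODE,
    "-getDatanodeInfo": Target.DATANODE,
}


def dfsadmin_target(option: str) -> Target:
    """Return the routing target for a dfsadmin option covered by the table."""
    try:
        return DFSADMIN_ROUTING[option]
    except KeyError:
        raise ValueError(
            f"routing for {option!r} is not covered in this section"
        ) from None


def dfsadmin_argv(option: str, *args: str) -> List[str]:
    """Build the argument vector for `hdfs dfsadmin <option> <args...>`."""
    dfsadmin_target(option)  # raises ValueError for options not in the table
    return ["hdfs", "dfsadmin", option, *args]


if __name__ == "__main__":
    # Print the table so an operator can see at a glance where each option runs.
    for opt, target in DFSADMIN_ROUTING.items():
        print(f"{opt:<20} -> {target.value}")
```

For example, dfsadmin_argv("-safemode", "enter") returns the list for `hdfs dfsadmin -safemode enter`, which could then be passed to subprocess.run on a host with a configured HDFS client.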