Hadoop High Availability

Operating a NameNode HA cluster

  • While operating an HA cluster, the Active NameNode cannot commit a transaction if it cannot write successfully to a quorum of the JournalNodes.

  • When restarting an HA cluster, you can skip the steps for initializing the JournalNodes and the second NameNode (NN2).

  • Start the services in the following order:

    1. JournalNodes

    2. NameNodes

      Note

      Verify that the ZKFailoverController (ZKFC) process is running on each NameNode host so that one of the NameNodes can transition to the active state.

    3. DataNodes

  • In a NameNode HA cluster, the following dfsadmin command options run only on the active NameNode:

    -rollEdits
    -setQuota
    -clrQuota
    -setSpaceQuota
    -clrSpaceQuota
    -setStoragePolicy
    -getStoragePolicy
    -finalizeUpgrade
    -rollingUpgrade
    -printTopology
    -allowSnapshot <snapshotDir> 
    -disallowSnapshot <snapshotDir>

    The following dfsadmin command options run on both the active and standby NameNodes:

    -safemode enter
    -saveNamespace
    -restoreFailedStorage
    -refreshNodes
    -refreshServiceAcl
    -refreshUserToGroupsMappings
    -refreshSuperUserGroupsConfiguration
    -refreshCallQueue
    -metasave
    -setBalancerBandwidth

    The -refresh <host:ipc_port> <key> arg1..argn command is sent to the host specified in its arguments.

    The -fetchImage <local directory> command first identifies the active NameNode through an RPC call and then fetches the fsimage from that NameNode. The fsimage therefore usually comes from the active NameNode, but this is not guaranteed, because a failover can occur between the two operations.

    The following dfsadmin command options are sent to the DataNodes:

    -refreshNamenodes
    -deleteBlockPool
    -shutdownDatanode <datanode_host:ipc_port> [upgrade]
    -getDatanodeInfo <datanode_host:ipc_port>
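The quorum rule in the first point can be made concrete with a small calculation. The following Python sketch is illustrative only and is not Hadoop's implementation; it assumes a quorum means a strict majority of the configured JournalNodes, which is why an odd count such as three or five is typical. The function names are hypothetical.

```python
# Illustrative sketch, not Hadoop code: assumes "quorum" means a strict
# majority of the configured JournalNodes.


def quorum_size(num_journalnodes: int) -> int:
    """Smallest strict majority of the JournalNodes: floor(n / 2) + 1."""
    if num_journalnodes < 1:
        raise ValueError("at least one JournalNode is required")
    return num_journalnodes // 2 + 1


def can_commit(acks: int, num_journalnodes: int) -> bool:
    """True if enough JournalNodes acknowledged an edit for it to commit."""
    return acks >= quorum_size(num_journalnodes)


def tolerated_failures(num_journalnodes: int) -> int:
    """How many JournalNodes can fail while a quorum is still reachable."""
    return num_journalnodes - quorum_size(num_journalnodes)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"{n} JournalNodes: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

With three JournalNodes, the Active NameNode needs two acknowledgements, so losing one JournalNode is survivable but losing two blocks all commits.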
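The startup order given earlier (JournalNodes, then NameNodes with their ZKFC processes, then DataNodes) can be expressed as a dry-run plan. This Python sketch only builds the command list and does not execute anything. It assumes the Hadoop 3.x `hdfs --daemon start <daemon>` syntax (older releases use different start scripts), places ZKFC right after the NameNodes following the note above, and uses hypothetical host names. A real rollout would also wait for each group to come up before starting the next; this sketch does not.

```python
# Dry-run sketch: builds, but does not run, start commands in the order
# JournalNodes -> NameNodes (+ ZKFC) -> DataNodes.
# Assumes Hadoop 3.x "hdfs --daemon start <daemon>" syntax; host names
# used below are hypothetical.
from typing import Dict, List, Tuple

# ZKFC is started right after the NameNodes so one of them can become active.
START_ORDER = ("journalnode", "namenode", "zkfc", "datanode")


def start_plan(hosts_by_daemon: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Return (host, command) pairs grouped in the required start order."""
    plan = []
    for daemon in START_ORDER:
        for host in hosts_by_daemon.get(daemon, []):
            plan.append((host, ["hdfs", "--daemon", "start", daemon]))
    return plan


if __name__ == "__main__":
    cluster = {
        "datanode": ["dn1", "dn2"],
        "namenode": ["nn1", "nn2"],
        "zkfc": ["nn1", "nn2"],
        "journalnode": ["jn1", "jn2", "jn3"],
    }
    for host, cmd in start_plan(cluster):
        print(host, " ".join(cmd))
```

Because the order lives in `START_ORDER` rather than in the input mapping, the plan stays correct regardless of how the hosts are listed.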
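The routing rules for dfsadmin options above amount to a lookup table. The following Python sketch encodes them for quick reference; the option names are copied from the lists in this section, while `route()` and the set names are hypothetical helpers, not part of Hadoop. Options are matched exactly as written in the lists (for example, "-safemode enter").

```python
# Hypothetical helper: option names come from the lists in this section;
# route() is not part of Hadoop.
ACTIVE_ONLY = frozenset({
    "-rollEdits", "-setQuota", "-clrQuota", "-setSpaceQuota",
    "-clrSpaceQuota", "-setStoragePolicy", "-getStoragePolicy",
    "-finalizeUpgrade", "-rollingUpgrade", "-printTopology",
    "-allowSnapshot", "-disallowSnapshot",
})
BOTH_NAMENODES = frozenset({
    "-safemode enter", "-saveNamespace", "-restoreFailedStorage",
    "-refreshNodes", "-refreshServiceAcl", "-refreshUserToGroupsMappings",
    "-refreshSuperUserGroupsConfiguration", "-refreshCallQueue",
    "-metasave", "-setBalancerBandwidth",
})
DATANODES = frozenset({
    "-refreshNamenodes", "-deleteBlockPool", "-shutdownDatanode",
    "-getDatanodeInfo",
})


def route(option: str) -> str:
    """Return where a listed dfsadmin option is executed in an HA cluster."""
    if option in ACTIVE_ONLY:
        return "active NameNode"
    if option in BOTH_NAMENODES:
        return "active and standby NameNodes"
    if option in DATANODES:
        return "DataNodes"
    if option == "-refresh":
        return "host named in the command arguments"
    if option == "-fetchImage":
        # The active NameNode is found by RPC first; a failover before the
        # download means the image may come from the other NameNode.
        return "NameNode believed to be active (not guaranteed)"
    raise KeyError(f"option not listed in this section: {option}")


if __name__ == "__main__":
    for opt in ("-rollEdits", "-refreshNodes", "-deleteBlockPool"):
        print(opt, "->", route(opt))
```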