4. Operating a NameNode HA Cluster

  • In an HA cluster, the active NameNode can commit a transaction only after it has written that transaction successfully to a quorum of the JournalNodes.
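
    The quorum rule above can be illustrated with a small sketch. It assumes that "quorum" means a strict majority of the configured JournalNodes; the function names are hypothetical and are not part of any Hadoop API.

```python
def quorum_size(journal_nodes: int) -> int:
    """Return the smallest number of JournalNodes that forms a majority.

    Assumption: a quorum is a strict majority of the configured JournalNodes.
    """
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return journal_nodes // 2 + 1


def can_commit(journal_nodes: int, successful_writes: int) -> bool:
    """True if enough JournalNodes acknowledged the write to reach a quorum."""
    return successful_writes >= quorum_size(journal_nodes)


def tolerated_failures(journal_nodes: int) -> int:
    """Number of JournalNodes that can be down while commits still succeed."""
    return journal_nodes - quorum_size(journal_nodes)
```

    Under that assumption, a cluster with three JournalNodes needs two successful writes per transaction and keeps committing with one JournalNode down; adding a fourth JournalNode raises the quorum to three without tolerating any additional failure.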

  • When restarting an HA cluster, the steps for initializing JournalNodes and NN2 can be skipped.

  • Start the services in the following order:

    1. JournalNodes

    2. NameNodes

      Note

      Verify that the ZKFailoverController (ZKFC) process is running on each NameNode host so that one of the NameNodes can transition to the active state.

    3. DataNodes
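
    The start order above, and the check that one NameNode has become active, might be scripted roughly as follows. This is a sketch rather than a supported procedure: the host names and the service IDs nn1 and nn2 are hypothetical, and it assumes the hdfs client is on the PATH of the machine running it. It relies on the hdfs haadmin -getServiceState command to report each NameNode's state.

```python
import subprocess

# Start order taken from the list above.
START_ORDER = ["journalnode", "namenode", "datanode"]


def order_services(services):
    """Sort (host, role) pairs into the documented start order.

    sorted() is stable, so hosts keep their relative order within a role.
    """
    rank = {role: position for position, role in enumerate(START_ORDER)}
    return sorted(services, key=lambda service: rank[service[1]])


def namenode_state(service_id):
    """Return the HA state that `hdfs haadmin -getServiceState` reports."""
    result = subprocess.run(
        ["hdfs", "haadmin", "-getServiceState", service_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def exactly_one_active(states):
    """True if exactly one of the reported NameNode states is 'active'."""
    return list(states).count("active") == 1
```

    For example, exactly_one_active(namenode_state(nn) for nn in ("nn1", "nn2")) returning False after the NameNodes are up suggests that neither NameNode was promoted, which is a reason to check the ZKFC processes before starting the DataNodes.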

  • In a NameNode HA cluster, the following dfsadmin command options run only on the active NameNode:

    -rollEdits
    -setQuota
    -clrQuota
    -setSpaceQuota
    -clrSpaceQuota
    -setStoragePolicy
    -getStoragePolicy
    -finalizeUpgrade
    -rollingUpgrade
    -printTopology
    -allowSnapshot <snapshotDir>
    -disallowSnapshot <snapshotDir>

    The following dfsadmin command options run on both the active and standby NameNodes:

    -safemode enter
    -saveNamespace
    -restoreFailedStorage
    -refreshNodes
    -refreshServiceAcl
    -refreshUserToGroupsMappings
    -refreshSuperUserGroupsConfiguration
    -refreshCallQueue
    -metasave
    -setBalancerBandwidth

    The -refresh <host:ipc_port> <key> arg1..argn command is sent to the host named in its <host:ipc_port> argument.

    The -fetchImage <local directory> command first identifies the active NameNode through an RPC call and then fetches the fsimage from that NameNode. The fsimage therefore usually comes from the active NameNode, but this is not guaranteed, because a failover can occur between the two operations.

    The following dfsadmin command options are sent to the DataNodes:

    -refreshNamenodes
    -deleteBlockPool
    -shutdownDatanode <datanode_host:ipc_port> [upgrade]
    -getDatanodeInfo <datanode_host:ipc_port>
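
    As a quick reference, the routing described in this section can be captured in a small lookup. The sketch below only restates the lists above; the function name and the returned labels are hypothetical, and options that this section does not mention are reported as unknown rather than guessed.

```python
ACTIVE_ONLY = {
    "-rollEdits", "-setQuota", "-clrQuota", "-setSpaceQuota",
    "-clrSpaceQuota", "-setStoragePolicy", "-getStoragePolicy",
    "-finalizeUpgrade", "-rollingUpgrade", "-printTopology",
    "-allowSnapshot", "-disallowSnapshot",
}
BOTH_NAMENODES = {
    "-saveNamespace", "-restoreFailedStorage", "-refreshNodes",
    "-refreshServiceAcl", "-refreshUserToGroupsMappings",
    "-refreshSuperUserGroupsConfiguration", "-refreshCallQueue",
    "-metasave", "-setBalancerBandwidth",
}
DATANODES = {
    "-refreshNamenodes", "-deleteBlockPool",
    "-shutdownDatanode", "-getDatanodeInfo",
}


def target_for(command: str) -> str:
    """Return where a dfsadmin option is sent, per the lists in this section."""
    tokens = command.split()
    if not tokens:
        raise ValueError("empty command")
    option = tokens[0]
    if option == "-safemode":
        # Only "-safemode enter" is listed above; other subcommands are unknown here.
        if tokens[1:2] == ["enter"]:
            return "active and standby NameNodes"
        raise KeyError(command)
    if option in ACTIVE_ONLY:
        return "active NameNode"
    if option in BOTH_NAMENODES:
        return "active and standby NameNodes"
    if option in DATANODES:
        return "DataNode"
    if option == "-refresh":
        return "host given in arguments"
    if option == "-fetchImage":
        return "active NameNode (not guaranteed)"
    raise KeyError(command)
```

    For instance, target_for("-rollEdits") yields "active NameNode", while an option that is not listed in this section raises KeyError.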