Operating a NameNode HA cluster
The dfsadmin command can be run on both active and standby NameNodes to operate the HA cluster.
- While operating an HA cluster, the Active NameNode cannot commit a transaction if it cannot write successfully to a quorum of the JournalNodes.
- When restarting an HA cluster, the steps for initializing JournalNodes and NN2 can be skipped.
- Start the services in the following order:
  - JournalNodes
  - NameNodes
    Note: Verify that the ZKFailoverController (ZKFC) process is running on each NameNode host so that one of the NameNodes can transition to the active state.
  - DataNodes
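The start order above can be sketched as a small script. This is an illustration only: it assumes the Hadoop 3 style launcher `hdfs --daemon start <role>` (on Hadoop 2 the equivalent is `hadoop-daemon.sh start <role>`), the function names are hypothetical, and each command must be run on the hosts that carry the role. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the restart order for an HA cluster (assumed launcher: "hdfs --daemon").
# zkfc is started alongside the NameNodes so that one of them can become active.
HA_START_ORDER="journalnode namenode zkfc datanode"

# Print the start commands in order; set DRY_RUN=0 to actually run them
# on the local host (hypothetical helper, not part of Hadoop).
start_ha_roles() {
  local role
  for role in $HA_START_ORDER; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "hdfs --daemon start $role"
    else
      hdfs --daemon start "$role"
    fi
  done
}

# Return 0 if a ZKFC JVM is running on this host.
# jps lists the ZKFC process as DFSZKFailoverController.
zkfc_running() {
  jps | grep -q 'DFSZKFailoverController'
}
```

Calling `start_ha_roles` prints the four commands in order, and `zkfc_running && echo "ZKFC is up"` can be used on each NameNode host before expecting a NameNode to become active.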
- In a NameNode HA cluster, the following dfsadmin command options run only on the active NameNode:
  -rollEdits
  -setQuota
  -clrQuota
  -setSpaceQuota
  -clrSpaceQuota
  -setStoragePolicy
  -getStoragePolicy
  -finalizeUpgrade
  -rollingUpgrade
  -printTopology
  -allowSnapshot <snapshotDir>
  -disallowSnapshot <snapshotDir>
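Because these options run only on the active NameNode, it can help to confirm which NameNode is currently active before using them. The following is a minimal sketch: `active_namenode` is a hypothetical helper name and `mycluster` is a placeholder nameservice ID. It relies on `hdfs getconf -confKey` and `hdfs haadmin -getServiceState`, and assumes a single, non-federated nameservice.

```shell
# Print the ID of the NameNode that currently reports "active".
# $1 is the nameservice ID (placeholder example: mycluster).
active_namenode() {
  local ns="$1" id ids
  # dfs.ha.namenodes.<nameservice> holds a comma-separated list such as "nn1,nn2".
  ids="$(hdfs getconf -confKey "dfs.ha.namenodes.$ns" | tr ',' ' ')"
  for id in $ids; do
    if [ "$(hdfs haadmin -getServiceState "$id" 2>/dev/null)" = "active" ]; then
      echo "$id"
      return 0
    fi
  done
  return 1   # no NameNode reported "active"
}
```

For example, `active_namenode mycluster` prints the NameNode ID that reports the active state, or returns a non-zero status if none does.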
The following dfsadmin command options run on both the active and standby NameNodes:
  -safemode enter
  -saveNamespace
  -restoreFailedStorage
  -refreshNodes
  -refreshServiceAcl
  -refreshUserToGroupsMappings
  -refreshSuperUserGroupsConfiguration
  -refreshCallQueue
  -metasave
  -setBalancerBandwidth
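The two lists above can be captured in a small lookup for use in operational scripts. This is only a sketch: `dfsadmin_scope` is a hypothetical helper, not part of Hadoop, and it encodes nothing beyond the option names listed in this section.

```shell
# Report where a dfsadmin option runs: "active", "both", or "other"
# (anything not covered by the two lists above).
dfsadmin_scope() {
  case "$1" in
    -rollEdits|-setQuota|-clrQuota|-setSpaceQuota|-clrSpaceQuota|\
    -setStoragePolicy|-getStoragePolicy|-finalizeUpgrade|-rollingUpgrade|\
    -printTopology|-allowSnapshot|-disallowSnapshot)
      echo "active" ;;
    -safemode|-saveNamespace|-restoreFailedStorage|-refreshNodes|\
    -refreshServiceAcl|-refreshUserToGroupsMappings|\
    -refreshSuperUserGroupsConfiguration|-refreshCallQueue|-metasave|\
    -setBalancerBandwidth)
      echo "both" ;;
    *)
      echo "other" ;;
  esac
}
```

A wrapper script could call `dfsadmin_scope -rollEdits` and, when the answer is `active`, check the NameNode state before issuing the command.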
The -refresh <host:ipc_port> <key> arg1..argn command is sent to the host specified in its command arguments.
The -fetchImage <local directory> command attempts to identify the active NameNode through an RPC call, and then fetches the fsimage from that NameNode. The fsimage is therefore usually retrieved from the active NameNode, but this is not guaranteed, because a failover can happen between the two operations.
The following dfsadmin command options are sent to the DataNodes:
  -refreshNamenodes
  -deleteBlockPool
  -shutdownDatanode <datanode_host:ipc_port> upgrade
  -getDatanodeInfo <datanode_host:ipc_port>
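For the -fetchImage caveat described above, one way to notice a possible failover is to check the NameNode state before and after the fetch. The sketch below is a heuristic under stated assumptions: `fetch_image_checked` is a hypothetical helper, `nn1` in the usage note is a placeholder NameNode ID, and matching states at the two checks do not prove which NameNode served the image (a failover followed by a failback between the checks would go unnoticed).

```shell
# fetch_image_checked <namenode_id> <local_dir>
# Fetch the fsimage and warn if the given NameNode was not active
# both before and after the fetch.
fetch_image_checked() {
  local nn_id="$1" dest="$2" before after
  before="$(hdfs haadmin -getServiceState "$nn_id")" || return 1
  hdfs dfsadmin -fetchImage "$dest" || return 1
  after="$(hdfs haadmin -getServiceState "$nn_id")" || return 1
  if [ "$before" = "active" ] && [ "$after" = "active" ]; then
    echo "fsimage fetched; $nn_id reported active before and after"
  else
    echo "warning: $nn_id was '$before' before and '$after' after the fetch" >&2
    return 2
  fi
}
```

For example, `fetch_image_checked nn1 /tmp/fsimage-backup` returns status 2 when the state check suggests the image may not have come from the NameNode that was expected to be active.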