While an HA cluster is running, the active NameNode cannot commit a transaction unless it successfully writes that transaction to a quorum (a majority) of the JournalNodes.
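The commit rule comes down to simple majority arithmetic. The following is a minimal Python model, not HDFS code. The function names are hypothetical, and the model assumes that a quorum means a strict majority of the configured JournalNodes.

```python
def quorum_size(num_journal_nodes: int) -> int:
    """Smallest strict majority of the configured JournalNodes."""
    if num_journal_nodes < 1:
        raise ValueError("at least one JournalNode is required")
    return num_journal_nodes // 2 + 1


def can_commit(num_journal_nodes: int, successful_writes: int) -> bool:
    """A transaction commits only if a majority acknowledged the write."""
    return successful_writes >= quorum_size(num_journal_nodes)


def tolerated_failures(num_journal_nodes: int) -> int:
    """How many JournalNodes can fail while a quorum is still reachable."""
    return num_journal_nodes - quorum_size(num_journal_nodes)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} JNs: quorum={quorum_size(n)}, tolerated failures={tolerated_failures(n)}")
    print(can_commit(3, 2))  # True: 2 of 3 is a majority
    print(can_commit(3, 1))  # False: the active NameNode cannot commit
```

With four JournalNodes the quorum rises to three, so the cluster still tolerates only one failure. An odd count therefore gives the most failure tolerance per node.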
When restarting an HA cluster, the steps for initializing JournalNodes and NN2 can be skipped.
Start the services in the following order:
1. JournalNodes
2. NameNodes
   Note: Verify that the ZKFailoverController (ZKFC) process is running on each NameNode host, so that one of the NameNodes can be transitioned to the active state.
3. DataNodes
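The restart order and the ZKFC check can be expressed as a small driver. This is a hedged Python sketch. The host names are hypothetical, and `start` and `zkfc_running` are placeholder callbacks that you would wire to your own tooling, such as a cluster manager or the Hadoop daemon scripts. They are not Hadoop APIs.

```python
from typing import Callable, Dict, List

# Service start order from the procedure above.
START_ORDER = ["journalnode", "namenode", "datanode"]


def restart_ha_cluster(
    hosts: Dict[str, List[str]],
    start: Callable[[str, str], None],
    zkfc_running: Callable[[str], bool],
) -> List[str]:
    """Start each role in order; stop before the DataNodes if any ZKFC is down."""
    steps: List[str] = []
    for role in START_ORDER:
        for host in hosts.get(role, []):
            start(role, host)
            steps.append(f"start {role} on {host}")
        if role == "namenode":
            # Per the note above, ZKFC must be running on each NameNode host
            # so that one NameNode can be transitioned to active.
            down = [h for h in hosts.get("namenode", []) if not zkfc_running(h)]
            if down:
                raise RuntimeError("ZKFC not running on: " + ", ".join(down))
            steps.append("verified ZKFC on all NameNode hosts")
    return steps


if __name__ == "__main__":
    # Hypothetical hosts and no-op callbacks, for a dry run only.
    cluster = {
        "journalnode": ["jn1", "jn2", "jn3"],
        "namenode": ["nn1", "nn2"],
        "datanode": ["dn1", "dn2"],
    }
    for step in restart_ha_cluster(cluster, lambda role, host: None, lambda host: True):
        print(step)
```

Raising an error instead of continuing keeps the DataNodes from being started against a pair of NameNodes that has no way to elect an active one.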
In a NameNode HA cluster, the following dfsadmin command options run only on the active NameNode:
-rollEdits
-setQuota
-clrQuota
-setSpaceQuota
-clrSpaceQuota
-setStoragePolicy
-getStoragePolicy
-finalizeUpgrade
-rollingUpgrade
-printTopology
-allowSnapshot <snapshotDir>
-disallowSnapshot <snapshotDir>
The following dfsadmin command options run on both the active and the standby NameNodes:
-safemode enter
-saveNamespace
-restoreFailedStorage
-refreshNodes
-refreshServiceAcl
-refreshUserToGroupsMappings
-refreshSuperUserGroupsConfiguration
-refreshCallQueue
-metasave
-setBalancerBandwidth
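The split between the two lists can be pictured as a dispatch table. Active-only options go to whichever NameNode is currently active, while the other options are applied to every NameNode in the nameservice. The Python below is an illustrative model of that behavior, not the dfsadmin implementation. The NameNode IDs and the state dictionary are hypothetical, and only the option flag is modeled, not its arguments.

```python
from typing import Dict, List

ACTIVE_ONLY = {
    "-rollEdits", "-setQuota", "-clrQuota", "-setSpaceQuota", "-clrSpaceQuota",
    "-setStoragePolicy", "-getStoragePolicy", "-finalizeUpgrade", "-rollingUpgrade",
    "-printTopology", "-allowSnapshot", "-disallowSnapshot",
}
ALL_NAMENODES = {
    "-safemode", "-saveNamespace", "-restoreFailedStorage", "-refreshNodes",
    "-refreshServiceAcl", "-refreshUserToGroupsMappings",
    "-refreshSuperUserGroupsConfiguration", "-refreshCallQueue", "-metasave",
    "-setBalancerBandwidth",
}


def target_namenodes(option: str, states: Dict[str, str]) -> List[str]:
    """Return the NameNode IDs an option is sent to, given each NameNode's HA state."""
    if option in ACTIVE_ONLY:
        active = [nn for nn, state in sorted(states.items()) if state == "active"]
        if not active:
            raise RuntimeError(f"{option} requires an active NameNode")
        return active[:1]
    if option in ALL_NAMENODES:
        return sorted(states)
    raise ValueError(f"{option} is not modeled as a NameNode option here")


if __name__ == "__main__":
    states = {"nn1": "standby", "nn2": "active"}
    print(target_namenodes("-setQuota", states))       # ['nn2']
    print(target_namenodes("-saveNamespace", states))  # ['nn1', 'nn2']
```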
The -refresh <host:ipc_port> <key> arg1..argn command is sent to the host specified in its arguments.
The -fetchImage <local directory> command first identifies the active NameNode through an RPC call and then fetches the fsimage from that NameNode. The fsimage is therefore usually retrieved from the active NameNode. However, this is not guaranteed, because a failover can occur between the two operations.
The following dfsadmin command options are sent to the DataNodes:
-refreshNamenodes
-deleteBlockPool
-shutdownDatanode <datanode_host:ipc_port> [upgrade]
-getDatanodeInfo <datanode_host:ipc_port>
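The caveat about -fetchImage is a check-then-act race. The NameNode that was active at lookup time may no longer be active when the fsimage is downloaded. The Python below simulates that window with an in-memory cluster. `Cluster`, `fetch_image`, and the `between` hook are hypothetical stand-ins, not Hadoop APIs.

```python
from typing import Callable, Dict, Optional


class Cluster:
    """In-memory stand-in for an HA pair: NameNode ID -> HA state."""

    def __init__(self, states: Dict[str, str]) -> None:
        self.states = dict(states)

    def active(self) -> str:
        return next(nn for nn, s in self.states.items() if s == "active")

    def failover(self) -> None:
        # Swap the active and standby roles of the pair.
        for nn, s in list(self.states.items()):
            self.states[nn] = "standby" if s == "active" else "active"


def fetch_image(cluster: Cluster, between: Optional[Callable[[], None]] = None) -> Dict[str, object]:
    """Model -fetchImage as two separate steps: look up the active NameNode, then download."""
    source = cluster.active()  # step 1: identify the active NameNode
    if between is not None:
        between()  # anything, such as a failover, can happen in this window
    # step 2: "download" from the NameNode chosen in step 1
    still_active = cluster.states[source] == "active"
    return {"source": source, "source_active_at_fetch": still_active}


if __name__ == "__main__":
    c = Cluster({"nn1": "active", "nn2": "standby"})
    print(fetch_image(c))                      # nn1 is still active at fetch time
    print(fetch_image(c, between=c.failover))  # nn1 was active at lookup, standby at fetch
```

Checking the source's state after the download, as the model does, only narrows the window. It cannot close it, because a failover can also happen after the check.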