While operating an HA cluster, the Active NameNode cannot commit a transaction if it cannot write successfully to a quorum of the JournalNodes.
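The quorum rule above can be illustrated with a short sketch. It assumes that a quorum means a simple majority of the configured JournalNodes; the function names are hypothetical and are not part of any Hadoop API.

```python
def quorum_size(journal_nodes: int) -> int:
    """Smallest number of JournalNodes that forms a majority (assumed quorum rule)."""
    if journal_nodes < 1:
        raise ValueError("at least one JournalNode is required")
    return journal_nodes // 2 + 1


def can_commit(journal_nodes: int, successful_writes: int) -> bool:
    """True when enough JournalNodes acknowledged the write to form a quorum."""
    return successful_writes >= quorum_size(journal_nodes)


def tolerated_failures(journal_nodes: int) -> int:
    """Number of JournalNodes that can be down while a quorum is still reachable."""
    return journal_nodes - quorum_size(journal_nodes)
```

Under this assumption, a three-node JournalNode set keeps accepting transactions with one node down, while a fourth node adds no extra failure tolerance.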
When restarting an HA cluster, the steps for initializing JournalNodes and NN2 can be skipped.
Start the services in the following order:
JournalNodes
NameNodes
Note: Verify that the ZKFailoverController (ZKFC) process on each NameNode host is running so that one of the NameNodes can transition to the active state.
DataNodes
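The ZKFC check from the note can be sketched as follows. This is only an illustration: it assumes the output of the JDK jps tool has already been captured as text on each NameNode host, and that the ZKFC daemon appears there under the name DFSZKFailoverController. The helper names and the sample output are hypothetical.

```python
# Name under which the ZKFC daemon is assumed to appear in jps output.
ZKFC_PROCESS_NAME = "DFSZKFailoverController"


def running_daemons(jps_output: str) -> set:
    """Return the process names found in jps output ("<pid> <name>" per line)."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Skip blank or malformed lines; keep only "<pid> <name>" pairs.
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def zkfc_is_running(jps_output: str) -> bool:
    """True when the ZKFC daemon is listed in the given jps output."""
    return ZKFC_PROCESS_NAME in running_daemons(jps_output)


# Illustrative sample only; real PIDs and process lists will differ.
SAMPLE_JPS_OUTPUT = """\
2101 NameNode
2345 DFSZKFailoverController
2790 Jps
"""
```

Calling zkfc_is_running(SAMPLE_JPS_OUTPUT) on the sample returns True; a host whose output lacks the ZKFC entry would need that process started before automatic failover can elect an active NameNode.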
In a NameNode HA cluster, the following dfsadmin command options will run only on the active NameNode:
-rollEdits
-setQuota
-clrQuota
-setSpaceQuota
-clrSpaceQuota
-setStoragePolicy
-getStoragePolicy
-finalizeUpgrade
-rollingUpgrade
-printTopology
-allowSnapshot <snapshotDir>
-disallowSnapshot <snapshotDir>
The following dfsadmin command options will run on both the active and standby NameNodes:
-safemode enter
-saveNamespace
-restoreFailedStorage
-refreshNodes
-refreshServiceAcl
-refreshUserToGroupsMappings
-refreshSuperUserGroupsConfiguration
-refreshCallQueue
-metasave
-setBalancerBandwidth
The -refresh <host:ipc_port> <key> arg1..argn command is sent to the host specified in its command arguments.
The -fetchImage <local directory> command attempts to identify the active NameNode through an RPC call, and then fetches the fsimage from that NameNode. The fsimage is therefore usually retrieved from the active NameNode, but this is not guaranteed because a failover can happen between the two operations.
The following dfsadmin command options are sent to the DataNodes:
-refreshNamenodes
-deleteBlockPool
-shutdownDatanode <datanode_host:ipc_port> upgrade
-getDatanodeInfo <datanode_host:ipc_port>
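The routing rules listed above can be summarized as a small lookup table. This is a reference sketch built only from the option lists in this section; the function name and the returned labels are hypothetical and do not correspond to any Hadoop API.

```python
# Options that run only on the active NameNode.
ACTIVE_ONLY = {
    "-rollEdits", "-setQuota", "-clrQuota", "-setSpaceQuota", "-clrSpaceQuota",
    "-setStoragePolicy", "-getStoragePolicy", "-finalizeUpgrade",
    "-rollingUpgrade", "-printTopology", "-allowSnapshot", "-disallowSnapshot",
}

# Options that run on both the active and standby NameNodes.
BOTH_NAMENODES = {
    "-safemode", "-saveNamespace", "-restoreFailedStorage", "-refreshNodes",
    "-refreshServiceAcl", "-refreshUserToGroupsMappings",
    "-refreshSuperUserGroupsConfiguration", "-refreshCallQueue",
    "-metasave", "-setBalancerBandwidth",
}

# Options that are sent to the DataNodes.
DATANODES = {
    "-refreshNamenodes", "-deleteBlockPool", "-shutdownDatanode",
    "-getDatanodeInfo",
}


def dfsadmin_target(option: str) -> str:
    """Return a label describing where the given dfsadmin option is sent."""
    if option in ACTIVE_ONLY:
        return "active NameNode"
    if option in BOTH_NAMENODES:
        return "active and standby NameNodes"
    if option in DATANODES:
        return "DataNodes"
    if option == "-refresh":
        return "host given in the command arguments"
    if option == "-fetchImage":
        # Best effort: a failover may occur after the active NameNode is identified.
        return "active NameNode (not guaranteed)"
    raise KeyError(f"option not covered by this section: {option}")
```

For example, dfsadmin_target("-saveNamespace") returns "active and standby NameNodes", which is a reminder that the command affects both NameNodes rather than only the active one.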