Operating a NameNode HA cluster


While operating an HA cluster, the Active NameNode cannot commit a transaction if it cannot write successfully to a quorum of the JournalNodes.
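The quorum requirement can be made concrete with a little arithmetic. The sketch below assumes that "quorum" means a simple majority of the configured JournalNodes, as in the Quorum Journal Manager; the function names are hypothetical helpers, not part of any Hadoop tool.

```shell
#!/bin/sh
# Majority quorum for N JournalNodes: floor(N/2) + 1.
# Assumption: quorum is a simple majority of the configured JournalNodes.
jn_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# JournalNode failures that still leave a quorum available: N - quorum.
jn_tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 5; do
  echo "$n JournalNodes: quorum=$(jn_quorum "$n"), tolerated failures=$(jn_tolerated_failures "$n")"
done
```

Under that assumption, three JournalNodes tolerate one failure and five tolerate two, which is why an odd number of JournalNodes is the usual choice.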
When restarting an HA cluster, you can skip the steps for initializing the JournalNodes and NN2.
Start the services in the following order:
1. JournalNodes
2. NameNodes
  Note
  Verify that the ZKFailoverController (ZKFC) process is running on each NameNode host so that one of the NameNodes can transition to the active state.
3. DataNodes
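The start order above can be sketched as a small helper that prints the commands rather than running them. This assumes the Hadoop 3.x `hdfs --daemon start <service>` syntax (Hadoop 2.x releases use `hadoop-daemon.sh start <service>` instead) and that each command is run on the hosts carrying that role; the function name `ha_start_plan` is hypothetical, and placing ZKFC directly after the NameNodes is an assumption drawn from the note above.

```shell
#!/bin/sh
# Print the per-service start commands in the documented order:
# JournalNodes, then NameNodes (followed by their ZKFC), then DataNodes.
# Assumption: Hadoop 3.x "hdfs --daemon" syntax; nothing is executed here.
ha_start_plan() {
  for service in journalnode namenode zkfc datanode; do
    echo "hdfs --daemon start $service"
  done
}

ha_start_plan
```

After the NameNodes and ZKFC processes are up, `hdfs haadmin -getServiceState <serviceId>` is one way to confirm that a NameNode has become active, where `<serviceId>` is whatever NameNode ID the cluster configuration defines.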
In a NameNode HA cluster, the following dfsadmin command options run only on the active NameNode:
```
-rollEdits
-setQuota
-clrQuota
-setSpaceQuota
-clrSpaceQuota
-setStoragePolicy
-getStoragePolicy
-finalizeUpgrade
-rollingUpgrade
-printTopology
-allowSnapshot <snapshotDir>
-disallowSnapshot <snapshotDir>
```
The following dfsadmin command options run on both the active and standby NameNodes:
```
-safemode enter
-saveNamespace
-restoreFailedStorage
-refreshNodes
-refreshServiceAcl
-refreshUserToGroupsMappings
-refreshSuperUserGroupsConfiguration
-refreshCallQueue
-metasave
-setBalancerBandwidth
```
The -refresh <host:ipc_port> <key> arg1..argn command is sent to the host named in its <host:ipc_port> argument.
The -fetchImage <local directory> command attempts to identify the active NameNode through an RPC call, and then fetches the fsimage from that NameNode. The fsimage is therefore usually retrieved from the active NameNode, but this is not guaranteed, because a failover can happen between the two operations.
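One way to notice a failover around a fetch is to check the HA state of the expected NameNode before and after the command. The sketch below is a heuristic under stated assumptions, not a guarantee: `fetch_image_checked` is a hypothetical helper, and the NameNode service ID and target directory passed to it are placeholders for site-specific values.

```shell
#!/bin/sh
# Fetch the fsimage and report whether the given NameNode was active both
# before and after the fetch. A state change suggests a failover may have
# happened, so the fetched image may not come from the NameNode you expected.
# Assumptions: $1 is a NameNode service ID from the cluster configuration and
# $2 is an existing local directory; both are supplied by the caller.
fetch_image_checked() {
  nn_id="$1"
  dest_dir="$2"
  before=$(hdfs haadmin -getServiceState "$nn_id") || return 1
  hdfs dfsadmin -fetchImage "$dest_dir" || return 1
  after=$(hdfs haadmin -getServiceState "$nn_id") || return 1
  if [ "$before" = "active" ] && [ "$after" = "active" ]; then
    echo "stable"
  else
    echo "state-changed"
  fi
}

# Example call (hypothetical service ID and directory):
#   fetch_image_checked nn1 /tmp/fsimage-backup
```

If the helper prints `state-changed`, rerunning the fetch is the simplest response.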
The following dfsadmin command options are sent to the DataNodes:
```
-refreshNamenodes
-deleteBlockPool
-shutdownDatanode <datanode_host:ipc_port> upgrade
-getDatanodeInfo <datanode_host:ipc_port>
```
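The groupings in this section can be captured in a small lookup, which may be handy in wrapper scripts that need to know where a command will run. This is only a sketch that restates the lists above; `dfsadmin_target` and its return labels are hypothetical and not part of the hdfs CLI.

```shell
#!/bin/sh
# Map a dfsadmin option name to where this section says it runs.
# Note: the section lists "-safemode enter"; only the option name is matched.
dfsadmin_target() {
  case "$1" in
    -rollEdits|-setQuota|-clrQuota|-setSpaceQuota|-clrSpaceQuota|\
    -setStoragePolicy|-getStoragePolicy|-finalizeUpgrade|-rollingUpgrade|\
    -printTopology|-allowSnapshot|-disallowSnapshot)
      echo "active" ;;
    -safemode|-saveNamespace|-restoreFailedStorage|-refreshNodes|\
    -refreshServiceAcl|-refreshUserToGroupsMappings|\
    -refreshSuperUserGroupsConfiguration|-refreshCallQueue|-metasave|\
    -setBalancerBandwidth)
      echo "both" ;;
    -refreshNamenodes|-deleteBlockPool|-shutdownDatanode|-getDatanodeInfo)
      echo "datanode" ;;
    -refresh)
      echo "host-argument" ;;
    -fetchImage)
      echo "active-best-effort" ;;
    *)
      echo "unknown" ;;
  esac
}

dfsadmin_target -rollEdits
dfsadmin_target -refreshNodes
```

The two calls at the end print `active` and `both` respectively; any option not covered in this section falls through to `unknown`.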
