Administering an HDFS High Availability Cluster
Manually Failing Over to the Standby NameNode
Manually Failing Over to the Standby NameNode Using Cloudera Manager
If you are running an HDFS service with HA enabled, you can manually cause the active NameNode to fail over to the standby NameNode. This is useful for planned downtime, such as hardware changes, configuration changes, or software upgrades of your primary host.
- Go to the HDFS service.
- Click the Instances tab.
- Select the Manual Failover action. (This option does not appear if HA is not enabled for the cluster.)
- From the pop-up, select the NameNode that should be made active, then click Manual Failover.
- When all the steps have been completed, click Finish.
Cloudera Manager transitions the NameNode you selected to be the active NameNode, and the other NameNode to be the standby NameNode. HDFS should never have two active NameNodes.
Manually Failing Over to the Standby NameNode Using the Command Line
To initiate a failover between two NameNodes, run the command hdfs haadmin -failover, passing the NameNode to fail over from, followed by the NameNode that should become active.
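As a minimal sketch, the command can be wrapped in a small shell function that checks its arguments before calling hdfs haadmin. The function name manual_failover is hypothetical, and nn1 and nn2 are example NameNode IDs taken from the procedure later on this page; substitute the IDs from your own configuration.

```shell
# Fail over from the currently active NameNode ($1) to the standby ($2).
# Refuses to run unless exactly two different NameNode IDs are given.
manual_failover() {
  if [ "$#" -ne 2 ] || [ "$1" = "$2" ]; then
    echo "usage: manual_failover <active-id> <standby-id>" >&2
    return 2
  fi
  hdfs haadmin -failover "$1" "$2"
}

# Example: make nn2 active, leaving nn1 as the standby.
# manual_failover nn1 nn2
```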
Moving an HA NameNode to a New Host
Moving an HA NameNode to a New Host Using Cloudera Manager
Moving an HA NameNode to a New Host Using the Command Line
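Use the following steps to move one of the NameNodes to a new host. In this example, the current NameNodes are nn1 and nn2, and the new host is nn2-alt. The example assumes the following: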
- nn2-alt is already a member of this CDH 5 HA cluster.
- Automatic failover is configured.
- A JournalNode on nn2, in addition to the NameNode service, is to be moved to nn2-alt.
The procedure moves the NameNode and JournalNode services from nn2 to nn2-alt, reconfigures nn1 to recognize the new location of the JournalNode, and restarts nn1 and nn2-alt in the new HA configuration.
Step 1: Make sure that nn1 is the active NameNode
Make sure that the NameNode that is not going to be moved is active; in this example, nn1 must be active. You can use the NameNodes' web UIs to see which is active; see Start the NameNodes. If nn1 is not the active NameNode, initiate a failover from nn2 to nn1:
hdfs haadmin -failover nn2 nn1
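If you prefer to check the state from the command line rather than the web UI, the check and the conditional failover in this step can be sketched as one function. This is an illustration only: the function name ensure_active is hypothetical, and it relies on the getServiceState subcommand (described later on this page) printing "active" or "standby".

```shell
# Make sure NameNode $1 is active; if it is standby, fail over to it from $2.
ensure_active() {
  case "$(hdfs haadmin -getServiceState "$1")" in
    active)  echo "$1 is already active" ;;
    standby) hdfs haadmin -failover "$2" "$1" ;;
    *)       echo "could not determine the state of $1" >&2; return 1 ;;
  esac
}

# Example for this procedure: nn1 must end up active.
# ensure_active nn1 nn2
```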
Step 2: Stop services on nn2
- Stop the NameNode daemon:
$ sudo service hadoop-hdfs-namenode stop
- Stop the ZKFC daemon if it is running:
$ sudo service hadoop-hdfs-zkfc stop
- Stop the JournalNode daemon if it is running:
$ sudo service hadoop-hdfs-journalnode stop
- Make sure these services are not set to restart on boot. If you are not planning to use nn2 as a NameNode again, you may want to remove the services.
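The whole of Step 2 can be sketched as a single function to run on nn2. It assumes a RHEL-compatible system where chkconfig manages boot-time services (the same assumption the start-up step makes); the function name is hypothetical. A daemon that is not installed or not running produces a warning rather than stopping the loop.

```shell
# Stop the HDFS HA daemons on nn2 and keep them from starting at boot.
stop_and_disable_nn2_services() {
  for svc in hadoop-hdfs-namenode hadoop-hdfs-zkfc hadoop-hdfs-journalnode; do
    sudo service "$svc" stop || echo "warning: could not stop $svc" >&2
    sudo chkconfig "$svc" off || echo "warning: could not disable $svc" >&2
  done
}

# Run on nn2:
# stop_and_disable_nn2_services
```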
Step 3: Install the NameNode daemon on nn2-alt
See the instructions for installing hadoop-hdfs-namenode under Step 3: Install CDH 5 with YARN or Step 4: Install CDH 5 with MRv1.
Step 4: Configure HA on nn2-alt
See Enabling HDFS HA for the properties to configure on nn2-alt in core-site.xml and hdfs-site.xml, and explanations and instructions. You should copy the values that are already set in the corresponding files on nn2.
- If you are relocating a JournalNode to nn2-alt, follow these directions to install it, but do not start it yet.
- If you are using automatic failover, make sure you follow the instructions for configuring the required properties on nn2-alt and initializing the HA state in ZooKeeper.
Step 5: Copy the contents of the dfs.name.dir and dfs.journalnode.edits.dir directories to nn2-alt
Use rsync or a similar tool to copy the contents of the dfs.name.dir directory, and the dfs.journalnode.edits.dir directory if you are moving the JournalNode, from nn2 to nn2-alt.
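A minimal rsync sketch for this step follows. It is meant to be run on nn2 after the daemons have been stopped, so the directories are no longer changing. The directory paths shown in the usage comments are hypothetical; use the actual values of dfs.name.dir and dfs.journalnode.edits.dir from hdfs-site.xml on nn2. The function name is also hypothetical.

```shell
# Copy the contents of a directory on this host to the same path on another host.
# $1 = directory to copy, $2 = destination host.
# -a (archive) recurses and preserves permissions and timestamps, and preserves
# ownership where the receiving user is permitted to set it.
copy_dir_to_new_host() {
  rsync -a "$1/" "$2:$1/"
}

# Hypothetical paths:
# copy_dir_to_new_host /data/dfs/nn nn2-alt
# copy_dir_to_new_host /data/dfs/jn nn2-alt
```

After copying, confirm that the files on nn2-alt are owned by the same user and group as on nn2, because the NameNode and JournalNode daemons need to read and write them.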
Step 6: If you are moving a JournalNode, update dfs.namenode.shared.edits.dir on nn1
If you are relocating a JournalNode from nn2 to nn2-alt, update dfs.namenode.shared.edits.dir in hdfs-site.xml on nn1 to reflect the new hostname. See this section for more information about dfs.namenode.shared.edits.dir.
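To illustrate what the updated value looks like, the sketch below rewrites one JournalNode host in a qjournal URI. It assumes the URI lists each JournalNode with an explicit port; the third JournalNode host (jn3) and the nameservice name (mycluster) are hypothetical, as is the function name. The function only prints the new value; you still edit hdfs-site.xml yourself.

```shell
# Print a qjournal URI with JournalNode host $2 replaced by host $3.
# $1 = current value of dfs.namenode.shared.edits.dir.
# The host is matched only where it follows "//" or ";" and precedes ":".
# Dots in $2 are treated as regular-expression wildcards, so check the
# result by eye before using it.
relocate_journalnode() {
  printf '%s\n' "$1" | sed -E "s#(//|;)$2:#\\1$3:#g"
}

# Example:
# relocate_journalnode 'qjournal://nn1:8485;nn2:8485;jn3:8485/mycluster' nn2 nn2-alt
```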
Step 7: If you are using automatic failover, install the ZKFC daemon on nn2-alt
For instructions, see Deploy Automatic Failover. Do not start the daemon yet.
Step 8: Start services on nn2-alt
Start the NameNode; start the ZKFC for automatic failover; and install and start a JournalNode if you want one to run on nn2-alt:
- Start the JournalNode daemon:
$ sudo service hadoop-hdfs-journalnode start
- Start the NameNode daemon:
$ sudo service hadoop-hdfs-namenode start
- Start the ZKFC daemon:
$ sudo service hadoop-hdfs-zkfc start
- Set these services to restart on boot; for example on a RHEL-compatible system:
$ sudo chkconfig hadoop-hdfs-namenode on
$ sudo chkconfig hadoop-hdfs-zkfc on
$ sudo chkconfig hadoop-hdfs-journalnode on
Other HDFS haadmin Commands
After your HA NameNodes are configured and started, you have access to additional commands to administer your HA HDFS cluster. Specifically, you should familiarize yourself with the subcommands of the hdfs haadmin command.
This page describes high-level uses of some important subcommands. For specific usage information of each subcommand, run hdfs haadmin -help <command>.
getServiceState
getServiceState - Determine whether the given NameNode is active or standby.
Connect to the provided NameNode to determine its current state, printing either "standby" or "active" to STDOUT as appropriate. This subcommand might be used by cron jobs or monitoring scripts, which need to behave differently based on whether the NameNode is currently active or standby.
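As a sketch of the cron use case, the function below runs a command only when the given NameNode reports itself as active. The function name and the script path in the usage comment are hypothetical.

```shell
# Run a command only when NameNode $1 is active.
# $1 = NameNode ID; the remaining arguments are the command to run.
run_if_active() {
  if [ "$(hdfs haadmin -getServiceState "$1")" = "active" ]; then
    shift
    "$@"
  fi
}

# Example (hypothetical script path):
# run_if_active nn1 /usr/local/bin/nightly-hdfs-report.sh
```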
checkHealth
checkHealth - Check the health of the given NameNode.
Connect to the provided NameNode to check its health. The NameNode can perform some diagnostics on itself, including checking if internal services are running as expected. This command returns 0 if the NameNode is healthy, non-zero otherwise. You can use this command for monitoring purposes.
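Because the result is reported through the exit status, a monitoring check can branch on it directly. The following is a minimal sketch; the function name, the message text, and the choice of 2 as the failure status are assumptions made for illustration, not part of the hdfs tooling.

```shell
# Report NameNode health based on the exit status of checkHealth.
# Returns 0 when healthy and 2 otherwise.
report_nn_health() {
  if hdfs haadmin -checkHealth "$1"; then
    echo "OK: NameNode $1 is healthy"
  else
    echo "CRITICAL: NameNode $1 failed its health check"
    return 2
  fi
}

# Example:
# report_nn_health nn1
```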
Using the dfsadmin Command When HA Is Enabled
By default, applicable dfsadmin command options are run against both active and standby NameNodes. To limit an option to a specific NameNode, use the -fs option. For example:
To turn safe mode on for both NameNodes, run:
hdfs dfsadmin -safemode enter
To turn safe mode on for a single NameNode, run:
hdfs dfsadmin -fs hdfs://<host>:<port> -safemode enter
For a full list of dfsadmin command options, run: hdfs dfsadmin -help.
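The -fs option is also useful for querying each NameNode separately. The sketch below prints the safe mode status of each NameNode in turn using -safemode get. The function name, host names, and port in the usage comment are hypothetical; use the RPC addresses configured for your NameNodes.

```shell
# Print the safe mode status of each NameNode individually.
# Arguments are NameNode addresses in host:port form.
safemode_status_each() {
  for addr in "$@"; do
    printf '%s: ' "$addr"
    hdfs dfsadmin -fs "hdfs://$addr" -safemode get
  done
}

# Example (hypothetical hosts and port):
# safemode_status_each nn1.example.com:8020 nn2-alt.example.com:8020
```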
Converting From an NFS-mounted Shared Edits Directory to Quorum-based Storage
Converting From an NFS-mounted Shared Edits Directory to Quorum-based Storage Using Cloudera Manager
Converting an HA configuration from an NFS-mounted shared edits directory to Quorum-based storage involves disabling the current HA configuration and then enabling HA using Quorum-based storage.
- Disable HA.
- Although the standby NameNode role is removed, its name directories are not deleted. Empty these directories.
- Enable HA with Quorum-based storage.