Administering an HDFS High Availability Cluster

Manually Failing Over to the Standby NameNode

If you are running a HDFS service with HA enabled, you can manually cause the active NameNode to failover to the standby NameNode. This is useful for planned downtime—for hardware changes, configuration changes, or software upgrades of your primary host.

  1. Go to the HDFS service.
  2. Click the Instances tab.
  3. Click Federation and High Availability.
  4. Locate the row for the Nameservice where you want to fail over the NameNode.
  5. Select Actions > Manual Failover. (This option does not appear if HA is not enabled for the cluster.)
  6. From the pop-up, select the NameNode that should be made active, then click Manual Failover.
  7. When all the steps have been completed, click Finish.

Cloudera Manager transitions the NameNode you selected to be the active NameNode, and the other NameNode to be the standby NameNode. HDFS should never have two active NameNodes.

Moving an HA NameNode to a New Host

Other HDFS haadmin Commands

After your HA NameNodes are configured and started, you will have access to some additional commands to administer your HA HDFS cluster. Specifically, you should familiarize yourself with the subcommands of the hdfs haadmin command.

This page describes high-level uses of some important subcommands. For specific usage information of each subcommand, you should run hdfs haadmin -help <command>.

getServiceState

getServiceState - determine whether the given NameNode is active or standby

Connect to the provided NameNode to determine its current state, printing either "standby" or "active" to STDOUT as appropriate. This subcommand might be used by cron jobs or monitoring scripts which need to behave differently based on whether the NameNode is currently active or standby.

checkHealth

checkHealth - check the health of the given NameNode

Connect to the provided NameNode to check its health. The NameNode is capable of performing some diagnostics on itself, including checking if internal services are running as expected. This command will return 0 if the NameNode is healthy, non-zero otherwise. One might use this command for monitoring purposes.

Using the dfsadmin Command When HA is Enabled

By default, applicable dfsadmin command options are run against both active and standby NameNodes. To limit an option to a specific NameNode, use the -fs option. For example,

To turn safe mode on for both NameNodes, run:

hdfs dfsadmin -safemode enter

To turn safe mode on for a single NameNode, run:

hdfs dfsadmin -fs hdfs://<host>:<port> -safemode enter

For a full list of dfsadmin command options, run: hdfs dfsadmin -help.

Converting From an NFS-mounted Shared Edits Directory to Quorum-based Storage

Converting a HA configuration from using an NFS-mounted shared edits directory to Quorum-based storage involves disabling the current HA configuration then enabling HA using Quorum-based storage.

  1. Disable HA.
  2. Although the standby NameNode role is removed, its name directories are not deleted. Empty these directories.
  3. Enable HA with Quorum-based storage.