HDFS High Availability
An HDFS High Availability (HA) cluster is configured with two NameNodes—an active NameNode and a standby NameNode. Only one NameNode can be active at any point in time. HDFS HA depends on maintaining a log of all namespace modifications in a location available to both NameNodes, so that in the event of a failure the standby NameNode has up-to-date information about the edits and location of blocks in the cluster. See CDH 4 High Availability Guide or CDH 5 High Availability Guide for a more detailed introduction to HA.
You can use Cloudera Manager to configure your CDH 4 or CDH 5 cluster for HDFS HA and automatic failover. In Cloudera Manager 5, HA is implemented using Quorum-based storage. Quorum-based storage relies upon a set of JournalNodes, each of which maintains a local edits directory that logs the modifications to the namespace metadata. Enabling HA enables automatic failover as part of the same command.
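The HDFS properties that a Quorum-based configuration relies on can be inspected from any host with an HDFS gateway or NameNode role. This is only a sketch for orientation: it assumes the default nameservice name nameservice1 (the default suggested later on this page); substitute your own nameservice name.

```bash
# The logical nameservice ID shared by both NameNodes
hdfs getconf -confKey dfs.nameservices

# The NameNode IDs participating in the nameservice
hdfs getconf -confKey dfs.ha.namenodes.nameservice1

# The quorum of JournalNodes storing the shared edit log (a qjournal:// URI)
hdfs getconf -confKey dfs.namenode.shared.edits.dir

# Whether automatic failover (Failover Controllers and ZooKeeper) is enabled
hdfs getconf -confKey dfs.ha.automatic-failover.enabled
```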
- If you have upgraded from Cloudera Manager 4, and have a CDH 4 cluster using HA with an NFS-mounted shared edits directory, your HA configuration will continue to work. However, you will see a validation warning recommending you switch to Quorum-based storage.
- If you are using NFS-mounted shared edits directories and you disable HA, you will not be able to re-enable HA in Cloudera Manager 5 using NFS-mounted shared directories. Instead, you should configure HA to use Quorum-based storage.
- If you have HA enabled using an NFS-mounted shared edits directory, you will be blocked from upgrading CDH 4 to CDH 5. You must disable HA in order to perform the upgrade. After the upgrade, you will not be able to use NFS-mounted shared edits directories for edit log storage.
- If you are using CDH 4.0.x: CDH 4.0 did not support Quorum-based storage. Therefore, if you were using an HA configuration and you disable it, you will not be able to re-enable it through Cloudera Manager 5, because Cloudera Manager 5 does not support NFS-mounted storage. It is recommended that you upgrade your CDH 4.0 deployment to a more recent version of CDH.
Enabling or disabling HA:
- Shuts down the HDFS service and the services that depend on it (MapReduce, YARN, and HBase). Therefore, do not enable or disable HA while jobs are running on your cluster.
- Causes the previous monitoring history to become unavailable.
If JobTracker High Availability is also enabled, the following MapReduce properties are set automatically:
- mapred.jobtracker.restart.recover: true
- mapred.job.tracker.persist.jobstatus.active: true
- mapred.ha.automatic-failover.enabled: true
- mapred.ha.fencing.methods: shell(/bin/true)
- Hardware Configuration for Quorum-based Storage
- Enabling High Availability and Automatic Failover with Quorum-based Storage
- High Availability for Hue, Hive, and Impala
- Manually Failing Over to the Standby NameNode
- Disabling High Availability
- Fencing Methods
- Converting From an NFS-mounted Shared Edits Directory to Quorum-based Storage
Hardware Configuration for Quorum-based Storage
- NameNode machines - the machines on which you run the Active and Standby NameNodes should have equivalent hardware to each other, and equivalent hardware to what would be used in a non-HA cluster.
- JournalNode machines - the machines on which you run the JournalNodes.
- The JournalNode daemon is relatively lightweight, so these daemons can reasonably be collocated on machines with other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager.
- Cloudera recommends that you deploy the JournalNode daemons on the "master" host or hosts (NameNode, Standby NameNode, JobTracker, etc.) so the JournalNodes' local directories can use the reliable local storage on those machines. You should not use SAN or NAS storage for these directories.
- There must be at least three JournalNode daemons, since edit log modifications must be written to a majority of JournalNodes; this allows the system to tolerate the failure of a single machine. You can run more than three JournalNodes, but to actually increase the number of failures the system can tolerate, you should run an odd number (three, five, seven, and so on). Note that when running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally. If the requisite quorum is not available, the NameNode will not format or start, and you will see an error similar to this:
```
12/10/01 17:34:18 WARN namenode.FSEditLog: Unable to determine input streams from QJM to [10.0.1.10:8485, 10.0.1.10:8486, 10.0.1.10:8487]. Skipping.
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
```
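A quick way to reason about quorum size, and to confirm that enough JournalNode daemons are reachable, is sketched below. The host names are placeholders, and 8480 is assumed to be the JournalNodes' HTTP port (the usual default); adjust both for your cluster.

```bash
#!/bin/bash
# Count reachable JournalNodes and compare against the quorum requirement:
# a majority of N, which means at most (N - 1) / 2 failures can be tolerated.
JOURNALNODES=(jn1.example.com jn2.example.com jn3.example.com)  # placeholder host names
PORT=8480                                                       # assumed JournalNode HTTP port

N=${#JOURNALNODES[@]}
QUORUM=$(( N / 2 + 1 ))
UP=0
for host in "${JOURNALNODES[@]}"; do
  # Each JournalNode's embedded web server answers on /jmx when the daemon is up.
  if curl -sf "http://${host}:${PORT}/jmx" > /dev/null; then
    UP=$((UP + 1))
  fi
done
echo "${UP} of ${N} JournalNodes reachable; ${QUORUM} required for a quorum."
```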
In an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. If you are reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled, you can reuse the hardware which you had previously dedicated to the Secondary NameNode.
Enabling High Availability and Automatic Failover with Quorum-based Storage
The Enable High Availability workflow leads you through adding a second (standby) NameNode and configuring JournalNodes. During the workflow, Cloudera Manager creates a federated namespace.
- Go to the HDFS service.
- Select .
  A screen showing the hosts that are eligible to run a standby NameNode and the JournalNodes displays.
- Select the host where you want the standby NameNode to be set up. The standby NameNode cannot be on the same host as the active NameNode, and the host that is chosen should have the same hardware configuration (RAM, disk space, number of cores, and so on) as the active NameNode.
- Select an odd number of hosts (a minimum of three) to act as JournalNodes. JournalNodes should be hosted on machines with hardware similar to the NameNodes. It is recommended that you put one JournalNode on each of the hosts running the active and standby NameNodes, and the third JournalNode on similarly equipped hardware, such as the JobTracker host.
- Click Continue.
- Specify a name for the nameservice or accept the default name nameservice1 and click Continue.
- Enter a directory location for the JournalNode edits directory into the fields for each JournalNode host.
- You may enter only one directory for each JournalNode. The paths do not need to be the same on every JournalNode.
- The directories you specify should be empty, and must have the appropriate permissions.
- Advanced use only: Decide whether Cloudera Manager should clear existing data in ZooKeeper, the standby NameNode, and the JournalNodes. If the directories are not empty (for example, you are re-enabling a previous HA configuration), Cloudera Manager does not automatically delete the contents; you can choose to have the contents deleted by keeping the default checkbox selection. The recommended default is to clear the directories. If you choose not to clear them, the data must be in sync across the edits directories of the JournalNodes and must have the same version data as the NameNodes.
- Click Continue.
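After the Enable High Availability workflow finishes, you can confirm from the command line that one NameNode is active and the other is standby. This is only a sketch: the NameNode IDs nn1 and nn2 are placeholders, and nameservice1 is assumed to be the nameservice name; list the real IDs first as shown.

```bash
# List the NameNode IDs configured for the nameservice (assumed name: nameservice1)
hdfs getconf -confKey dfs.ha.namenodes.nameservice1

# Query the HA state of each NameNode; one should report "active" and the other "standby".
# Replace nn1 and nn2 with the IDs returned above.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```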
- If you want to use Hive, Impala, or Hue in a cluster with HA configured, perform the steps in High Availability for Hue, Hive, and Impala.
If you change the NameNode Service RPC Port (dfs.namenode.servicerpc-address) while automatic failover is enabled, the NameNode address stored in the ZooKeeper /hadoop-ha znode no longer matches the address the Failover Controllers are configured with, and the Failover Controllers will not restart. If you need to change the Service RPC port after automatic failover has been enabled, do the following to re-initialize the znode:
- Stop the HDFS service.
- Configure the service RPC port:
- Go to the HDFS service.
- Select .
- Search for dfs.namenode.servicerpc, which should display the NameNode Service RPC Port property. (It is found under the NameNode Default Group role group, in the Ports and Addresses category.)
- Change the port value as needed.
- On a ZooKeeper server host, run the ZooKeeper client CLI:
- Parcels - /opt/cloudera/parcels/CDH/lib/zookeeper/bin/zkCli.sh
- Packages - /usr/lib/zookeeper/bin/zkCli.sh
- Execute the following to remove the pre-configured nameservice. This example assumes the name of the nameservice is nameservice1. You can identify the nameservice from the Federation and High Availability section on the HDFS Instances tab. (A fuller zkCli sketch follows these steps.)
  rmr /hadoop-ha/nameservice1
- Click the HDFS Instances tab.
- Select .
- Start the HDFS service.
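As referenced in the nameservice-removal step above, the ZooKeeper commands can be run in a single zkCli session. This is a sketch only: it assumes the default nameservice name nameservice1, the parcel installation path, and a ZooKeeper server listening on the local host's default port 2181.

```bash
# Open the ZooKeeper CLI (use /usr/lib/zookeeper/bin/zkCli.sh for package installations)
/opt/cloudera/parcels/CDH/lib/zookeeper/bin/zkCli.sh -server localhost:2181

# Inside the zkCli prompt:
#   ls /hadoop-ha                  # lists the nameservice znodes, for example [nameservice1]
#   rmr /hadoop-ha/nameservice1    # removes the stale nameservice znode
#   ls /hadoop-ha                  # confirms the znode is gone
#   quit
```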
High Availability for Hue, Hive, and Impala
- Configure the HDFS Web Interface Role for Hue to be an HttpFS role.
- Update the Hive Metastore database to use HA. You must do this for each Hive service in your cluster.
- Make Impala aware of the updated settings for the Metastore database.
Configuring Hue to Work with High Availability
- Go to the HDFS service.
- Click the Instances tab.
- Click the Add button.
- Click the textbox under the HttpFS role, select a host where you want to install the HttpFS role, and click Continue.
- After you are returned to the Instances page, select the new HttpFS role.
- Select .
- After the command has completed, go to the Hue service.
- Select .
- Select the property.
- Select HttpFS instead of the NameNode role, and save your changes.
- Restart the Hue service.
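To verify that the new HttpFS role is serving requests before Hue starts using it, you can issue a simple WebHDFS call against it. This sketch assumes an unsecured (non-Kerberos) cluster; the host name, the user name, and the port (14000 is the usual HttpFS default) are placeholders to adjust for your cluster.

```bash
# List the HDFS root directory through the HttpFS role's WebHDFS-compatible REST API.
curl -s "http://httpfs-host.example.com:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"
```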
Updating the Hive Metastore for High Availability
- Go to the Hive service.
- Select .
  Note: You may want to stop the Hue and Impala services first, if present, as they depend on the Hive service.
  Confirm that you want to stop the service.
- When the service has stopped, back up the Hive Metastore database to persistent storage.
- Select and confirm the command.
- Select . Also, restart the Hue and Impala services if you stopped them prior to updating the Metastore.
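As an illustration of the backup step above, the sketch below assumes a MySQL-backed metastore database named metastore; adapt it to whatever database your Hive Metastore uses. The metatool command shows the filesystem root recorded in the metastore, which is useful to compare before and after the update.

```bash
# Back up a MySQL-backed Hive Metastore (assumed database name: metastore)
mysqldump -u root -p metastore > metastore_backup_$(date +%Y%m%d).sql

# Show the filesystem root URIs stored in the metastore.
# Before the update this typically points at a single NameNode, for example hdfs://namenode-host:8020;
# after the update it should reference the nameservice, for example hdfs://nameservice1.
hive --service metatool -listFSRoot
```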
Configuring Impala to Work with High Availability
- Complete the steps to reconfigure the Hive Metastore database, as described in the preceding section. Impala shares the same underlying database with Hive to manage metadata for databases, tables, and so on.
- Issue the INVALIDATE METADATA statement from an Impala shell. This one-time operation makes all Impala daemons across the cluster aware of the latest settings for the Hive Metastore database. Alternatively, restart the Impala service.
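For example, from a host that can reach an Impala daemon (the host name below is a placeholder):

```bash
# One-time refresh so all Impala daemons pick up the updated metastore settings
impala-shell -i impalad-host.example.com -q "INVALIDATE METADATA"
```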
Manually Failing Over to the Standby NameNode
If you are running an HDFS service with HA enabled, you can manually cause the active NameNode to fail over to the standby NameNode. This is useful for planned downtime, such as hardware changes, configuration changes, or software upgrades of your primary host.
- Go to the HDFS service.
- Click the Instances tab.
- Select . (This option does not appear if HA is not enabled for the cluster.)
- From the pop-up, select the NameNode that should be made active, then click Manual Failover.
  Note (advanced use only): To force the selected NameNode to be active, irrespective of its state or the other NameNode's state, set the Force Failover checkbox. Choose this option only if automatic failover is not enabled. Forcing a failover first attempts to fail over the selected NameNode to active mode and the other NameNode to standby mode, even if the selected NameNode is in safe mode. If this fails, it proceeds to transition the selected NameNode to active mode anyway. To avoid having two active NameNodes, use this option only if the other NameNode is either definitely stopped or can be transitioned to standby mode by the first failover step.
- When all the steps have been completed, click Finish.
Cloudera Manager transitions the NameNode you selected to be the active NameNode, and the other NameNode to be the standby NameNode. HDFS should never have two active NameNodes.
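Outside Cloudera Manager, the equivalent operation can be sketched with the hdfs haadmin command; on a Cloudera Manager-managed cluster, the Manual Failover action above is the supported path, and the NameNode IDs nn1 and nn2 below are placeholders.

```bash
# Check which NameNode is currently active
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Initiate a failover from the currently active NameNode (nn1) to the standby (nn2)
hdfs haadmin -failover nn1 nn2
```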
Disabling High Availability
- Go to the HDFS service.
- Select .
- Select the hosts for the NameNode and the SecondaryNameNode and click Continue.
- Select the HDFS checkpoint directory and click Continue.
- Confirm that you want to take this action.
- Update the Hive Metastore NameNode.
Cloudera Manager ensures that one NameNode is active, and saves the namespace. Then it stops the standby NameNode, creates a SecondaryNameNode, removes the standby NameNode role, and restarts all the HDFS services.
Fencing Methods
In order to ensure that only one NameNode is active at a time, a fencing method is required for the shared edits directory. During a failover, the fencing method is responsible for ensuring that the previous active NameNode no longer has access to the shared edits directory, so that the new active NameNode can safely proceed writing to it.
By default, Cloudera Manager configures HDFS to use a shell fencing method (shell(./cloudera_manager_agent_fencer.py)) that takes advantage of the Cloudera Manager Agent. However, you can configure HDFS to use the sshfence method, or you can add your own shell fencing scripts, instead of or in addition to the one Cloudera Manager provides.
The fencing parameters are found in the category under the configuration properties for your HDFS service. For details of the fencing methods supplied with CDH 5, and how fencing is configured, see the Fencing Configuration section in the CDH 5 High Availability Guide.
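To see which fencing method is in effect on a given host, you can read the property from the deployed client configuration; the values in the comments are examples of the forms described above, not output from your cluster.

```bash
# Show the fencing method(s) HDFS will use during a failover. Typical values:
#   shell(./cloudera_manager_agent_fencer.py)   # Cloudera Manager's default shell fencer
#   sshfence                                    # SSH to the previously active NameNode and kill the process
#   shell(/path/to/custom_fencer.sh)            # a user-supplied script
hdfs getconf -confKey dfs.ha.fencing.methods

# sshfence also requires a private key that the fencing process can use:
hdfs getconf -confKey dfs.ha.fencing.ssh.private-key-files
```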
Converting From an NFS-mounted Shared Edits Directory to Quorum-based Storage
Converting an HA configuration from using an NFS-mounted shared edits directory to Quorum-based storage involves disabling the current HA configuration, then enabling HA using Quorum-based storage.
- Disable HA (see Disabling High Availability).
- Although the standby NameNode role is removed, its name directories are not deleted. Empty these directories.
- Enable HA with Quorum-based Storage.