Complete the following instructions:
Configure automatic failover.
Set up your cluster for automatic failover. Add the following property to the hdfs-site.xml file on both NameNode machines:
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
List the host-port pairs running the ZooKeeper service. Add the following property to the core-site.xml file on both NameNode machines:
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
Note: Suffix the configuration key with the nameservice ID to configure the above settings on a per-nameservice basis. For example, in a cluster with federation enabled, you can explicitly enable automatic failover for only one of the nameservices by setting dfs.ha.automatic-failover.enabled.$my-nameservice-id.
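A malformed quorum string is an easy mistake to make in this step. The following is a minimal sketch of a sanity check, assuming the value should be a comma-separated list of host:port pairs with no whitespace. check_quorum is a hypothetical helper and not part of Hadoop.

```shell
# Return success if the argument looks like host:port[,host:port...] with no whitespace.
# check_quorum is a hypothetical helper for illustration only.
check_quorum() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9.-]+:[0-9]+(,[A-Za-z0-9.-]+:[0-9]+)*$'
}

# On a configured NameNode host, the effective values can be read back with:
#   hdfs getconf -confKey ha.zookeeper.quorum
#   hdfs getconf -confKey dfs.ha.automatic-failover.enabled
if check_quorum "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"; then
  echo "quorum string looks well-formed"
fi
```

On a live host, the output of hdfs getconf -confKey ha.zookeeper.quorum could be passed to the helper in place of the literal string.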
Initialize HA state in ZooKeeper.
Execute the following command on NN1:
hdfs zkfc -formatZK -force
This command creates a znode in ZooKeeper. The automatic failover system uses this znode for data storage.
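To confirm that the znode was created, you can list it with the ZooKeeper command-line client. The sketch below assumes that ha.zookeeper.parent-znode is left at its default of /hadoop-ha, in which case the znode is created under that parent and named after the nameservice ID. The helper name ha_znode_path, the nameservice ID mycluster, and the client path are assumptions for illustration.

```shell
# Build the path of the znode created by "hdfs zkfc -formatZK" for a nameservice.
# Arguments: nameservice ID, optional parent znode (defaults to /hadoop-ha).
# ha_znode_path is a hypothetical helper, not part of Hadoop.
ha_znode_path() {
  local nameservice="$1"
  local parent="${2:-/hadoop-ha}"
  # Strip one trailing slash from the parent so the result has a single separator.
  printf '%s/%s\n' "${parent%/}" "$nameservice"
}

# Example, on a host with the ZooKeeper client installed
# ("mycluster" and the client path are placeholders):
#   /usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zk1.example.com:2181 \
#       ls "$(ha_znode_path mycluster)"
```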
Check whether ZooKeeper is running. If it is not, start ZooKeeper by executing the following command on each ZooKeeper host machine:
su - zookeeper -c "export ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ; export ZOOCFG=zoo.cfg; source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh ; /usr/hdp/current/zookeeper-server/bin/zkServer.sh start"
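One way to check whether a ZooKeeper server is running is the status subcommand of the same zkServer.sh script. The following sketch extracts the reported mode from that output. It assumes the output contains a line beginning with "Mode: " when the server is up. zk_mode is a hypothetical helper.

```shell
# Read "zkServer.sh status" output on stdin and print the reported mode
# (for example leader, follower, or standalone). Prints nothing if no
# "Mode: " line is present, which usually means the server is not running.
# zk_mode is a hypothetical helper for illustration.
zk_mode() {
  sed -n 's/^Mode: //p'
}

# Example, using the same environment setup as the start command:
#   su - zookeeper -c "export ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ; \
#       /usr/hdp/current/zookeeper-server/bin/zkServer.sh status" 2>&1 | zk_mode
```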
Start the JournalNodes, NameNodes, and DataNodes using the instructions provided in "Controlling HDP Services Manually," in the HDP Reference Guide.
Start the ZooKeeper Failover Controller (ZKFC) by executing the following command:
su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start zkfc"
The order in which ZKFC is started determines which NameNode becomes Active. For example, if ZKFC is started on NN1 first, NN1 becomes Active.
Note: To convert a non-HA cluster to an HA cluster, Hortonworks recommends that you run the bootstrapStandby command (which initializes NN2) before you start ZKFC on any of the NameNode machines.
Verify automatic failover.
Locate the Active NameNode.
Use the NameNode web UI to check the status of each NameNode host machine.
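As an alternative to the web UI, the state of each NameNode can be queried from the command line with hdfs haadmin -getServiceState. The sketch below loops over a list of NameNode IDs and prints the first one that reports active. The IDs nn1 and nn2 are placeholders; use the values configured in dfs.ha.namenodes.<nameservice-id>. find_active_namenode is a hypothetical helper.

```shell
# Print the ID of the first NameNode that reports the "active" state.
# Arguments: one or more NameNode IDs (placeholders such as nn1 nn2).
# Returns non-zero if none of them is active.
find_active_namenode() {
  local id state
  for id in "$@"; do
    state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null)
    if [ "$state" = "active" ]; then
      printf '%s\n' "$id"
      return 0
    fi
  done
  return 1
}

# Example (run as a user allowed to run haadmin, such as hdfs):
#   find_active_namenode nn1 nn2
```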
Cause a failure on the Active NameNode host machine.
For example, you can use the following command to simulate a JVM crash:
kill -9 $PID_of_Active_NameNode
Alternatively, power-cycle the machine or unplug its network interface to simulate an outage.
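The kill command above needs the process ID of the Active NameNode. The following sketch reads it from a PID file. The default path is an assumption about a typical HDP layout and may differ on your system; check HADOOP_PID_DIR in hadoop-env.sh. namenode_pid is a hypothetical helper.

```shell
# Print the NameNode PID recorded in a PID file.
# Argument: optional PID file path. The default is an assumed HDP location.
# Returns non-zero if the file cannot be read.
namenode_pid() {
  local pid_file="${1:-/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid}"
  [ -r "$pid_file" ] || return 1
  head -n 1 "$pid_file"
}

# Example, on the Active NameNode host:
#   kill -9 "$(namenode_pid)"
```

If no PID file is available, searching the process list for the NameNode main class, org.apache.hadoop.hdfs.server.namenode.NameNode, is another option.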
The Standby NameNode should now automatically become Active within several seconds.
Note: The time required to detect a failure and trigger a failover depends on the value of the ha.zookeeper.session-timeout.ms property (default value is 5 seconds).
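To observe how long the failover takes in this test, you could poll the former Standby NameNode until it reports active. This is a rough sketch under the assumption that hdfs haadmin -getServiceState is usable from the host where you run it. The NameNode ID nn2 is a placeholder and wait_for_active is a hypothetical helper. The reported time has the granularity of the poll interval.

```shell
# Poll a NameNode until it reports "active", then print the elapsed seconds.
# Arguments: NameNode ID, timeout in seconds (default 60),
#            poll interval in seconds (default 1).
# Returns non-zero if the timeout is reached first.
wait_for_active() {
  local id="$1"
  local timeout="${2:-60}"
  local interval="${3:-1}"
  local start now state
  start=$(date +%s)
  while :; do
    state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null)
    now=$(date +%s)
    if [ "$state" = "active" ]; then
      echo "$((now - start))"
      return 0
    fi
    if [ "$((now - start))" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Example: start this immediately after causing the failure.
#   wait_for_active nn2 60 1
```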
If the test fails, your HA settings might be incorrectly configured.
Check the logs for the zkfc daemons and the NameNode daemons to diagnose the issue.
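When scanning the logs, filtering for high-severity entries is a reasonable first pass. The sketch below prints ERROR and FATAL lines from the files it is given. It assumes the default Hadoop log layout, in which the level appears as a space-delimited word. The log directory in the usage comment is an assumption about a typical HDP install. show_ha_errors is a hypothetical helper.

```shell
# Print ERROR and FATAL lines from the given log files, without file name prefixes.
# Returns non-zero if no such lines are found.
show_ha_errors() {
  grep -hE ' (ERROR|FATAL) ' "$@"
}

# Example (log directory and file name patterns are assumptions; adjust to your install):
#   show_ha_errors /var/log/hadoop/hdfs/*zkfc*.log /var/log/hadoop/hdfs/*namenode*.log
```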