Complete the following instructions:
Configure automatic failover.
Set up your cluster for automatic failover.
Add the following property to the hdfs-site.xml file on both NameNode machines:
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
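If you manage hdfs-site.xml with scripts rather than by hand, the edit above can be automated. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. It assumes the file has a <configuration> root with flat <property><name>/<value> children, which is the usual Hadoop layout. The helper names set_property and get_property are hypothetical, not part of Hadoop.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml, name, value):
    """Add or update a <property> in a Hadoop *-site.xml document (str or bytes).

    Returns the updated document as a str. An existing property with the
    same <name> is updated in place; otherwise a new one is appended.
    """
    root = ET.fromstring(config_xml)
    if root.tag != "configuration":
        raise ValueError("expected a <configuration> root element")
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry with this name: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def get_property(config_xml, name):
    """Return the <value> text for name, or None if the property is absent."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return prop.findtext("value")
    return None
```

To use it, read hdfs-site.xml in binary mode, pass the bytes to set_property(data, "dfs.ha.automatic-failover.enabled", "true"), and review the result before writing it back on each NameNode machine. ElementTree's default parser drops comments and the XML declaration, so keep a backup of the original file.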
List the host:port pairs of the machines running the ZooKeeper service.
Add the following property to the core-site.xml file on both NameNode machines:
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
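The ha.zookeeper.quorum value is a comma-separated list of host:port pairs. As a sketch, the helper below builds that string from a list of ZooKeeper hosts and rejects malformed entries. The function name build_zk_quorum is hypothetical, and the default port 2181 matches the example above. The simple split on the last colon does not handle IPv6 literals.

```python
def build_zk_quorum(hosts, default_port=2181):
    """Join ZooKeeper hosts into a value for ha.zookeeper.quorum.

    Each entry may be "host" or "host:port"; bare hosts get default_port.
    """
    pairs = []
    for entry in hosts:
        entry = entry.strip()
        host, sep, port = entry.rpartition(":")
        if not sep:
            # No colon in the entry: treat it as a bare hostname.
            host, port = entry, str(default_port)
        if not host or not port.isdigit() or not 0 < int(port) < 65536:
            raise ValueError(f"invalid host:port entry: {entry!r}")
        pairs.append(f"{host}:{port}")
    if not pairs:
        raise ValueError("at least one ZooKeeper host is required")
    return ",".join(pairs)
```

For example, build_zk_quorum(["zk1.example.com", "zk2.example.com", "zk3.example.com"]) produces the same value shown in the core-site.xml property above.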
Note: Suffix the configuration key with the nameservice ID to configure the above settings on a per-nameservice basis. For example, in a cluster with federation enabled, you can explicitly enable automatic failover for only one of the nameservices by setting dfs.ha.automatic-failover.enabled.$my-nameservice-id.
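To make the suffix rule concrete, the sketch below models a configuration as a plain dictionary and resolves a key for one nameservice: the suffixed key wins if present, otherwise the generic key applies, otherwise the default. This lookup order is a simplified model for illustration, not Hadoop's actual configuration code. The nameservice IDs ns1 and ns2 are hypothetical; the default of "false" reflects the shipped default of dfs.ha.automatic-failover.enabled.

```python
AUTO_FAILOVER_KEY = "dfs.ha.automatic-failover.enabled"


def nameservice_key(base_key, nameservice_id):
    """Build the per-nameservice variant of a key, e.g. <key>.ns1."""
    return f"{base_key}.{nameservice_id}"


def effective_value(conf, base_key, nameservice_id, default=None):
    """Simplified model: suffixed key first, then the generic key, then default."""
    suffixed = nameservice_key(base_key, nameservice_id)
    if suffixed in conf:
        return conf[suffixed]
    return conf.get(base_key, default)


# Federated cluster where only ns1 has automatic failover enabled.
conf = {
    "dfs.nameservices": "ns1,ns2",
    nameservice_key(AUTO_FAILOVER_KEY, "ns1"): "true",
}
```

With this configuration, effective_value(conf, AUTO_FAILOVER_KEY, "ns1", "false") returns "true" and the same call for "ns2" returns "false".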
Initialize HA state in ZooKeeper.
Execute the following command on one of the NameNode hosts:
hdfs zkfc -formatZK
This command creates a znode in ZooKeeper, which the automatic failover system uses to store its data.
Because the command must connect to ZooKeeper, check that ZooKeeper is running before you execute it. If it is not, start ZooKeeper by executing the following command on each ZooKeeper host machine:
su - zookeeper -c "export ZOOCFGDIR=/etc/zookeeper/conf ; export ZOOCFG=zoo.cfg ; source /etc/zookeeper/conf/zookeeper-env.sh ; /usr/lib/zookeeper/bin/zkServer.sh start"
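One way to check whether a ZooKeeper server is up is ZooKeeper's "ruok" four-letter command: a healthy server answers "imok". Running /usr/lib/zookeeper/bin/zkServer.sh status on the ZooKeeper host is another option. The sketch below sends ruok over a plain TCP socket. It assumes the four-letter commands are enabled; ZooKeeper 3.5 and later only allow commands listed in 4lw.commands.whitelist, so ruok may need to be added there. The function name zookeeper_is_ok is hypothetical.

```python
import socket


def zookeeper_is_ok(host, port=2181, timeout=3.0):
    """Return True if the server at host:port answers 'imok' to 'ruok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # Read until we have the 4-byte answer or the server closes.
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Refused connections, timeouts, and DNS failures all mean "not OK".
        return False
    return reply == b"imok"
```

For example, run zookeeper_is_ok(host) for every host listed in ha.zookeeper.quorum before running hdfs zkfc -formatZK.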
Start the JournalNodes, NameNodes, and DataNodes using the instructions provided here.
Start ZKFC.
Manually start the zkfc daemon on each of the NameNode host machines using the following command:
/usr/lib/hadoop/sbin/hadoop-daemon.sh start zkfc
The order in which you start the ZKFC daemons determines which NameNode becomes Active. For example, if you start ZKFC on NN1 first, NN1 becomes Active.
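The start order matters because each ZKFC competes for a lock znode in ZooKeeper: the first ZKFC to find the lock free acquires it and makes its NameNode Active, while the other waits. When the holder releases the lock (for example, its NameNode becomes unhealthy or its ZooKeeper session expires), the waiting ZKFC takes over. The toy model below illustrates only that ordering rule. It is not Hadoop code and ignores watches, health checks, and fencing; the class name ToyLockElection is hypothetical.

```python
class ToyLockElection:
    """In-memory stand-in for the ZooKeeper lock znode that ZKFCs compete for."""

    def __init__(self):
        self.active = None   # NameNode whose ZKFC currently holds the lock
        self.standbys = []   # NameNodes whose ZKFCs are waiting, in arrival order

    def start_zkfc(self, namenode):
        # The first ZKFC to find the lock free takes it; later ones wait.
        if self.active is None:
            self.active = namenode
        else:
            self.standbys.append(namenode)

    def release_lock(self, namenode):
        # The holder loses the lock (unhealthy NameNode or expired session);
        # the longest-waiting ZKFC acquires it next.
        if self.active == namenode:
            self.active = self.standbys.pop(0) if self.standbys else None
        elif namenode in self.standbys:
            self.standbys.remove(namenode)


election = ToyLockElection()
election.start_zkfc("nn1")  # started first, so nn1 becomes Active
election.start_zkfc("nn2")
```

After these calls election.active is "nn1"; calling election.release_lock("nn1") makes "nn2" Active, which mirrors the failover you verify in the next step.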
Note: When converting a non-HA cluster to an HA cluster, Hortonworks recommends that you run the bootstrapStandby command (hdfs namenode -bootstrapStandby, which initializes NN2) before you start ZKFC on either NameNode machine.
Verify automatic failover.
Locate the Active NameNode.
Use the NameNode web UI to check the status of each NameNode host machine.
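Instead of the web UI, you can run hdfs haadmin -getServiceState <NameNode ID> or read the NameNode's JMX servlet, whose NameNodeStatus bean reports a State of "active" or "standby". The sketch below queries that bean over HTTP. The addresses are assumptions: 50070 is the Hadoop 2.x default NameNode HTTP port (Hadoop 3.x uses 9870), and nn1.example.com/nn2.example.com are hypothetical hostnames. The helper names are also hypothetical.

```python
import json
from urllib.request import urlopen

JMX_QUERY = "/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"


def parse_ha_state(jmx_json):
    """Extract the HA state ("active"/"standby") from a JMX servlet response."""
    beans = json.loads(jmx_json).get("beans", [])
    if not beans:
        raise ValueError("NameNodeStatus bean not found in JMX response")
    return beans[0]["State"]


def fetch_ha_state(http_address, timeout=5.0):
    """Query one NameNode, e.g. http_address="nn1.example.com:50070"."""
    with urlopen(f"http://{http_address}{JMX_QUERY}", timeout=timeout) as resp:
        return parse_ha_state(resp.read().decode("utf-8"))


def find_active(http_addresses):
    """Return the first address whose NameNode reports "active", else None."""
    for addr in http_addresses:
        try:
            if fetch_ha_state(addr) == "active":
                return addr
        except (OSError, ValueError, KeyError):
            continue  # unreachable host or unexpected response: try the next one
    return None
```

For example, find_active(["nn1.example.com:50070", "nn2.example.com:50070"]) returns the address of the Active NameNode, or None if neither reports "active".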
Cause a failure on the Active NameNode host machine.
For example, you can use the following command to simulate a JVM crash:
kill -9 $PID_of_Active_NameNode
Alternatively, you could power cycle the machine or unplug its network interface to simulate an outage.
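To find the PID for the kill -9 test, you can use the JDK's jps tool, which prints one "<pid> <main class>" line per JVM. The sketch below picks out the NameNode JVM and sends it SIGKILL, the same signal as kill -9. It assumes jps is on the PATH and that the script runs as a user allowed to see and signal the NameNode process (in HDP this is typically the hdfs user; treat that as an assumption). Run it only on the Active NameNode host of a test cluster. The helper names are hypothetical.

```python
import os
import signal
import subprocess


def parse_jps(output, main_class="NameNode"):
    """Return PIDs whose short main-class name exactly matches main_class."""
    pids = []
    for line in output.splitlines():
        parts = line.split()
        # Exact match, so "SecondaryNameNode" is not mistaken for "NameNode".
        if len(parts) >= 2 and parts[1] == main_class:
            pids.append(int(parts[0]))
    return pids


def kill_namenode():
    """Simulate a JVM crash by sending SIGKILL to the local NameNode."""
    out = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    pids = parse_jps(out)
    if len(pids) != 1:
        raise RuntimeError(f"expected exactly one NameNode JVM, found {pids}")
    os.kill(pids[0], signal.SIGKILL)  # equivalent to kill -9 <pid>
    return pids[0]
```

The same parse_jps helper can locate the ZKFC process, whose main class is DFSZKFailoverController.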
The Standby NameNode should now automatically become Active within several seconds.
Note: The time required to detect a failure and trigger a failover depends on the ha.zookeeper.session-timeout.ms property (default value: 5 seconds).
If the test fails, your HA settings might be incorrectly configured.
Check the logs for the zkfc daemons and the NameNode daemons to diagnose the issue.
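To see how long failover actually takes in your cluster, and to compare it with your ha.zookeeper.session-timeout.ms setting, you can poll the former Standby NameNode until it reports "active". The sketch below is a generic polling loop: get_state is any zero-argument callable returning the HA state string, for example a hypothetical wrapper around the JMX State query for the Standby's address. The clock and sleep parameters exist only so the loop can be tested without waiting.

```python
import time


def wait_for_active(get_state, timeout_s=30.0, interval_s=0.5,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until it returns "active".

    Returns the elapsed seconds, or None if timeout_s passes first.
    """
    start = clock()
    while True:
        try:
            if get_state() == "active":
                return clock() - start
        except OSError:
            pass  # the NameNode may be briefly unreachable during the transition
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

Start the poll immediately after killing the Active NameNode. If it returns None, the failover did not complete within the timeout, and the zkfc and NameNode logs are the place to look.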