Complete the following actions:
Configure automatic failover.
Set up your cluster for automatic failover. Add the following property to the hdfs-site.xml file on both NameNode hosts:
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
List the host:port pairs of the servers running the ZooKeeper service. Add the following property to the core-site.xml file on both NameNode hosts:
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
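To confirm that each NameNode host actually picks up these two properties, you can read them back with the hdfs getconf command. The function below is a minimal sketch that assumes the hdfs client on the host reads the same configuration directory as the NameNode; the function name check_failover_conf is hypothetical, and the host:port check is a simple sanity test rather than full validation.

```shell
#!/bin/sh
# Hypothetical helper: read back the automatic-failover settings with
# `hdfs getconf` and report anything that looks misconfigured.
# Assumes the hdfs client reads the same config directory as the NameNode.
check_failover_conf() {
  local enabled quorum entry host port
  enabled=$(hdfs getconf -confKey dfs.ha.automatic-failover.enabled 2>/dev/null)
  quorum=$(hdfs getconf -confKey ha.zookeeper.quorum 2>/dev/null)

  if [ "$enabled" != "true" ]; then
    echo "dfs.ha.automatic-failover.enabled is '${enabled}', expected 'true'" >&2
    return 1
  fi
  if [ -z "$quorum" ]; then
    echo "ha.zookeeper.quorum is not set" >&2
    return 1
  fi

  # Each comma-separated quorum entry should be host:port with a numeric port.
  for entry in $(printf '%s' "$quorum" | tr ',' ' '); do
    host=${entry%:*}
    port=${entry##*:}
    if [ -z "$host" ] || [ "$host" = "$entry" ]; then
      echo "quorum entry '${entry}' has no port" >&2
      return 1
    fi
    case "$port" in
      ''|*[!0-9]*) echo "quorum entry '${entry}' has a non-numeric port" >&2; return 1 ;;
    esac
  done
  echo "automatic failover enabled; ZooKeeper quorum: ${quorum}"
}

# Usage (on each NameNode host):
#   check_failover_conf
```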
Note: To configure these settings per nameservice, suffix the configuration key with the nameservice ID. For example, in a cluster with federation enabled, you can enable automatic failover for only one of the nameservices by setting dfs.ha.automatic-failover.enabled.$my-nameservice-id.
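As an illustration of the per-nameservice form, the sketch below prints the hdfs-site.xml property that enables (or disables) automatic failover for a single nameservice. The function name af_property_for and the nameservice ID ns1 in the usage line are hypothetical; substitute your own nameservice ID.

```shell
#!/bin/sh
# Hypothetical helper: print an hdfs-site.xml <property> element that sets
# automatic failover for one nameservice only, by suffixing
# dfs.ha.automatic-failover.enabled with the nameservice ID.
af_property_for() {
  local nsid=$1 value=${2:-true}
  if [ -z "$nsid" ]; then
    echo "usage: af_property_for NAMESERVICE_ID [true|false]" >&2
    return 1
  fi
  printf '<property>\n  <name>dfs.ha.automatic-failover.enabled.%s</name>\n  <value>%s</value>\n</property>\n' \
    "$nsid" "$value"
}

# Usage: paste the output into hdfs-site.xml on both NameNode hosts.
#   af_property_for ns1
```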
Initialize HA state in ZooKeeper.
Execute the following command on NN1:
hdfs zkfc -formatZK -force
This command creates a znode in ZooKeeper, which the automatic failover system uses to store its data.
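To confirm that -formatZK created the znode, you can list the parent znode with the ZooKeeper command-line client. This is a sketch under several assumptions: the parent znode is /hadoop-ha (the default value of ha.zookeeper.parent-znode), the child znode is named after the nameservice ID, and zkCli.sh lives at the HDP-style path shown in the code. The function name znode_exists and the ZKCLI override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that `hdfs zkfc -formatZK` created the znode
# /hadoop-ha/<nameservice-id>. Assumes the default parent znode (/hadoop-ha)
# and an HDP-style location for zkCli.sh; set ZKCLI if yours differs.
ZKCLI=${ZKCLI:-/usr/hdp/current/zookeeper-server/bin/zkCli.sh}

znode_exists() {
  local server=$1 nsid=$2
  # `ls` prints the children of /hadoop-ha as a bracketed list,
  # for example: [mycluster]
  "$ZKCLI" -server "$server" ls /hadoop-ha 2>/dev/null \
    | grep -Eq "^\[(.*, )?${nsid}(, .*)?]$"
}

# Usage (the nameservice ID "mycluster" is an example):
#   znode_exists zk1.example.com:2181 mycluster && echo "znode present"
```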
Check whether ZooKeeper is running. If it is not, start it by running the following command on each ZooKeeper host (the -formatZK command requires a running ZooKeeper quorum, so rerun it afterward if it failed):
su - zookeeper -c "export ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ; export ZOOCFG=zoo.cfg; source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh ; /usr/hdp/current/zookeeper-server/bin/zkServer.sh start"
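To check whether ZooKeeper is running, one option is ZooKeeper's ruok four-letter command, to which a healthy server replies imok. The sketch assumes nc (netcat) is installed and that four-letter commands are enabled, which is the default in ZooKeeper 3.4 but not in 3.5 and later, where they must be whitelisted; on a ZooKeeper host itself, zkServer.sh status is an alternative. The function name zk_quorum_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: send "ruok" to every server in a comma-separated
# host:port list and report which ones answer "imok".
# Assumes nc is installed and four-letter-word commands are enabled.
zk_quorum_ok() {
  local quorum=$1 entry reply failed=0
  for entry in $(printf '%s' "$quorum" | tr ',' ' '); do
    reply=$(printf 'ruok' | nc "${entry%:*}" "${entry##*:}" 2>/dev/null)
    if [ "$reply" = "imok" ]; then
      echo "${entry}: ok"
    else
      echo "${entry}: not responding" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   zk_quorum_ok zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
```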
Start the JournalNodes, NameNodes, and DataNodes using the instructions provided in "Controlling HDP Services Manually," in the HDP Reference Guide.
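After starting the services, a quick way to see which HDFS daemons are up on a host is jps, which lists running JVMs by main-class name. This sketch assumes the JDK's jps is on the PATH and that you run it as the hdfs user or root, because jps generally lists only the JVMs the invoking user can inspect; the function name hdfs_daemons_running is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report which of the expected HDFS daemons appear in
# the output of `jps` (format: "<pid> <MainClassName>") on this host.
hdfs_daemons_running() {
  local running daemon missing=0
  running=$(jps 2>/dev/null | awk '{print $2}')
  for daemon in "$@"; do
    # -x: whole-line match, so "NameNode" does not match "SecondaryNameNode".
    if printf '%s\n' "$running" | grep -qx "$daemon"; then
      echo "${daemon}: running"
    else
      echo "${daemon}: NOT running" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage on a NameNode host that also runs a JournalNode:
#   hdfs_daemons_running NameNode JournalNode
# Once the ZKFC is started, DFSZKFailoverController should appear as well.
```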
Start the ZooKeeper Failover Controller (ZKFC) on each NameNode host by running the following command:
su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start zkfc"
The order in which you start the ZKFCs determines which NameNode becomes Active. For example, if you start the ZKFC on NN1 first, NN1 becomes Active.
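Because the start order decides which NameNode becomes Active, a small wrapper can make that order explicit. This sketch assumes passwordless SSH as root to both NameNode hosts (the host names nn1.example.com and nn2.example.com are hypothetical), reuses the start command shown above, and pauses between hosts; the default 10-second pause is an arbitrary choice, not a documented requirement.

```shell
#!/bin/sh
# Hypothetical wrapper: start the ZKFC on the preferred Active NameNode
# first, pause, then start it on the other NameNode.
# Assumes root SSH access to both hosts.
ZKFC_START='su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start zkfc"'

start_zkfc_in_order() {
  local first=$1 second=$2 pause=${3:-10}
  ssh "root@${first}" "$ZKFC_START" || return 1
  # Give the first ZKFC time to win the election before the second starts.
  sleep "$pause"
  ssh "root@${second}" "$ZKFC_START"
}

# Usage: make nn1.example.com the Active NameNode.
#   start_zkfc_in_order nn1.example.com nn2.example.com
```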
Note: To convert a non-HA cluster to an HA cluster, Hortonworks recommends that you initialize NN2 by running the bootstrapStandby command (hdfs namenode -bootstrapStandby) before you start the ZKFC on any of the NameNode hosts.
Verify automatic failover.
Locate the Active NameNode.
Use the NameNode web UI on each NameNode host to check whether that NameNode is Active or Standby.
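Besides the web UI, the hdfs haadmin -getServiceState command reports whether a given NameNode is active or standby. The sketch below assumes the NameNode IDs are nn1 and nn2, as listed in dfs.ha.namenodes.<nameservice-id> (replace them with your own IDs), and that it runs as the hdfs user; the function name find_active_nn is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the ID of the NameNode whose HA state is
# "active", querying each ID with `hdfs haadmin -getServiceState`.
find_active_nn() {
  local id state
  for id in "$@"; do
    state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null)
    if [ "$state" = "active" ]; then
      echo "$id"
      return 0
    fi
  done
  echo "no active NameNode among: $*" >&2
  return 1
}

# Usage (NameNode IDs are examples):
#   find_active_nn nn1 nn2
```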
Cause a failure on the Active NameNode host machine.
For example, you can use the following command to simulate a JVM failure:
kill -9 $PID_of_Active_NameNode
Alternatively, power-cycle the host or unplug its network interface to simulate an outage.
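On the Active NameNode host, the $PID_of_Active_NameNode placeholder can be filled in from jps. The sketch below assumes it runs as the hdfs user (or root) on the host you identified as Active; it deliberately sends SIGKILL, so use it only on a test cluster. The function name kill_active_namenode is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: find the local NameNode JVM via `jps` and kill it
# with SIGKILL to simulate a crash. Run it only on the Active NameNode host.
kill_active_namenode() {
  local pid
  # jps prints "<pid> <MainClassName>"; match the NameNode class exactly.
  pid=$(jps 2>/dev/null | awk '$2 == "NameNode" {print $1; exit}')
  if [ -z "$pid" ]; then
    echo "no NameNode JVM found on this host" >&2
    return 1
  fi
  echo "killing NameNode (pid ${pid})"
  kill -9 "$pid"
}
```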
The Standby NameNode should now automatically become Active within several seconds.
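To measure how long the failover actually takes, you can poll the former Standby until it reports active. This is a sketch that assumes the Standby's NameNode ID is nn2 (hypothetical) and that hdfs haadmin -getServiceState runs as the hdfs user; the one-second polling interval and 60-second limit are arbitrary choices. The function name wait_for_active is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: poll a NameNode's HA state once per second until it
# reports "active", then print roughly how many seconds that took.
wait_for_active() {
  local id=$1 limit=${2:-60} waited=0
  until [ "$(hdfs haadmin -getServiceState "$id" 2>/dev/null)" = "active" ]; do
    if [ "$waited" -ge "$limit" ]; then
      echo "${id} did not become active within ${limit}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "${id} is active after about ${waited}s"
}

# Usage: run right after killing the Active NameNode.
#   wait_for_active nn2
```

The elapsed time you observe should be of the same order as the ha.zookeeper.session-timeout.ms setting plus the time the Standby needs to transition.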
Note: The time required to detect a failure and trigger a failover depends on the value of the ha.zookeeper.session-timeout.ms property. The default value of this property is 5 seconds (5000 ms).
If the test fails, your HA settings might be configured incorrectly. Check the ZKFC and NameNode daemon logs to diagnose the issue.
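A starting point for the log check is to scan the most recent ZKFC and NameNode log entries for warnings and errors. The sketch assumes the HDP log directory /var/log/hadoop/hdfs and the standard Hadoop daemon log naming hadoop-<user>-<daemon>-<hostname>.log; verify both on your hosts. The function name recent_ha_errors and the HDFS_LOG_DIR override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show the last WARN/ERROR/FATAL lines from the ZKFC and
# NameNode logs on this host. The log directory and file-name pattern are
# assumptions; set HDFS_LOG_DIR if your logs live elsewhere.
HDFS_LOG_DIR=${HDFS_LOG_DIR:-/var/log/hadoop/hdfs}

recent_ha_errors() {
  local lines=${1:-20} host daemon log
  host=${HOSTNAME:-$(hostname)}
  for daemon in zkfc namenode; do
    log="${HDFS_LOG_DIR}/hadoop-hdfs-${daemon}-${host}.log"
    if [ ! -r "$log" ]; then
      echo "== ${log}: not found or not readable" >&2
      continue
    fi
    echo "== ${log}"
    # Default Hadoop log4j lines look like "<date> <time> LEVEL class: message".
    grep -E ' (WARN|ERROR|FATAL) ' "$log" | tail -n "$lines"
  done
}

# Usage (as root or the hdfs user):
#   recent_ha_errors 50
```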