Complete the following prerequisites before configuring HA for your cluster:
Ensure that you complete the following hardware prerequisites:
Shared storage
Shared storage is required for storing the NameNode metadata. Use a highly available NFS device for this shared storage.
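For example, here is a minimal sketch of verifying the NFS export from a cluster node. The server address (10.10.10.88), export path (/hdp/nfs), and mount point are taken from the sample cluster.conf later in this section; substitute your own values. In the final configuration, the cluster's netfs resource manages this mount, so this is only a connectivity check:
# Create the local mount point for the NameNode metadata directory
mkdir -p /hdp/hadoop/hdfs/nn
# Mount the NFS export manually to verify it is reachable; the mount
# options match those used in the sample cluster.conf in this section
mount -t nfs -o rw,soft,nolock 10.10.10.88:/hdp/nfs /hdp/hadoop/hdfs/nn
# Unmount once verified; the cluster software owns this mount in production
umount /hdp/hadoop/hdfs/nn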
Power fencing device
Ensure that you use a power fencing device.
Note: The Red Hat HA cluster uses power fencing to deal with network split-brain events. Fencing guarantees the integrity of the NameNode metadata. For more information, see Fencing Topology.
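As an illustration only, you can check that a node's fencing agent can reach its power management interface before configuring it in the cluster. The sketch below assumes an IPMI-based device and uses the fence_ipmilan agent shipped with the RHEL HA Add-On; the address and credentials are placeholders:
# Query the power status of a node through its IPMI management interface
# (replace the address, login, and password with your own values)
fence_ipmilan -a 10.10.10.101 -l admin -p secret -o status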
IP failover with a floating IP
Ensure that additional static IPs are available for the HDP Master Services or the cluster.
The IP must be a static reserved entry in your network DNS table. This IP will act as the public IP for the HDP Master Service.
Note: Red Hat HA clustering uses a floating IP address for the NameNode and/or the JobTracker service across the HA cluster. More details on using a floating IP for RHEL are available here.
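For example, you can confirm that the reserved DNS entry resolves to the static IP before configuring the cluster. The hostname below is a placeholder, and 10.10.10.89 matches the floating IP used in the sample cluster.conf later in this section:
# Verify that the reserved DNS entry resolves to the static floating IP
host hdp-master.example.com
# Expected output (placeholder values):
# hdp-master.example.com has address 10.10.10.89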
Hardware requirements for the RHEL HA cluster
The RHEL HA cluster must have a minimum of two nodes.
The number of nodes in your HA cluster depends on the number of concurrent node failures you want the HDP platform to withstand. The RHEL HA cluster can be configured to include a maximum of 16 nodes. Choose hardware specs for the RHEL HA Cluster nodes according to the NameNode hardware recommendations available here.
Ensure that you complete the following software prerequisites:
Use the following instructions:
Complete the prerequisites for the High Availability Add-On package for RHEL. Use the instructions available here (RHEL v5.x, RHEL v6.x).
Install the HA Add-On package for RHEL. Use the instructions available here (RHEL v5.x, RHEL v6.x).
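As a sketch, the HA Add-On is typically installed from the distribution's package groups. The group names below are assumptions to verify against the linked instructions, since they depend on your RHEL version and subscription channels:
# RHEL v6.x: install the High Availability Add-On package group
yum groupinstall "High Availability"
# RHEL v5.x: the clustering packages ship in the Clustering group
yum groupinstall "Clustering"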
Important: You can use the graphical user interface (GUI) to configure a RHEL v6.x cluster configuration until you specify a Hadoop service configuration (Deploy HDP HA Configurations). You must use the cluster.conf file to specify the Hadoop service configuration. Once the Hadoop service configuration is in place, any changes made via the GUI will break the configuration. You can still use the GUI to manage the HA NameNode service -- start, stop, and move the service across cluster machines.
Ensure that the following cluster configurations are available on all the machines in your RHEL HA cluster:
Cluster domain that specifies all the nodes in the RHEL HA cluster. See instructions here (RHEL v5.x, RHEL v6.x).
Failover domain. See instructions here (RHEL v5.x, RHEL v6.x).
Power fencing device. See instructions here (RHEL v5.x, RHEL v6.x).
Add the cluster service and resources (floating IP and NFS mount). Ensure that you add the <service domain> configurations and add the resources to the service group. See instructions here (RHEL v5.x, RHEL v6.x).
When the above are configured, you will have a cluster.conf file similar to the following sample configuration. (Note that this sample configuration does not declare a true fencing device because that is specific to the environment. Modify the configuration values to match your environment.)
<?xml version="1.0"?>
<cluster config_version="8" name="rhel6ha">
  <clusternodes>
    <clusternode name="rhel6ha01" nodeid="1">
      <fence>
        <method name="1">
          <device name="BinTrue"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="rhel6ha02" nodeid="2">
      <fence>
        <method name="1">
          <device name="BinTrue"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_bin_true" name="BinTrue"/>
  </fencedevices>
  <rm log_level="7">
    <failoverdomains>
      <failoverdomain name="HDPMaster" ordered="1" restricted="1">
        <failoverdomainnode name="rhel6ha01" priority="1"/>
        <failoverdomainnode name="rhel6ha02" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <service domain="HDPMaster" name="NameNodeService" recovery="relocate">
      <ip address="10.10.10.89" sleeptime="10"/>
      <netfs export="/hdp/nfs" force_unmount="1" fstype="nfs" host="10.10.10.88" mountpoint="/hdp/hadoop/hdfs/nn" name="HDFS data" options="rw,soft,nolock"/>
    </service>
  </rm>
</cluster>
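After editing the file, you can check it for schema errors before starting the cluster services. On RHEL v6.x, the ccs_config_validate utility included with the cluster packages performs this check:
# Validate /etc/cluster/cluster.conf against the cluster schema (RHEL v6.x)
ccs_config_validate
# Expected output on success:
# Configuration validates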
Use the following instructions to validate the configurations for the RHEL HA cluster:
Validate that the floating IP address is available on the primary machine. (The primary machine is the machine where the NameNode process is currently running.)
ip addr show eth1
If the IP address is available, you should see output similar to the following example. In this example, the IP address is configured on rheln1.hortonworks.local:
[root@rheln1 ~]# ip addr show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:cb:ca:76 brd ff:ff:ff:ff:ff:ff
    inet 172.16.204.10/24 brd 172.16.204.255 scope global eth1
    inet 172.16.204.12/24 scope global secondary eth1
    inet6 fe80::20c:29ff:fecb:ca76/64 scope link
       valid_lft forever preferred_lft forever
Validate that the failover service (in the above example, the NameNodeService) starts on the secondary machine:
ip addr show eth3
Validate failover for the IP address.
Shut down the host machines alternately.
Ensure that the IP address fails over properly.
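In addition to shutting machines down, you can exercise failover in a controlled way with the rgmanager command-line tools. The service and member names below are taken from the sample cluster.conf above:
# Check which node currently owns the service
clustat
# Relocate the NameNodeService to the other cluster member
clusvcadm -r NameNodeService -m rhel6ha02
# Verify that the floating IP moved with the service
ip addr show eth1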