Step 4: Automating Failover with Corosync and Pacemaker
Corosync and Pacemaker are popular high-availability utilities that allow you to configure Cloudera Manager to fail over automatically.
This document describes one way to set up clustering using these tools. Actual setup can be done in several ways, depending on the network configuration of your environment.
- Install Pacemaker and Corosync on CMS1, MGMT1, CMS2, and MGMT2, using the correct versions for your Linux distribution:
RHEL/CentOS:
yum install pacemaker corosync
Ubuntu:
apt-get install pacemaker corosync
SUSE:
zypper install pacemaker corosync
- Make sure that the crm tool exists on all of the hosts. This procedure uses the crm tool, which works with Pacemaker configuration. If the tool was not installed along with Pacemaker (verify this by running which crm), download and install it for your distribution using the instructions at http://crmsh.github.io/installation.
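The tool check above can be repeated on each host with a small helper. The following is a minimal sketch, not part of the Cloudera procedure: the function name missing_tools is hypothetical, and it only reports whether a command resolves on PATH, not whether its version is suitable.

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH,
# one per line. Prints nothing when all of them are present.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command resolves.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, running missing_tools corosync crm crm_mon on CMS1, CMS2, MGMT1, and MGMT2 lists any of those commands that still need to be installed on that host.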
About Corosync and Pacemaker
- By default, Corosync and Pacemaker are not autostarted as part of the boot sequence. Cloudera recommends leaving this as is. If the machine crashes and restarts, verify manually that failover was successful and determine the cause of the restart before manually starting these processes to achieve higher availability.
- If the /etc/default/corosync file exists, make sure that START is set to yes in that file:
START=yes
- Make sure that Corosync is not set to start automatically, by running the following command:
RHEL/CentOS/SUSE:
chkconfig corosync off
Ubuntu:
update-rc.d -f corosync remove
- Note which version of Corosync is installed. The contents of the Corosync configuration file (corosync.conf) that you edit vary based on the version suitable for your distribution. Sample configurations are supplied in this document and are labeled with the Corosync version.
- This document does not demonstrate configuring Corosync with authentication (with secauth set to on). The Corosync website demonstrates a mechanism to encrypt traffic using symmetric keys.
- Firewall configuration:
Corosync uses UDP transport on ports 5404 and 5405, and these ports must be open for both inbound and outbound traffic on all hosts. If you are using iptables, run commands similar to the following:
sudo iptables -I INPUT -m state --state NEW -p udp -m multiport --dports 5404,5405 -j ACCEPT
sudo iptables -I OUTPUT -m state --state NEW -p udp -m multiport --sports 5404,5405 -j ACCEPT
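Of the checks in this list, the START setting in /etc/default/corosync is simple to script. The sketch below is illustrative only: the function name corosync_start_state is hypothetical, and it assumes the setting appears as an uncommented, unquoted START=yes line.

```shell
#!/bin/sh
# Report the state of the START setting in a Corosync defaults file.
# Prints "absent" if the file does not exist, "ok" if it contains an
# uncommented START=yes line, and "needs START=yes" otherwise.
corosync_start_state() {
    defaults_file=$1
    if [ ! -f "$defaults_file" ]; then
        echo "absent"
    elif grep -Eq '^START=yes[[:space:]]*$' "$defaults_file"; then
        echo "ok"
    else
        echo "needs START=yes"
    fi
}
```

Calling corosync_start_state /etc/default/corosync on each host shows whether the file needs editing. A result of absent requires no action, because the setting applies only where the file exists.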
Setting up Cloudera Manager Server
Set up a Corosync cluster over unicast, between CMS1 and CMS2, and make sure that the hosts can “cluster” together. Then, set up Pacemaker to register Cloudera Manager Server as a resource that it monitors and to fail over to the secondary when needed.
Setting up Corosync
- Edit the /etc/corosync/corosync.conf file on CMS1 and replace the entire contents with the following text (use the correct version for your environment):
Corosync version 1.x:
compatibility: whitetank
totem {
    version: 2
    secauth: off
    interface {
        member {
            memberaddr: CMS1
        }
        member {
            memberaddr: CMS2
        }
        ringnumber: 0
        bindnetaddr: CMS1
        mcastport: 5405
    }
    transport: udpu
}
logging {
    fileline: off
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
    #
}
Corosync version 2.x:
totem {
    version: 2
    secauth: off
    cluster_name: cmf
    transport: udpu
}
nodelist {
    node {
        ring0_addr: CMS1
        nodeid: 1
    }
    node {
        ring0_addr: CMS2
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
- Edit the /etc/corosync/corosync.conf file on CMS2 and replace the entire contents with the following text (use the correct version for your environment):
Corosync version 1.x:
compatibility: whitetank
totem {
    version: 2
    secauth: off
    interface {
        member {
            memberaddr: CMS1
        }
        member {
            memberaddr: CMS2
        }
        ringnumber: 0
        bindnetaddr: CMS2
        mcastport: 5405
    }
    transport: udpu
}
logging {
    fileline: off
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
    #
}
Corosync version 2.x:
totem {
    version: 2
    secauth: off
    cluster_name: cmf
    transport: udpu
}
nodelist {
    node {
        ring0_addr: CMS1
        nodeid: 1
    }
    node {
        ring0_addr: CMS2
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
- Restart Corosync on CMS1 and CMS2 so that the new configuration takes effect:
service corosync restart
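Because the Corosync 2.x file is identical on both hosts, one way to avoid copy errors is to generate it from a single template. The following is a rough sketch under that assumption: the function name gen_corosync2_conf is hypothetical, it covers only the 2.x layout shown above, and it writes to standard output rather than to /etc/corosync/corosync.conf.

```shell
#!/bin/sh
# Print a two-node Corosync 2.x configuration to standard output.
# Arguments: cluster name, first node address, second node address.
gen_corosync2_conf() {
    cluster_name=$1
    node1=$2
    node2=$3
    cat <<EOF
totem {
    version: 2
    secauth: off
    cluster_name: ${cluster_name}
    transport: udpu
}
nodelist {
    node {
        ring0_addr: ${node1}
        nodeid: 1
    }
    node {
        ring0_addr: ${node2}
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
EOF
}
```

For example, gen_corosync2_conf cmf CMS1 CMS2 prints the Cloudera Manager Server configuration. Review the output before redirecting it into /etc/corosync/corosync.conf on each host.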
Setting up Pacemaker
You use Pacemaker to set up Cloudera Manager Server as a cluster resource.
See the Pacemaker configuration reference at http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ for more details about Pacemaker options.
- Disable autostart for Cloudera Manager Server (because you manage its lifecycle through Pacemaker) on both CMS1 and CMS2:
RHEL/CentOS/SUSE:
chkconfig cloudera-scm-server off
Ubuntu:
update-rc.d -f cloudera-scm-server remove
- Make sure that Pacemaker has been started on both CMS1 and CMS2:
/etc/init.d/pacemaker start
- Make sure that crm reports two nodes in the cluster:
# crm status
Last updated: Wed Mar 4 18:55:27 2015
Last change: Wed Mar 4 18:38:40 2015 via crmd on CMS1
Stack: corosync
Current DC: CMS1 (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
0 Resources configured
- Change the Pacemaker cluster configuration (on either CMS1 or CMS2):
crm configure property no-quorum-policy=ignore
crm configure property stonith-enabled=false
crm configure rsc_defaults resource-stickiness=100
These commands do the following:
- Disable quorum checks. (Because there are only two nodes in this cluster, quorum cannot be established.)
- Disable STONITH explicitly (see Enabling STONITH (Shoot the other node in the head)).
- Reduce the likelihood of the resource being moved among hosts on restarts.
- Add Cloudera Manager Server as an LSB-managed resource (either on CMS1 or CMS2):
crm configure primitive cloudera-scm-server lsb:cloudera-scm-server
- Verify that the primitive has been picked up by Pacemaker:
crm_mon
For example:
$ crm_mon
Last updated: Tue Jan 27 15:01:35 2015
Last change: Mon Jan 27 14:10:11 2015
Stack: classic openais (with plugin)
Current DC: CMS1 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ CMS1 CMS2 ]
cloudera-scm-server (lsb:cloudera-scm-server): Started CMS1
Testing Failover with Pacemaker
To test failover, move the cloudera-scm-server resource to CMS2:
crm resource move cloudera-scm-server <CMS2>
Test the resource move by connecting to a shell on CMS2 and verifying that the cloudera-scm-server process is now active on that host. It usually takes a few minutes for the new services to come up on the new host.
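In addition to checking the process on the host, you can read the resource location from the Pacemaker status output. The sketch below assumes the status line format shown in the crm_mon example in this document (resource name, agent, Started, host). Other Pacemaker versions may print a different layout, and the function name resource_host is hypothetical.

```shell
#!/bin/sh
# Read crm_mon-style status text on standard input and print the host on
# which the named resource is reported as Started. Prints nothing if the
# resource is not listed as Started.
resource_host() {
    awk -v res="$1" '$1 == res && $3 == "Started" { print $4 }'
}
```

A possible invocation is crm_mon -1 | resource_host cloudera-scm-server, where -1 makes crm_mon print the status once and exit. After a successful move, the command should print CMS2.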
Enabling STONITH (Shoot the other node in the head)
The following link provides an explanation of the problem of fencing and ensuring (within reasonable limits) that only one host is running a shared resource at a time: http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Clusters_from_Scratch/index.html#idm140603947390416
As noted in that link, you can use several methods (such as IPMI) to achieve reasonable guarantees on remote host shutdown. Cloudera recommends enabling STONITH, based on the hardware configuration in your environment.
Setting up the Cloudera Management Service
Setting Up Corosync
- Edit the /etc/corosync/corosync.conf file on MGMT1 and replace the entire contents with the contents below; make sure to use the correct section for your version of Corosync:
Corosync version 1.x:
compatibility: whitetank
totem {
    version: 2
    secauth: off
    interface {
        member {
            memberaddr: MGMT1
        }
        member {
            memberaddr: MGMT2
        }
        ringnumber: 0
        bindnetaddr: MGMT1
        mcastport: 5405
    }
    transport: udpu
}
logging {
    fileline: off
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
    #
}
Corosync version 2.x:
totem {
    version: 2
    secauth: off
    cluster_name: mgmt
    transport: udpu
}
nodelist {
    node {
        ring0_addr: MGMT1
        nodeid: 1
    }
    node {
        ring0_addr: MGMT2
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
- Edit the /etc/corosync/corosync.conf file on MGMT2 and replace the entire contents with the contents below:
Corosync version 1.x:
compatibility: whitetank
totem {
    version: 2
    secauth: off
    interface {
        member {
            memberaddr: MGMT1
        }
        member {
            memberaddr: MGMT2
        }
        ringnumber: 0
        bindnetaddr: MGMT2
        mcastport: 5405
    }
    transport: udpu
}
logging {
    fileline: off
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
    #
}
Corosync version 2.x:
totem {
    version: 2
    secauth: off
    cluster_name: mgmt
    transport: udpu
}
nodelist {
    node {
        ring0_addr: MGMT1
        nodeid: 1
    }
    node {
        ring0_addr: MGMT2
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
- Restart Corosync on MGMT1 and MGMT2 for the new configuration to take effect:
service corosync restart
- Test whether Corosync has set up a cluster, by using the corosync-cmapctl or corosync-objctl commands. You should see two members with status joined:
corosync-objctl | grep "member"
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(MGMT1)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(MGMT2)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
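The membership check above can be reduced to a single number. This is a minimal sketch that assumes the key format shown in the sample output (members.N.status (str) = joined). The function name joined_members is hypothetical.

```shell
#!/bin/sh
# Count members whose status is "joined" in corosync-objctl or
# corosync-cmapctl output supplied on standard input.
joined_members() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function's own exit status at 0 in that case.
    grep -c 'members\.[0-9][0-9]*\.status (str) = joined' || true
}
```

For example, corosync-cmapctl | joined_members (or the corosync-objctl equivalent) should print 2 once both MGMT1 and MGMT2 have joined the cluster.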
Setting Up Pacemaker
Use Pacemaker to set up Cloudera Management Service as a cluster resource.
See the Pacemaker configuration reference at http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ for more information about Pacemaker options.
Because the lifecycle of Cloudera Management Service is managed through the Cloudera Manager Agent, you configure the Cloudera Manager Agent to be highly available.
Follow these steps to configure Pacemaker, recommended by Cloudera for simple use:
- Disable autostart for the Cloudera Manager Agent (because Pacemaker manages its lifecycle) on both MGMT1 and MGMT2:
RHEL/CentOS/SUSE:
chkconfig cloudera-scm-agent off
Ubuntu:
update-rc.d -f cloudera-scm-agent remove
- Make sure that Pacemaker is started on both MGMT1 and MGMT2:
/etc/init.d/pacemaker start
- Make sure that the crm command reports two nodes in the cluster; you can run this command on either host:
# crm status
Last updated: Wed Mar 4 18:55:27 2015
Last change: Wed Mar 4 18:38:40 2015 via crmd on MGMT1
Stack: corosync
Current DC: MGMT1 (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
0 Resources configured
- Change the Pacemaker cluster configuration on either MGMT1 or MGMT2:
crm configure property no-quorum-policy=ignore
crm configure property stonith-enabled=false
crm configure rsc_defaults resource-stickiness=100
As with Cloudera Manager Server Pacemaker configuration, this step disables quorum checks, disables STONITH explicitly, and reduces the likelihood of resources being moved between hosts.
- Create an Open Cluster Framework (OCF) provider on both MGMT1 and MGMT2 for Cloudera Manager Agent for use with Pacemaker:
- Create an OCF directory for creating OCF resources for Cloudera Manager:
mkdir -p /usr/lib/ocf/resource.d/cm
- Create a Cloudera Manager Agent OCF wrapper as a file at /usr/lib/ocf/resource.d/cm/agent, with the following content, on both MGMT1 and MGMT2:
- RHEL-compatible 7 and higher:
#!/bin/sh
#######################################################################
# CM Agent OCF script
#######################################################################
#######################################################################
# Initialization:
: ${__OCF_ACTION=$1}
OCF_SUCCESS=0
OCF_ERROR=1
# Standard OCF code for an unsupported action; used by the case statement below.
OCF_ERR_UNIMPLEMENTED=3
OCF_STOPPED=7
#######################################################################

meta_data() {
cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="Cloudera Manager Agent" version="1.0">
<version>1.0</version>
<longdesc lang="en">
This OCF agent handles simple monitoring, start, stop of the Cloudera
Manager Agent, intended for use with Pacemaker/corosync for failover.
</longdesc>
<shortdesc lang="en">Cloudera Manager Agent OCF script</shortdesc>
<parameters />
<actions>
<action name="start" timeout="20" />
<action name="stop" timeout="20" />
<action name="monitor" timeout="20" interval="10" depth="0"/>
<action name="meta-data" timeout="5" />
</actions>
</resource-agent>
END
}
#######################################################################

agent_usage() {
cat <<END
usage: $0 {start|stop|monitor|meta-data}
Cloudera Manager Agent HA OCF script - used for managing Cloudera Manager Agent and managed processes lifecycle for use with Pacemaker.
END
}

agent_start() {
    service cloudera-scm-agent start
    if [ $? = 0 ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_ERROR
}

agent_stop() {
    service cloudera-scm-agent next_stop_hard
    service cloudera-scm-agent stop
    if [ $? = 0 ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_ERROR
}

agent_monitor() {
    # Monitor _MUST!_ differentiate correctly between running
    # (SUCCESS), failed (ERROR) or _cleanly_ stopped (NOT RUNNING).
    # That is THREE states, not just yes/no.
    service cloudera-scm-agent status
    if [ $? = 0 ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_STOPPED
}

case $__OCF_ACTION in
meta-data)  meta_data
            exit $OCF_SUCCESS
            ;;
start)      agent_start;;
stop)       agent_stop;;
monitor)    agent_monitor;;
usage|help) agent_usage
            exit $OCF_SUCCESS
            ;;
*)          agent_usage
            exit $OCF_ERR_UNIMPLEMENTED
            ;;
esac
rc=$?
exit $rc
- All other Linux distributions:
#!/bin/sh
#######################################################################
# CM Agent OCF script
#######################################################################
#######################################################################
# Initialization:
: ${__OCF_ACTION=$1}
OCF_SUCCESS=0
OCF_ERROR=1
# Standard OCF code for an unsupported action; used by the case statement below.
OCF_ERR_UNIMPLEMENTED=3
OCF_STOPPED=7
#######################################################################

meta_data() {
cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="Cloudera Manager Agent" version="1.0">
<version>1.0</version>
<longdesc lang="en">
This OCF agent handles simple monitoring, start, stop of the Cloudera
Manager Agent, intended for use with Pacemaker/corosync for failover.
</longdesc>
<shortdesc lang="en">Cloudera Manager Agent OCF script</shortdesc>
<parameters />
<actions>
<action name="start" timeout="20" />
<action name="stop" timeout="20" />
<action name="monitor" timeout="20" interval="10" depth="0"/>
<action name="meta-data" timeout="5" />
</actions>
</resource-agent>
END
}
#######################################################################

agent_usage() {
cat <<END
usage: $0 {start|stop|monitor|meta-data}
Cloudera Manager Agent HA OCF script - used for managing Cloudera Manager Agent and managed processes lifecycle for use with Pacemaker.
END
}

agent_start() {
    service cloudera-scm-agent start
    if [ $? = 0 ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_ERROR
}

agent_stop() {
    service cloudera-scm-agent hard_stop_confirmed
    if [ $? = 0 ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_ERROR
}

agent_monitor() {
    # Monitor _MUST!_ differentiate correctly between running
    # (SUCCESS), failed (ERROR) or _cleanly_ stopped (NOT RUNNING).
    # That is THREE states, not just yes/no.
    service cloudera-scm-agent status
    if [ $? = 0 ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_STOPPED
}

case $__OCF_ACTION in
meta-data)  meta_data
            exit $OCF_SUCCESS
            ;;
start)      agent_start;;
stop)       agent_stop;;
monitor)    agent_monitor;;
usage|help) agent_usage
            exit $OCF_SUCCESS
            ;;
*)          agent_usage
            exit $OCF_ERR_UNIMPLEMENTED
            ;;
esac
rc=$?
exit $rc
- Run chmod on that file to make it executable:
chmod 770 /usr/lib/ocf/resource.d/cm/agent
- Test the OCF resource script:
/usr/lib/ocf/resource.d/cm/agent monitor
This script should return the current running status of the SCM agent.
- Add Cloudera Manager Agent as an OCF-managed resource (either on MGMT1 or MGMT2):
crm configure primitive cloudera-scm-agent ocf:cm:agent
- Verify that the primitive has been picked up by Pacemaker by running the following command:
crm_mon
For example:
$ crm_mon
Last updated: Tue Jan 27 15:01:35 2015
Last change: Mon Jan 27 14:10:11 2015
Stack: classic openais (with plugin)
Current DC: CMS1 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ MGMT1 MGMT2 ]
cloudera-scm-agent (ocf:cm:agent): Started MGMT2
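When you test the OCF wrapper by hand with the monitor action, its result is carried in the exit status rather than in printed text. The helper below is a small sketch for reading that status. The function name ocf_monitor_label is hypothetical, and the values 0 and 7 are the OCF_SUCCESS and OCF_STOPPED codes defined in the wrapper script.

```shell
#!/bin/sh
# Translate the exit status of an OCF monitor action into a short label.
# 0 is OCF_SUCCESS and 7 is OCF_STOPPED, as defined in the wrapper script.
ocf_monitor_label() {
    case $1 in
        0) echo "running" ;;
        7) echo "stopped" ;;
        *) echo "error (exit status $1)" ;;
    esac
}
```

A typical use is to run /usr/lib/ocf/resource.d/cm/agent monitor and then call ocf_monitor_label $? on the next line, before any other command overwrites the exit status.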
Testing Failover with Pacemaker
To test failover, move the cloudera-scm-agent resource to MGMT2:
crm resource move cloudera-scm-agent MGMT2
Test the resource move by connecting to a shell on MGMT2 and verifying that the cloudera-scm-agent and the associated Cloudera Management Services processes are now active on that host. It usually takes a few minutes for the new services to come up on the new host.