Step 4: Automating Failover with Corosync and Pacemaker

Corosync and Pacemaker are popular high-availability utilities that allow you to configure Cloudera Manager to fail over automatically.

Prerequisites:
  1. Install Pacemaker and Corosync on CMS1, MGMT1, CMS2, and MGMT2, using the correct versions for your Linux distribution:
    RHEL/CentOS:
    $ yum install pacemaker corosync
    Ubuntu:
    $ apt-get install pacemaker corosync
    SUSE:
    $ zypper install pacemaker corosync
  2. Make sure that the crm tool exists on all of the hosts. This procedure uses the crm tool to work with the Pacemaker configuration. If the tool was not installed along with Pacemaker (verify this by running which crm), download and install it for your distribution using the instructions at http://crmsh.github.io/installation.
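
The check in step 2 can be scripted so that it reports every missing tool at once. The following is a minimal sketch, assuming a POSIX shell; missing_tools is an illustrative helper name and is not part of Pacemaker, Corosync, or crmsh.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of Pacemaker or crmsh.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example (run on CMS1, MGMT1, CMS2, and MGMT2):
#   missing_tools corosync crm crm_mon
# No output means that all of the listed commands were found.
```

Running the same line on each host makes it easy to compare results before you continue with the Corosync setup.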

About Corosync and Pacemaker

  • By default, Corosync and Pacemaker are not autostarted as part of the boot sequence. Cloudera recommends leaving this as is. If a machine crashes and restarts, first confirm that failover succeeded and determine the cause of the restart, and only then start these processes manually to restore high availability.
    • If the /etc/default/corosync file exists, make sure that START is set to yes in that file:
      START=yes
    • Make sure that Corosync is not set to start automatically, by running the following command:
      RHEL/CentOS/SUSE:
      $ chkconfig corosync off
      Ubuntu:
      $ update-rc.d -f corosync remove
  • Note which version of Corosync is installed. The contents of the Corosync configuration file (corosync.conf) that you edit vary based on the version suitable for your distribution. Sample configurations are supplied in this document and are labeled with the Corosync version.
  • This document does not demonstrate configuring Corosync with authentication (with secauth set to on). The Corosync website demonstrates a mechanism to encrypt traffic using symmetric keys.
  • Firewall configuration:
    Corosync uses UDP transport on ports 5404 and 5405, and these ports must be open for both inbound and outbound traffic on all hosts. If you are using iptables, run commands similar to the following:
    $ sudo iptables -I INPUT -m state --state NEW -p udp -m multiport --dports 5404,5405 -j ACCEPT
    $ sudo iptables -I OUTPUT -m state --state NEW -p udp -m multiport --sports 5404,5405 -j ACCEPT
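
Because the sample configurations are labeled by Corosync version, it can help to read the installed major version from a script. The following is a minimal sketch. It assumes that corosync -v prints a line containing the word "version" followed by a quoted version number such as '2.3.3'; that output format is an assumption, and corosync_major is an illustrative name rather than a Corosync command.

```shell
#!/bin/sh
# Read the text printed by `corosync -v` on stdin and print the major version.
# Assumption: the text contains a line such as
#   Corosync Cluster Engine, version '2.3.3'
# corosync_major is a hypothetical helper, not part of Corosync.
corosync_major() {
    sed -n 's/.*version[^0-9]*\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Example:
#   corosync -v | corosync_major
```

A result of 1 points to the "Corosync version 1.x" samples and a result of 2 points to the "Corosync version 2.x" samples.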

Setting up Cloudera Manager Server

Set up a Corosync cluster over unicast, between CMS1 and CMS2, and make sure that the hosts can “cluster” together. Then, set up Pacemaker to register Cloudera Manager Server as a resource that it monitors and to fail over to the secondary when needed.

Setting up Corosync

  1. Edit the /etc/corosync/corosync.conf file on CMS1 and replace the entire contents with the following text (use the correct version for your environment):
    Corosync version 1.x:
    compatibility: whitetank
    totem {
            version: 2
            secauth: off
            interface {
                    member {
                            memberaddr: CMS1
                    }
                    member {
                            memberaddr: CMS2
                    }
                    ringnumber: 0
                    bindnetaddr: CMS1
                    mcastport: 5405
            }
            transport: udpu
    }
    
    logging {
            fileline: off
            to_logfile: yes
            to_syslog: yes
            logfile: /var/log/cluster/corosync.log
            debug: off
            timestamp: on
            logger_subsys {
                    subsys: AMF
                    debug: off
            }
    }
    service {
            # Load the Pacemaker Cluster Resource Manager
            name: pacemaker
            ver:  1
            #
    }
    Corosync version 2.x:
    totem {
            version: 2
            secauth: off
            cluster_name: cmf
            transport: udpu
    }
    
    nodelist {
            node {
                    ring0_addr: CMS1
                    nodeid: 1
            }
            node {
                    ring0_addr: CMS2
                    nodeid: 2
            }
    }
    
    quorum {
            provider: corosync_votequorum
            two_node: 1
    }
  2. Edit the /etc/corosync/corosync.conf file on CMS2, and replace the entire contents with the following text (use the correct version for your environment):
    Corosync version 1.x:
    compatibility: whitetank
    totem {
            version: 2
            secauth: off
            interface {
                    member {
                            memberaddr: CMS1
                    }
                    member {
                            memberaddr: CMS2
                    }
                    ringnumber: 0
                    bindnetaddr: CMS2
                    mcastport: 5405
            }
            transport: udpu
    }
    
    logging {
            fileline: off
            to_logfile: yes
            to_syslog: yes
            logfile: /var/log/cluster/corosync.log
            debug: off
            timestamp: on
            logger_subsys {
                    subsys: AMF
                    debug: off
            }
    }
    service {
            # Load the Pacemaker Cluster Resource Manager
            name: pacemaker
            ver:  1
            #
    }
    Corosync version 2.x:
    totem {
            version: 2
            secauth: off
            cluster_name: cmf
            transport: udpu
    }
    
    nodelist {
            node {
                    ring0_addr: CMS1
                    nodeid: 1
            }
            node {
                    ring0_addr: CMS2
                    nodeid: 2
            }
    }
    
    quorum {
            provider: corosync_votequorum
            two_node: 1
    }
  3. Restart Corosync on CMS1 and CMS2 so that the new configuration takes effect:
    $ service corosync restart

Setting up Pacemaker

You use Pacemaker to set up Cloudera Manager Server as a cluster resource.

See the Pacemaker configuration reference at http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ for more details about Pacemaker options.

The following steps demonstrate one way, recommended by Cloudera, to configure Pacemaker for simple use:
  1. Disable autostart for Cloudera Manager Server (because you manage its lifecycle through Pacemaker) on both CMS1 and CMS2:
    RHEL/CentOS/SUSE:
    $ chkconfig cloudera-scm-server off
    Ubuntu:
    $ update-rc.d -f cloudera-scm-server remove
  2. Make sure that Pacemaker has been started on both CMS1 and CMS2:
    $ /etc/init.d/pacemaker start
  3. Make sure that crm reports two nodes in the cluster:
    # crm status
    Last updated: Wed Mar  4 18:55:27 2015
    Last change: Wed Mar  4 18:38:40 2015 via crmd on CMS1
    Stack: corosync
    Current DC: CMS1 (1) - partition with quorum
    Version: 1.1.10-42f2063
    2 Nodes configured
    0 Resources configured
    
  4. Change the Pacemaker cluster configuration (on either CMS1 or CMS2):
    $ crm configure property no-quorum-policy=ignore
    $ crm configure property stonith-enabled=false
    $ crm configure rsc_defaults resource-stickiness=100
    
    These commands do the following:
    • Disable quorum checks. (Because there are only two nodes in this cluster, quorum cannot be established.)
    • Disable STONITH explicitly (see Enabling STONITH (Shoot the other node in the head)).
    • Reduce the likelihood of the resource being moved among hosts on restarts.
  5. Add Cloudera Manager Server as an LSB-managed resource (either on CMS1 or CMS2):
    $ crm configure primitive cloudera-scm-server lsb:cloudera-scm-server
  6. Verify that the primitive has been picked up by Pacemaker:
    $ crm_mon
    For example:
    $ crm_mon
    Last updated: Tue Jan 27 15:01:35 2015
    Last change: Mon Jan 27 14:10:11 2015
    Stack: classic openais (with plugin)
    Current DC: CMS1 - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured, 2 expected votes
    1 Resources configured
    Online: [ CMS1 CMS2 ]
    cloudera-scm-server     (lsb:cloudera-scm-server):      Started CMS1
At this point, Pacemaker manages the status of the cloudera-scm-server service on hosts CMS1 and CMS2, ensuring that only one instance is running at a time.
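
The node-count check in step 3 can also be done from a script. The following is a minimal sketch, assuming the report format shown in this document (a line such as "2 Nodes configured"); other Pacemaker versions may word this line differently. nodes_configured is an illustrative name, not a Pacemaker command.

```shell
#!/bin/sh
# Read a `crm status` or `crm_mon` report on stdin and print the number of
# configured nodes. Prints nothing if no "Nodes configured" line is found.
# nodes_configured is a hypothetical helper, not part of Pacemaker.
nodes_configured() {
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\) [Nn]odes configured.*/\1/p' | head -n 1
}

# Example (on either CMS1 or CMS2):
#   crm status | nodes_configured
```

For the two-host cluster described here, the expected result is 2.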

Testing Failover with Pacemaker

Test Pacemaker failover by running the following command to move the cloudera-scm-server resource to CMS2:
$ crm resource move cloudera-scm-server CMS2
Test the resource move by connecting to a shell on CMS2 and verifying that the cloudera-scm-server process is now active on that host. It usually takes a few minutes for the new services to come up on the new host.
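
Besides checking the process on CMS2, you can ask Pacemaker where it believes the resource is running. The following is a minimal sketch that parses the output of crm_resource --locate. It assumes that the command prints a line of the form "resource NAME is running on: HOST"; running_on is an illustrative name, not a Pacemaker command.

```shell
#!/bin/sh
# Read `crm_resource --locate` output on stdin and print the host name.
# Assumption: a running resource is reported as
#   resource cloudera-scm-server is running on: CMS2
# Prints nothing if no such line is present.
# running_on is a hypothetical helper, not part of Pacemaker.
running_on() {
    sed -n 's/^resource .* is running on: *\([^ ]*\).*/\1/p' | head -n 1
}

# Example:
#   crm_resource --resource cloudera-scm-server --locate | running_on
```

Note that crm resource move works by adding a location constraint that keeps the resource on the target host. crmsh typically provides crm resource unmove cloudera-scm-server to remove that constraint after the test; the exact subcommand name can differ between crmsh versions, so check crm resource help on your system.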

Enabling STONITH (Shoot the other node in the head)

The following link provides an explanation of the problem of fencing and ensuring (within reasonable limits) that only one host is running a shared resource at a time: http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Clusters_from_Scratch/index.html#idm140603947390416

As noted in that link, you can use several methods (such as IPMI) to achieve reasonable guarantees on remote host shutdown. Cloudera recommends enabling STONITH, based on the hardware configuration in your environment.

Setting up the Cloudera Management Service

Setting up Corosync

  1. Edit the /etc/corosync/corosync.conf file on MGMT1 and replace the entire contents with the contents below; make sure to use the correct section for your version of Corosync:
    Corosync version 1.x:
    compatibility: whitetank
    totem {
            version: 2
            secauth: off
            interface {
                    member {
                            memberaddr: MGMT1
                    }
                    member {
                            memberaddr: MGMT2
                    }
                    ringnumber: 0
                    bindnetaddr: MGMT1
                    mcastport: 5405
            }
            transport: udpu
    }
    
    logging {
            fileline: off
            to_logfile: yes
            to_syslog: yes
            logfile: /var/log/cluster/corosync.log
            debug: off
            timestamp: on
            logger_subsys {
                    subsys: AMF
                    debug: off
            }
    }
    service {
            # Load the Pacemaker Cluster Resource Manager
            name: pacemaker
            ver:  1
            #
    }
    Corosync version 2.x:
    totem {
            version: 2
            secauth: off
            cluster_name: mgmt
            transport: udpu
    }
    
    nodelist {
            node {
                    ring0_addr: MGMT1
                    nodeid: 1
            }
            node {
                    ring0_addr: MGMT2
                    nodeid: 2
            }
    }
    
    quorum {
            provider: corosync_votequorum
            two_node: 1
    }
  2. Edit the /etc/corosync/corosync.conf file on MGMT2 and replace the entire contents with the contents below; make sure to use the correct section for your version of Corosync:
    Corosync version 1.x:
    compatibility: whitetank
    totem {
            version: 2
            secauth: off
            interface {
                    member {
                            memberaddr: MGMT1
                    }
                    member {
                            memberaddr: MGMT2
                    }
                    ringnumber: 0
                    bindnetaddr: MGMT2
                    mcastport: 5405
            }
            transport: udpu
    }
    
    logging {
            fileline: off
            to_logfile: yes
            to_syslog: yes
            logfile: /var/log/cluster/corosync.log
            debug: off
            timestamp: on
            logger_subsys {
                    subsys: AMF
                    debug: off
            }
    }
    service {
            # Load the Pacemaker Cluster Resource Manager
            name: pacemaker
            ver:  1
            #
    }
    Corosync version 2.x:
    totem {
            version: 2
            secauth: off
            cluster_name: mgmt
            transport: udpu
    }
    
    nodelist {
            node {
                    ring0_addr: MGMT1
                    nodeid: 1
            }
            node {
                    ring0_addr: MGMT2
                    nodeid: 2
            }
    }
    
    quorum {
            provider: corosync_votequorum
            two_node: 1
    }
  3. Restart Corosync on MGMT1 and MGMT2 for the new configuration to take effect:
    $ service corosync restart
  4. Test whether Corosync has set up a cluster by using the corosync-cmapctl command (Corosync 2.x) or the corosync-objctl command (Corosync 1.x). You should see two members with status joined:
    $ corosync-objctl | grep "member"
    runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
    runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(MGMT1)
    runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
    runtime.totem.pg.mrp.srp.members.1.status (str) = joined
    runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
    runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(MGMT2)
    runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
    runtime.totem.pg.mrp.srp.members.2.status (str) = joined
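
The membership check in step 4 can be reduced to a single number. The following is a minimal sketch that counts the members whose status is joined, assuming key names like those in the sample output above; joined_members is an illustrative name, not a Corosync command.

```shell
#!/bin/sh
# Read corosync-cmapctl or corosync-objctl output on stdin and print how many
# members report the status "joined". Prints 0 if there are none.
# joined_members is a hypothetical helper, not part of Corosync.
joined_members() {
    grep -c 'members\.[0-9][0-9]*\.status.*joined' || true
}

# Example:
#   corosync-objctl | joined_members
```

For the two-host cluster described here, the expected result is 2.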
    

Setting up Pacemaker

Use Pacemaker to set up Cloudera Management Service as a cluster resource.

See the Pacemaker configuration reference at http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ for more information about Pacemaker options.

Because the lifecycle of Cloudera Management Service is managed through the Cloudera Manager Agent, you configure the Cloudera Manager Agent to be highly available.

The following steps demonstrate one way, recommended by Cloudera, to configure Pacemaker for simple use:

  1. Disable autostart for the Cloudera Manager Agent (because Pacemaker manages its lifecycle) on both MGMT1 and MGMT2:
    RHEL/CentOS/SUSE:
    $ chkconfig cloudera-scm-agent off
    Ubuntu:
    $ update-rc.d -f cloudera-scm-agent remove
  2. Make sure that Pacemaker is started on both MGMT1 and MGMT2:
    $ /etc/init.d/pacemaker start
  3. Make sure that the crm command reports two nodes in the cluster; you can run this command on either host:
    # crm status
    Last updated: Wed Mar  4 18:55:27 2015
    Last change: Wed Mar  4 18:38:40 2015 via crmd on MGMT1
    Stack: corosync
    Current DC: MGMT1 (1) - partition with quorum
    Version: 1.1.10-42f2063
    2 Nodes configured
    0 Resources configured
  4. Change the Pacemaker cluster configuration on either MGMT1 or MGMT2:
    $ crm configure property no-quorum-policy=ignore
    $ crm configure property stonith-enabled=false
    $ crm configure rsc_defaults resource-stickiness=100

    As with Cloudera Manager Server Pacemaker configuration, this step disables quorum checks, disables STONITH explicitly, and reduces the likelihood of resources being moved between hosts.

  5. Create an Open Cluster Framework (OCF) provider on both MGMT1 and MGMT2 for Cloudera Manager Agent for use with Pacemaker:
    1. Create a directory to hold the OCF resources for Cloudera Manager:
      $ mkdir -p /usr/lib/ocf/resource.d/cm
    2. Create a Cloudera Manager Agent OCF wrapper as a file at /usr/lib/ocf/resource.d/cm/agent, with the following content, on both MGMT1 and MGMT2:
      #!/bin/sh
      #######################################################################
      # CM Agent OCF script
      #######################################################################
      #######################################################################
      # Initialization:
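      : ${__OCF_ACTION=$1}
      # Exit codes used by this script (values follow the OCF convention).
      # OCF_ERR_UNIMPLEMENTED is needed by the final case statement.
      OCF_SUCCESS=0
      OCF_ERROR=1
      OCF_ERR_UNIMPLEMENTED=3
      OCF_STOPPED=7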
      #######################################################################
      
      meta_data() {
              cat <<END
      <?xml version="1.0"?>
      <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
      <resource-agent name="Cloudera Manager Agent" version="1.0">
      <version>1.0</version>
      
      <longdesc lang="en">
       This OCF agent handles simple monitoring, start, stop of the Cloudera
       Manager Agent, intended for use with Pacemaker/corosync for failover.
      </longdesc>
      <shortdesc lang="en">Cloudera Manager Agent OCF script</shortdesc>
      
      <parameters />
      
      <actions>
      <action name="start"        timeout="20" />
      <action name="stop"         timeout="20" />
      <action name="monitor"      timeout="20" interval="10" depth="0"/>
      <action name="meta-data"    timeout="5" />
      </actions>
      </resource-agent>
      END
      }
      
      #######################################################################
      
      agent_usage() {
      cat <<END
       usage: $0 {start|stop|monitor|meta-data}
       Cloudera Manager Agent HA OCF script - used for managing Cloudera Manager Agent and managed processes lifecycle for use with Pacemaker.
      END
      }
      
      agent_start() {
          service cloudera-scm-agent start
          if [ $? =  0 ]; then
              return $OCF_SUCCESS
          fi
          return $OCF_ERROR
      }
      
      agent_stop() {
          service cloudera-scm-agent hard_stop_confirmed
          if [ $? =  0 ]; then
              return $OCF_SUCCESS
          fi
          return $OCF_ERROR
      }
      
      agent_monitor() {
              # Monitor _MUST!_ differentiate correctly between running
              # (SUCCESS), failed (ERROR) or _cleanly_ stopped (NOT RUNNING).
              # That is THREE states, not just yes/no.
              service cloudera-scm-agent status
              if [ $? = 0 ]; then
                  return $OCF_SUCCESS
              fi
              return $OCF_STOPPED
      }
      
      
      case $__OCF_ACTION in
      meta-data)      meta_data
                      exit $OCF_SUCCESS
                      ;;
      start)          agent_start;;
      stop)           agent_stop;;
      monitor)        agent_monitor;;
      usage|help)     agent_usage
                      exit $OCF_SUCCESS
                      ;;
      *)              agent_usage
                      exit $OCF_ERR_UNIMPLEMENTED
                      ;;
      esac
      rc=$?
      exit $rc
    3. Run chmod on that file to make it executable:
      $ chmod 770 /usr/lib/ocf/resource.d/cm/agent
  6. Test the OCF resource script:
    $ /usr/lib/ocf/resource.d/cm/agent monitor

    The script should report the current running status of the Cloudera Manager Agent.

  7. Add Cloudera Manager Agent as an OCF-managed resource (either on MGMT1 or MGMT2):
    $ crm configure primitive cloudera-scm-agent ocf:cm:agent
  8. Verify that the primitive has been picked up by Pacemaker by running the following command:
    $ crm_mon
    For example:
    $ crm_mon
    Last updated: Tue Jan 27 15:01:35 2015
    Last change: Mon Jan 27 14:10:11 2015
    Stack: classic openais (with plugin)
    Current DC: MGMT1 - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured, 2 expected votes
    1 Resources configured
    Online: [ MGMT1 MGMT2 ]
    cloudera-scm-agent      (ocf:cm:agent): Started MGMT2
Pacemaker starts managing the status of the cloudera-scm-agent service on hosts MGMT1 and MGMT2, ensuring that only one instance is running at a time.
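
When you test the OCF wrapper by hand (step 6), the result is carried in the exit status rather than in the printed text. The following is a minimal sketch that turns that status into a label, using the OCF_SUCCESS (0) and OCF_STOPPED (7) values defined in the wrapper; describe_monitor_rc is an illustrative name and is not part of the wrapper or of Pacemaker.

```shell
#!/bin/sh
# Translate the exit status of the wrapper's monitor action into a label.
# 0 is OCF_SUCCESS and 7 is OCF_STOPPED in the wrapper script; any other
# value is treated here as an error.
# describe_monitor_rc is a hypothetical helper.
describe_monitor_rc() {
    case "$1" in
        0) echo "running" ;;
        7) echo "stopped" ;;
        *) echo "error" ;;
    esac
}

# Example:
#   /usr/lib/ocf/resource.d/cm/agent monitor >/dev/null 2>&1
#   describe_monitor_rc $?
```

As written, the wrapper's monitor action returns only 0 or 7, so the error label appears only if the script itself cannot be run.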

Testing Failover with Pacemaker

Test that Pacemaker can move resources by running the following command, which moves the cloudera-scm-agent resource to MGMT2:
$ crm resource move cloudera-scm-agent MGMT2

Test the resource move by connecting to a shell on MGMT2 and verifying that the cloudera-scm-agent and the associated Cloudera Management Service processes are now active on that host. It usually takes a few minutes for the new services to come up on the new host.
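
Because the services can take a few minutes to come up, a polling loop is more convenient than checking repeatedly by hand. The following is a minimal sketch of such a loop, assuming a POSIX shell; wait_for is an illustrative helper name, and the timeout and interval in the example are arbitrary values rather than Cloudera recommendations.

```shell
#!/bin/sh
# Run a command repeatedly until it succeeds or the timeout expires.
# Usage: wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 if the timeout is exceeded.
# wait_for is a hypothetical helper, not part of Pacemaker.
wait_for() {
    wf_timeout=$1
    wf_interval=$2
    shift 2
    wf_elapsed=0
    while ! "$@"; do
        wf_elapsed=$((wf_elapsed + wf_interval))
        if [ "$wf_elapsed" -gt "$wf_timeout" ]; then
            return 1
        fi
        sleep "$wf_interval"
    done
    return 0
}

# Example (on MGMT2): wait up to 5 minutes, checking every 10 seconds.
#   wait_for 300 10 service cloudera-scm-agent status
```

The same helper can wrap the cloudera-scm-server status check on CMS2 when you test failover of Cloudera Manager Server.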