Installing HDP Manually

Chapter 25. Installing Nagios (Deprecated)

This section describes installing and testing Nagios, a system that monitors Hadoop cluster components and issues alerts on warning and critical conditions.

Install the Nagios RPMs

On the host you have chosen for the Nagios server, install the RPMs:

  • For RHEL and CentOS:

    yum -y install net-snmp net-snmp-utils php-pecl-json
    yum -y install wget httpd php net-snmp-perl perl-Net-SNMP fping nagios nagios-plugins nagios-www
  • For SLES:

    zypper -n --no-gpg-checks install net-snmp
    zypper -n --no-gpg-checks install wget apache2 php php-curl perl-SNMP perl-Net-SNMP fping nagios nagios-plugins nagios-www

Install the Configuration Files

There are several configuration files that must be set up for Nagios.

Extract the Nagios Configuration Files

From the HDP companion files, open the configuration_files folder and copy the files in the nagios folder to a temporary directory. The nagios folder contains two sub-folders, objects and plugins.

Create the Nagios Directories

  1. Create the following Nagios directories:

    mkdir -p /var/nagios /var/nagios/rw /var/log/nagios /var/log/nagios/spool/checkresults /var/run/nagios

  2. Change ownership on those directories to the Nagios user:

    chown -R nagios:nagios /var/nagios /var/nagios/rw /var/log/nagios /var/log/nagios/spool/checkresults /var/run/nagios

Copy the Configuration Files

  1. Copy the contents of the objects folder into place:

    cp <tmp-directory>/nagios/objects/*.* /etc/nagios/objects/

  2. Copy the contents of the plugins folder into place:

    cp <tmp-directory>/nagios/plugins/*.* /usr/lib64/nagios/plugins/

Set the Nagios Admin Password

  1. Choose a Nagios administrator password, for example, “admin”.

  2. Set the password. Use the following command:

    htpasswd -c -b /etc/nagios/htpasswd.users nagiosadmin admin

Set the Nagios Admin Email Contact Address

  1. Open /etc/nagios/objects/contacts.cfg with a text editor.

  2. Change the nagios@localhost value to the administrator's email address so that alerts are delivered to it.

Register the Hadoop Configuration Files

  1. Open /etc/nagios/nagios.cfg with a text editor.

  2. In the section OBJECT CONFIGURATION FILE(S), add the following:

    # Definitions for hadoop servers 
    cfg_file=/etc/nagios/objects/hadoop-commands.cfg 
    cfg_file=/etc/nagios/objects/hadoop-hosts.cfg 
    cfg_file=/etc/nagios/objects/hadoop-hostgroups.cfg 
    cfg_file=/etc/nagios/objects/hadoop-services.cfg 
    cfg_file=/etc/nagios/objects/hadoop-servicegroups.cfg

  3. Change the command_file directive to /var/nagios/rw/nagios.cmd:

    command_file=/var/nagios/rw/nagios.cmd
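Before validating the configuration, it can help to confirm that every file registered through a cfg_file directive actually exists on disk. The following is a minimal sketch, not part of the HDP companion files; the function name list_missing_cfg_files is hypothetical.

```shell
#!/bin/sh
# list_missing_cfg_files NAGIOS_CFG
# Prints each path named by an uncommented cfg_file= directive
# that does not exist as a regular file. Prints nothing if all exist.
list_missing_cfg_files() {
    cfg="$1"
    # Extract the value of each cfg_file= line and drop trailing whitespace.
    sed -n 's/^[[:space:]]*cfg_file=\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$cfg" |
    while IFS= read -r path; do
        [ -f "$path" ] || printf '%s\n' "$path"
    done
}

# Example (path taken from the step above):
#   list_missing_cfg_files /etc/nagios/nagios.cfg
```

An empty result suggests that all registered object files were copied into place.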

Set Hosts

  1. Open /etc/nagios/objects/hadoop-hosts.cfg with a text editor.

  2. Create a "define host { … }" entry for each host in your cluster using the following format:

    define host {
     alias @HOST@ 
     host_name @HOST@ 
     use linux-server 
     address @HOST@ 
     check_interval 0.25
     retry_interval 0.25
     max_check_attempts 4
     notifications_enabled 1
     first_notification_delay 0 ; Send notification soon after change in the hard state
     notification_interval 0 ; Send the notification once
     notification_options d,u,r
    }

  3. Replace "@HOST@" with the hostname.
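For clusters with many hosts, the entries can be generated rather than typed by hand. The sketch below assumes a plain-text file with one hostname per line; the function names render_host and render_hosts and the input file are hypothetical, and the directive values are copied from the format shown above.

```shell
#!/bin/sh
# render_host HOSTNAME
# Prints one host definition, substituting HOSTNAME for @HOST@.
render_host() {
    cat <<EOF
define host {
 alias $1
 host_name $1
 use linux-server
 address $1
 check_interval 0.25
 retry_interval 0.25
 max_check_attempts 4
 notifications_enabled 1
 first_notification_delay 0
 notification_interval 0
 notification_options d,u,r
}
EOF
}

# render_hosts FILE
# Reads one hostname per line, skipping blank lines and lines starting with '#'.
render_hosts() {
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        render_host "$host"
        echo
    done < "$1"
}

# Example (cluster-hosts.txt is a hypothetical file name):
#   render_hosts cluster-hosts.txt >> /etc/nagios/objects/hadoop-hosts.cfg
```

Review the generated output before appending it to hadoop-hosts.cfg.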

Set Host Groups

  1. Open /etc/nagios/objects/hadoop-hostgroups.cfg with a text editor.

  2. Create host groups based on all the hosts and services you have installed in your cluster. Each host group entry should follow this format:

    define hostgroup {
     hostgroup_name @NAME@
     alias @ALIAS@
     members @MEMBERS@
    }

    The parameters (such as @NAME@) are defined in the following table.

    Table 25.1. Host Group Parameters

    Parameter   Description
    @NAME@      The host group name
    @ALIAS@     The host group alias
    @MEMBERS@   A comma-separated list of hosts in the group


    The following table lists the core and monitoring host groups:

    Table 25.2. Core and Monitoring Host Groups

    Service                     Component          Name            Alias           Members
    All servers in the cluster  n/a                all-servers     All Servers     List all servers in the cluster
    HDFS                        NameNode           namenode        namenode        The NameNode host
    HDFS                        SecondaryNameNode  snamenode       snamenode       The Secondary NameNode host
    MapReduce                   JobTracker         jobtracker      jobtracker      The Job Tracker host
    HDFS, MapReduce             Slaves             slaves          slaves          List all hosts running DataNode and TaskTrackers
    Nagios                      n/a                nagios-server   nagios-server   The Nagios server host
    Ganglia                     n/a                ganglia-server  ganglia-server  The Ganglia server host


    The following table lists the ecosystem project host groups:

    Table 25.3. Ecosystem Project Host Groups

    Service    Component  Name               Alias              Members
    HBase      Master     hbasemaster        hbasemaster        List the master server
    HBase      Region     region-servers     region-servers     List all region servers
    ZooKeeper  n/a        zookeeper-servers  zookeeper-servers  List all ZooKeeper servers
    Oozie      n/a        oozie-server       oozie-server       The Oozie server
    Hive       n/a        hiveserver         hiveserver         The Hive metastore server
    WebHCat    n/a        webhcat-server     webhcat-server     The WebHCat server
    Templeton  n/a        templeton-server   templeton-server   The Templeton server
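As an illustration of how a row from the tables above maps to a host group entry, the following sketch prints a filled-in definition. The function name render_hostgroup and the example hostnames are hypothetical; the name and alias values come from the tables.

```shell
#!/bin/sh
# render_hostgroup NAME ALIAS MEMBER [MEMBER...]
# Prints one hostgroup definition with the members joined by commas.
render_hostgroup() {
    name="$1"
    alias_text="$2"
    shift 2
    # Inside the subshell, "$*" joins the remaining arguments
    # using the first character of IFS (a comma).
    members=$(IFS=,; printf '%s' "$*")
    printf 'define hostgroup {\n hostgroup_name %s\n alias %s\n members %s\n}\n' \
        "$name" "$alias_text" "$members"
}

# Example with hypothetical hostnames:
#   render_hostgroup slaves slaves dn1.example.com dn2.example.com
#   render_hostgroup all-servers "All Servers" nn1.example.com dn1.example.com dn2.example.com
```

Only host groups for services actually installed in the cluster need an entry.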


Set Services

  1. Open /etc/nagios/objects/hadoop-services.cfg with a text editor. This file contains service definitions for the following services: Ganglia, HBase (Master and Region), ZooKeeper, Hive, Templeton, and Oozie.

  2. Remove any service definitions for services you have not installed.

  3. Replace the parameters @NAGIOS_BIN@ and @STATUS_DAT@ based on the operating system.

    • For RHEL and CentOS:

      @STATUS_DAT@ = /var/nagios/status.dat

      @NAGIOS_BIN@ = /usr/bin/nagios

    • For SLES:

      @STATUS_DAT@ = /var/lib/nagios/status.dat

      @NAGIOS_BIN@ = /usr/sbin/nagios

  4. If you have installed Hive or Oozie services, replace the parameter @JAVA_HOME@ with the path to the Java home, for example, /usr/java/default.

Set Status

  1. Open /etc/nagios/objects/hadoop-commands.cfg with a text editor.

  2. Replace the @STATUS_DAT@ parameter with the location of the Nagios status file. File location depends on your operating system.

    • For RHEL and CentOS:

      /var/nagios/status.dat

    • For SLES:

      /var/lib/nagios/status.dat

Add Templeton Status and Check TCP Wrapper Commands

  1. Open /etc/nagios/objects/hadoop-commands.cfg with a text editor.

  2. Add the following commands:

    define command{
     command_name check_templeton_status
     command_line $USER1$/check_wrapper.sh $USER1$/check_templeton_status.sh $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$
     }
    
    define command{
     command_name check_tcp_wrapper
     command_line $USER1$/check_wrapper.sh $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
     }

Validate the Installation

Follow these steps to validate your installation.

  1. Validate the Nagios installation:

    nagios -v /etc/nagios/nagios.cfg

  2. Start the Nagios server and httpd:

    /etc/init.d/nagios start
    /etc/init.d/httpd start

  3. Confirm that the Nagios server is running:

    /etc/init.d/nagios status

    This should return:

    nagios (pid #) is running...

  4. To test Nagios Services, run the following command:

    /usr/lib64/nagios/plugins/check_hdfs_capacity.php -h namenode_hostname -p 50070 -w 80% -c 90%

    This should return:

    OK: DFSUsedGB:<some#>, DFSTotalGB:<some#>

  5. To test Nagios Access, browse to the Nagios server.

    http://<nagios.server>/nagios

    Log in using the Nagios admin username (nagiosadmin) and password (see Set the Nagios Admin Password). Click on hosts to check that all hosts in the cluster are listed. Click on services to check that all of the Hadoop services are listed for each host.

  6. Test Nagios alerts.

    • Log in to one of your cluster DataNodes.

    • Stop the TaskTracker service:

      su -l mapred -c "/usr/hdp/current/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop tasktracker"

    • Validate that you received an alert at the admin email address, and that you have critical state showing on the console.

    • Start the TaskTracker service.

      su -l mapred -c "/usr/hdp/current/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start tasktracker"

    • Validate that you received an alert at the admin email address, and that critical state is cleared on the console.
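When running a plugin by hand, as in step 4, the exit status is as informative as the printed text. Standard Nagios plugins report state through their exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The sketch below assumes the HDP plugins follow that convention; the function names plugin_state and run_check are hypothetical, and codes outside 0 to 2 are reported here as UNKNOWN.

```shell
#!/bin/sh
# plugin_state CODE
# Maps a plugin exit code to the conventional Nagios state name.
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# run_check COMMAND [ARGS...]
# Runs a plugin, prints "<STATE>: <plugin output>", and returns its exit code.
run_check() {
    output=$("$@") && status=0 || status=$?
    printf '%s: %s\n' "$(plugin_state "$status")" "$output"
    return "$status"
}

# Example, using the command from step 4:
#   run_check /usr/lib64/nagios/plugins/check_hdfs_capacity.php \
#       -h namenode_hostname -p 50070 -w 80% -c 90%
```

The state name printed by run_check is derived only from the exit code, so it can be compared against the state prefix in the plugin's own output.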