Chapter 25. Installing Nagios (Deprecated)

This section describes installing and testing Nagios, a system that monitors Hadoop cluster components and issues alerts on warning and critical conditions.

1. Install the Nagios RPMs

On the host you have chosen for the Nagios server, install the RPMs:

For RHEL and CentOS:

yum -y install net-snmp net-snmp-utils php-pecl-json
yum -y install wget httpd php net-snmp-perl perl-Net-SNMP fping nagios nagios-plugins nagios-www

For SLES:

zypper -n --no-gpg-checks install net-snmp
zypper -n --no-gpg-checks install wget apache2 php php-curl perl-SNMP perl-Net-SNMP fping nagios nagios-plugins nagios-www

2. Install the Configuration Files

There are several configuration files that must be set up for Nagios.

3. Extract the Nagios Configuration Files

From the HDP companion files, open the configuration_files folder and copy the files in the nagios folder to a temporary directory. The nagios folder contains two sub-folders, objects and plugins.

4. Create the Nagios Directories

Create the following Nagios directories:
mkdir -p /var/nagios /var/nagios/rw /var/log/nagios /var/log/nagios/spool/checkresults /var/run/nagios
Change ownership on those directories to the Nagios user:
chown -R nagios:nagios /var/nagios /var/nagios/rw /var/log/nagios /var/log/nagios/spool/checkresults /var/run/nagios

5. Copy the Configuration Files

Copy the contents of the objects folder into place:
cp <tmp-directory>/nagios/objects/*.* /etc/nagios/objects/
Copy the contents of the plugins folder into place:
cp <tmp-directory>/nagios/plugins/*.* /usr/lib64/nagios/plugins/

6. Set the Nagios Admin Password

Choose a Nagios administrator password, for example, “admin”.
Set the password with the following command:
htpasswd -c -b /etc/nagios/htpasswd.users nagiosadmin admin

7. Set the Nagios Admin Email Contact Address

Open /etc/nagios/objects/contacts.cfg with a text editor.
Change the nagios@localhost value to the admin email address so it can receive alerts.
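
The edit in this step can also be scripted. The following is a minimal sketch, not part of the HDP companion files: the function name set_nagios_contact_email is hypothetical, and it assumes the new address contains no "|" or "&" characters, which would confuse the sed substitution. It keeps a backup copy with a .bak suffix.

```shell
#!/bin/sh
# set_nagios_contact_email <contacts.cfg> <email>
# Hypothetical helper: replaces every occurrence of nagios@localhost
# in the given file and keeps the original as <contacts.cfg>.bak.
set_nagios_contact_email() {
    contacts_file=$1
    email=$2
    # Back up first so the change can be reverted.
    cp "$contacts_file" "$contacts_file.bak" || return 1
    # '|' is used as the sed delimiter; the address must not contain '|' or '&'.
    sed "s|nagios@localhost|$email|g" "$contacts_file.bak" > "$contacts_file"
}
```

For example, set_nagios_contact_email /etc/nagios/objects/contacts.cfg ops@example.com, where ops@example.com stands in for your admin address.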

8. Register the Hadoop Configuration Files

Open /etc/nagios/nagios.cfg with a text editor.

In the section OBJECT CONFIGURATION FILE(S), add the following:

# Definitions for hadoop servers 
cfg_file=/etc/nagios/objects/hadoop-commands.cfg 
cfg_file=/etc/nagios/objects/hadoop-hosts.cfg 
cfg_file=/etc/nagios/objects/hadoop-hostgroups.cfg 
cfg_file=/etc/nagios/objects/hadoop-services.cfg 
cfg_file=/etc/nagios/objects/hadoop-servicegroups.cfg

Change the command_file directive to /var/nagios/rw/nagios.cmd:
command_file=/var/nagios/rw/nagios.cmd
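
To confirm that the edits in this step took effect, a small check along the following lines may help. It is only a sketch: check_hadoop_cfg_registered is a hypothetical name, and the function looks only for the five cfg_file lines and the command_file directive listed above, tolerating trailing whitespace.

```shell
#!/bin/sh
# check_hadoop_cfg_registered <nagios.cfg>
# Hypothetical helper: returns 0 only when all five hadoop-*.cfg object
# files and the command_file directive from this step are present.
check_hadoop_cfg_registered() {
    main_cfg=$1
    missing=0
    for name in commands hosts hostgroups services servicegroups; do
        # Allow trailing whitespace after the path.
        if ! grep -q "^cfg_file=/etc/nagios/objects/hadoop-$name\.cfg[[:space:]]*\$" "$main_cfg"; then
            echo "missing: cfg_file=/etc/nagios/objects/hadoop-$name.cfg" >&2
            missing=1
        fi
    done
    if ! grep -q '^command_file=/var/nagios/rw/nagios\.cmd[[:space:]]*$' "$main_cfg"; then
        echo "missing: command_file=/var/nagios/rw/nagios.cmd" >&2
        missing=1
    fi
    return "$missing"
}
```

Run it as check_hadoop_cfg_registered /etc/nagios/nagios.cfg; any missing directive is reported on standard error.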

9. Set Hosts

Open /etc/nagios/objects/hadoop-hosts.cfg with a text editor.

Create a "define host { … }" entry for each host in your cluster using the following format:

define host {
 alias @HOST@ 
 host_name @HOST@ 
 use linux-server 
 address @HOST@ 
 check_interval 0.25
 retry_interval 0.25
 max_check_attempts 4
 notifications_enabled 1
 first_notification_delay 0 ; Send notification soon after a change in the hard state
 notification_interval 0 ; Send the notification once
 notification_options d,u,r
}

Replace "@HOST@" with the hostname.
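
For clusters with many hosts, the entries can be generated rather than typed. The sketch below assumes a hypothetical input file with one hostname per line (blank lines and lines starting with # are skipped). The function names are illustrative only, and the generated entries omit the explanatory comments from the template.

```shell
#!/bin/sh
# print_host_entry <hostname>: emit one "define host" entry
# with @HOST@ replaced by the given hostname.
print_host_entry() {
    host=$1
    cat <<EOF
define host {
 alias $host
 host_name $host
 use linux-server
 address $host
 check_interval 0.25
 retry_interval 0.25
 max_check_attempts 4
 notifications_enabled 1
 first_notification_delay 0
 notification_interval 0
 notification_options d,u,r
}

EOF
}

# print_host_entries <hosts-file>: one entry per hostname in the file.
print_host_entries() {
    while IFS= read -r host || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        print_host_entry "$host"
    done < "$1"
}
```

For example, print_host_entries hosts.txt >> /etc/nagios/objects/hadoop-hosts.cfg, where hosts.txt is the hypothetical hostname list.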

10. Set Host Groups

Open /etc/nagios/objects/hadoop-hostgroups.cfg with a text editor.

Create host groups based on all the hosts and services you have installed in your cluster. Each host group entry should follow this format:

define hostgroup {
 hostgroup_name @NAME@
 alias @ALIAS@
 members @MEMBERS@
}

The parameters (such as @NAME@) are defined in the following table.

Table 25.1. Host Group Parameters

Parameter	Description
@NAME@	The host group name
@ALIAS@	The host group alias
@MEMBERS@	A comma-separated list of hosts in the group

The following table lists the core and monitoring host groups:

Table 25.2. Core and Monitoring Host Groups

Service	Component	Name	Alias	Members
All servers in the cluster	n/a	all-servers	All Servers	List all servers in the cluster
HDFS	NameNode	namenode	namenode	The NameNode host
HDFS	SecondaryNameNode	snamenode	snamenode	The Secondary NameNode host
MapReduce	JobTracker	jobtracker	jobtracker	The JobTracker host
HDFS, MapReduce	Slaves	slaves	slaves	List all hosts running DataNodes and TaskTrackers
Nagios	n/a	nagios-server	nagios-server	The Nagios server host
Ganglia	n/a	ganglia-server	ganglia-server	The Ganglia server host

The following table lists the ecosystem project host groups:

Table 25.3. Ecosystem Project Host Groups

Service	Component	Name	Alias	Members
HBase	Master	hbasemaster	hbasemaster	List the master server
HBase	Region	region-servers	region-servers	List all region servers
ZooKeeper	n/a	zookeeper-servers	zookeeper-servers	List all ZooKeeper servers
Oozie	n/a	oozie-server	oozie-server	The Oozie server
Hive	n/a	hiveserver	hiveserver	The Hive metastore server
WebHCat	n/a	webhcat-server	webhcat-server	The WebHCat server
Templeton	n/a	templeton-server	templeton-server	The Templeton server
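
A host group entry in the format above can be produced with a small helper such as the following sketch. The name print_hostgroup is hypothetical; the function takes the group name, the alias, and then one argument per member host, and joins the members with commas.

```shell
#!/bin/sh
# print_hostgroup <name> <alias> <member> [<member> ...]
# Hypothetical helper: emits one "define hostgroup" entry.
print_hostgroup() {
    # Require a name, an alias, and at least one member.
    [ "$#" -ge 3 ] || return 1
    name=$1
    alias_name=$2
    shift 2
    # "$*" joins the remaining arguments with the first character of IFS.
    members=$(IFS=,; printf '%s' "$*")
    cat <<EOF
define hostgroup {
 hostgroup_name $name
 alias $alias_name
 members $members
}
EOF
}
```

For example, print_hostgroup slaves slaves dn1.example.com dn2.example.com prints an entry for the slaves group, where the two hostnames are placeholders for your DataNode and TaskTracker hosts.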

11. Set Services

Open /etc/nagios/objects/hadoop-services.cfg with a text editor. This file contains service definitions for the following services: Ganglia, HBase (Master and Region), ZooKeeper, Hive, Templeton, and Oozie.
Remove any service definitions for services you have not installed.
Replace the parameters @NAGIOS_BIN@ and @STATUS_DAT@ based on the operating system.
- For RHEL and CentOS:
  @STATUS_DAT@ = /var/nagios/status.dat
  @NAGIOS_BIN@ = /usr/bin/nagios
- For SLES:
  @STATUS_DAT@ = /var/lib/nagios/status.dat
  @NAGIOS_BIN@ = /usr/sbin/nagios
If you have installed Hive or Oozie services, replace the parameter @JAVA_HOME@ with the path to the Java home. For example, /usr/java/default.
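
The parameter replacement in this step can be sketched as a sed substitution. The function name fill_nagios_placeholders and its rhel/sles argument are assumptions made for illustration; the paths are the ones listed above. If no Java home is passed, @JAVA_HOME@ is left unchanged. The original file is kept with a .bak suffix.

```shell
#!/bin/sh
# fill_nagios_placeholders <file> <rhel|sles> [java_home]
# Hypothetical helper: substitutes @STATUS_DAT@, @NAGIOS_BIN@ and,
# when a third argument is given, @JAVA_HOME@.
fill_nagios_placeholders() {
    target=$1
    os_family=$2
    # With no third argument, @JAVA_HOME@ is replaced by itself (a no-op).
    java_home=${3:-@JAVA_HOME@}
    case $os_family in
        rhel) status_dat=/var/nagios/status.dat
              nagios_bin=/usr/bin/nagios ;;
        sles) status_dat=/var/lib/nagios/status.dat
              nagios_bin=/usr/sbin/nagios ;;
        *)    echo "unknown OS family: $os_family" >&2
              return 1 ;;
    esac
    cp "$target" "$target.bak" || return 1
    sed -e "s|@STATUS_DAT@|$status_dat|g" \
        -e "s|@NAGIOS_BIN@|$nagios_bin|g" \
        -e "s|@JAVA_HOME@|$java_home|g" \
        "$target.bak" > "$target"
}
```

For example, on RHEL or CentOS with Hive installed: fill_nagios_placeholders /etc/nagios/objects/hadoop-services.cfg rhel /usr/java/default.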

12. Set Status

Open /etc/nagios/objects/hadoop-commands.cfg with a text editor.
Replace the @STATUS_DAT@ parameter with the location of the Nagios status file. The file location depends on your operating system.
- For RHEL and CentOS:
  /var/nagios/status.dat
- For SLES:
  /var/lib/nagios/status.dat
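
Once the parameters in the hosts, host groups, services, and commands files have been replaced, a quick search for leftover @NAME@-style tokens can catch anything that was missed. The following is a sketch with a hypothetical function name. It assumes placeholders consist only of uppercase letters and underscores between two @ signs, so an address such as nagios@localhost is not reported.

```shell
#!/bin/sh
# check_no_placeholders <file> [<file> ...]
# Hypothetical helper: prints file:line:text for every leftover
# @NAME@ token. Returns 0 if none are found, 1 if any remain,
# and 2 if grep itself failed (for example, on an unreadable file).
check_no_placeholders() {
    rc=0
    # /dev/null is added so grep always prefixes matches with the file name.
    grep -n '@[A-Z_][A-Z_]*@' "$@" /dev/null || rc=$?
    case $rc in
        0) echo "unreplaced placeholders found" >&2; return 1 ;;
        1) return 0 ;;
        *) return 2 ;;
    esac
}
```

For example, check_no_placeholders /etc/nagios/objects/hadoop-*.cfg.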

13. Add Templeton Status and Check TCP Wrapper Commands

Open /etc/nagios/objects/hadoop-commands.cfg with a text editor.

Add the following commands:

define command{
 command_name check_templeton_status
 command_line $USER1$/check_wrapper.sh $USER1$/check_templeton_status.sh $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$
 }

define command{
 command_name check_tcp_wrapper
 command_line $USER1$/check_wrapper.sh $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
 }

14. Validate the Installation

Follow these steps to validate your installation.

Validate the Nagios installation:
nagios -v /etc/nagios/nagios.cfg
Start the Nagios server and httpd:
/etc/init.d/nagios start
/etc/init.d/httpd start
Confirm that the Nagios server is running:
/etc/init.d/nagios status
This should return:
nagios (pid #) is running...
To test Nagios Services, run the following command:
/usr/lib64/nagios/plugins/check_hdfs_capacity.php -h namenode_hostname -p 50070 -w 80% -c 90%
This should return:
OK: DFSUsedGB:<some#>, DFSTotalGB:<some#>
To test Nagios access, browse to the Nagios server:
http://<nagios.server>/nagios
Log in using the Nagios admin username (nagiosadmin) and password (see Set the Nagios Admin Password). Click on hosts to check that all hosts in the cluster are listed. Click on services to check that all of the Hadoop services are listed for each host.
Test Nagios alerts.
- Log in to one of your cluster DataNodes.
- Stop the TaskTracker service:
  su -l mapred -c "/usr/hdp/current/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop tasktracker"
- Validate that you received an alert at the admin email address, and that you have critical state showing on the console.
- Start the TaskTracker service:
  su -l mapred -c "/usr/hdp/current/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start tasktracker"
- Validate that you received an alert at the admin email address, and that critical state is cleared on the console.
