3.2.7. Gathering General Information

Operating System Information

The following commands report the Linux kernel version, the architecture, and the distribution release. With this information, you can determine whether HDP is running on a supported platform.

Command:

uname -a

Example:

$ uname -a
Linux test63.localdomain 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Command:

cat /proc/version

Example:

$ cat /proc/version
Linux version 2.6.32-279.el6.x86_64 (mockbuild@c6b9.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Fri Jun 22 12:19:21 UTC 2012

Command:

cat /etc/*-release

Example:

$ cat /etc/*-release
CentOS release 6.3 (Final)
CentOS release 6.3 (Final)
CentOS release 6.3 (Final)
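When the same details are collected from several nodes, it can help to reduce them to one line per host. The following is a minimal sketch of that idea, not a tool shipped with HDP. The function names kernel_series and os_summary are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of HDP) that condense the commands above.

# Print the major.minor kernel series from a release string,
# for example 2.6.32-279.el6.x86_64 gives 2.6
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Print a one-line summary: hostname, kernel release, series, architecture, distribution.
os_summary() {
    release=$(uname -r)
    # /etc/*-release may match several files; keep only the first line of output.
    distro=$(cat /etc/*-release 2>/dev/null | head -n 1)
    printf '%s kernel=%s series=%s arch=%s distro=%s\n' \
        "$(uname -n)" "$release" "$(kernel_series "$release")" \
        "$(uname -m)" "${distro:-unknown}"
}

os_summary
```

The one-line format makes it easy to compare nodes side by side and spot a host whose kernel or release differs from the rest of the cluster.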

Determine Installed Software

This information is helpful when troubleshooting performance-related issues or unexpected behavior that occurs on one specific machine, for example a MapReduce job that suddenly runs for 20 minutes rather than the expected 1 minute. The following command lists only packages installed through RPM. It does not list tarball-type installations, so keep in mind that some programs may have been installed outside of the system package manager.

Command:

rpm -qa

Example:

# rpm -qa | egrep "hadoop|yarn"
hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-mapreduce-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-lzo-native-0.5.0-1.x86_64
hadoop-mapreduce-historyserver-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-lzo-0.5.0-1.x86_64
hadoop-yarn-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-libhdfs-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-yarn-resourcemanager-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-client-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-yarn-nodemanager-2.2.0.2.0.6.0-76.el6.x86_64
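Because the package list is most useful when a misbehaving node is compared against a healthy one, the saved output of rpm -qa from two machines can be compared line by line. The sketch below is one way to do that. The function name diff_package_lists and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list packages that differ between two saved "rpm -qa" outputs.
# Lines found only in the first file are prefixed with "<",
# lines found only in the second file with ">".
diff_package_lists() {
    a=$(mktemp) && b=$(mktemp) || return 1
    sort "$1" > "$a"
    sort "$2" > "$b"
    comm -23 "$a" "$b" | sed 's/^/< /'   # only in the first list
    comm -13 "$a" "$b" | sed 's/^/> /'   # only in the second list
    rm -f "$a" "$b"
}

# Example usage (file names are placeholders):
#   rpm -qa > /tmp/packages-node1.txt      # run on the first node
#   rpm -qa > /tmp/packages-node2.txt      # run on the second node
#   diff_package_lists /tmp/packages-node1.txt /tmp/packages-node2.txt
```

An empty result means both nodes report the same RPM packages and versions, although software installed from tarballs would still not appear.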

Detect Running Processes

This information is helpful when troubleshooting performance-related issues or unexpected behavior that occurs on one specific machine.

Command:

ps aux

Example:

$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  19348   620 ?        Ss   Sep25   0:06 /sbin/init
postgres  6705  0.0  0.0 214952  2936 ?        Ss   09:18   0:00 postgres: mapred ambarirca 10.10.3.27(60031) idle 
root         3  0.0  0.0      0     0 ?        S    Sep25   0:00 [migration/0]
root         4  0.0  0.0      0     0 ?        S    Sep25   0:07 [ksoftirqd/0]
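On a busy node the full process list is long, so it can be useful to keep only the processes that consume a large share of CPU or memory. The following sketch filters ps output on the %CPU and %MEM columns. The function name busy_processes and the default threshold are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "ps aux" output on stdin and print the header plus
# every process whose %CPU (column 3) or %MEM (column 4) is at or above a limit.
busy_processes() {
    awk -v limit="${1:-50}" 'NR == 1 || $3 + 0 >= limit + 0 || $4 + 0 >= limit + 0'
}

# Example usage: processes using at least 20% of CPU or memory.
#   ps aux | busy_processes 20
```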

Detect Running Java Processes

The command below lists the Java processes that are running on the machine.  Since most Hadoop code is based on Java, using this command can also help verify that the Hadoop processes are running on a specific machine.

Note

The jps command may not be in the PATH of the logged-in user. If it is not, either add the JDK bin directory to PATH or provide the full path, for example:

/usr/jdk64/jdk1.6.0_31/bin/jps

Command:

jps

Example output:

10528 ResourceManager
25185 Jps
9202 RunJar
10141 Bootstrap
8001 QuorumPeerMain
7357 NameNode
8358 HMaster
12474 HRegionServer
9605 RunJar
1921 NodeManager
8857 JobHistoryServer
5612 DataNode
17667 RunJar
2943 AmbariServer
11103 SecondaryNameNode
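To verify that the Hadoop processes expected on a node are actually running, the jps output can be checked against a list of names. The sketch below assumes the names are given exactly as jps prints them. The function name missing_java_processes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "jps" output on stdin and print each expected
# process name (given as arguments) that does not appear in it.
# Returns 0 when nothing is missing, 1 otherwise.
missing_java_processes() {
    running=$(awk '{ print $2 }')   # second column is the main class name
    status=0
    for name in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
            printf '%s\n' "$name"
            status=1
        fi
    done
    return $status
}

# Example usage on a node expected to run a DataNode and a NodeManager:
#   jps | missing_java_processes DataNode NodeManager
```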

Show Open Files Linked to a Process ID

This information helps determine whether a process holds a specific file open, for example when an error states that a file is locked and another process cannot start because it cannot write to that file.

Command:

lsof -p <pid> | grep <file string name>

Example:

$ lsof -p 8857 | grep var
java    8857 mapred    1w   REG              253,0     2031 542470 /var/log/hadoop-mapreduce/mapred/mapred-mapred-historyserver-sandbox.out
java    8857 mapred    2w   REG              253,0     2031 542470 /var/log/hadoop-mapreduce/mapred/mapred-mapred-historyserver-sandbox.out
java    8857 mapred  159w   REG              253,0    95452 542286 /var/log/hadoop-mapreduce/mapred/mapred-mapred-historyserver-sandbox.log

Verifying Well-formed XML

The following command can help determine whether a configuration file in XML format is well-formed. If the XML file is well-formed, xmllint prints its contents to standard output. If there are problems with the file, a list of errors is displayed. This command can help uncover syntax errors introduced by manually editing Hadoop configuration files. The example below shows the list of errors returned when an XML file is not well-formed.

Command:

xmllint <xml file>

Example:

$ xmllint ./hdfs-site.xml
./hdfs-site.xml:187: parser error : Opening and ending tag mismatch: property line 6 and configuration
 </configuration>
                 ^
./hdfs-site.xml:188: parser error : Premature end of data in tag property line 3

^
./hdfs-site.xml:188: parser error : Premature end of data in tag configuration line 2

^
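When several configuration files have been edited, each one can be checked in a loop. The sketch below uses the xmllint --noout option, which suppresses the copy of the parsed document so that only errors are shown. The function name check_xml_files is hypothetical, and the directory in the usage comment is an assumption that may differ on your installation.

```shell
#!/bin/sh
# Hypothetical helper: check every *.xml file in a directory with xmllint.
# Returns 0 if all files are well-formed, 1 if any file fails,
# and 2 if xmllint is not installed.
check_xml_files() {
    dir=$1
    command -v xmllint >/dev/null 2>&1 || { echo "xmllint not found" >&2; return 2; }
    status=0
    for f in "$dir"/*.xml; do
        [ -e "$f" ] || continue          # directory contains no XML files
        if xmllint --noout "$f"; then
            echo "OK      $f"
        else
            echo "FAILED  $f"
            status=1
        fi
    done
    return $status
}

# Example usage (the configuration directory is an assumption; adjust as needed):
#   check_xml_files /etc/hadoop/conf
```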

Detect Auto-start Processes

Ideally, the cluster administrator should have this information. Information about auto-start processes can help in determining why a certain behavior is specific to a machine. For instance, a process that auto-starts on boot-up might prevent one of the HDP components from launching because of a port conflict after a node is rebooted. The command below lists the cron jobs of every user on the machine. Reading other users' crontabs typically requires root privileges.

Command:

for user in $(cut -f1 -d: /etc/passwd); do crontab -u $user -l; done

Example:

$ for user in $(cut -f1 -d: /etc/passwd); do crontab -u $user -l; done
no crontab for root
no crontab for bin
no crontab for daemon
no crontab for adm
no crontab for lp
no crontab for sync
no crontab for shutdown
no crontab for halt
no crontab for mail
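Cron is only one way a process can start automatically. On CentOS or RHEL 6 systems that use SysV init scripts, services enabled at boot can be listed with chkconfig --list. The sketch below filters that output for one runlevel. It assumes the usual output layout shown in the comment, and the function name services_on_at_boot is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "chkconfig --list" output on stdin and print the
# services that are switched on for the given runlevel (default 3).
# Assumes the usual layout: <service> 0:off 1:off 2:on 3:on 4:on 5:on 6:off
services_on_at_boot() {
    awk -v want="${1:-3}:on" '{
        for (i = 2; i <= NF; i++)
            if ($i == want) { print $1; break }
    }'
}

# Example usage:
#   chkconfig --list | services_on_at_boot 3
```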

Get a List of All Mounts on a Machine

Gathering this information helps determine which drives are actually mounted or available for use on the node. The following commands help determine whether the system is hitting any storage limitations. Unexpected behavior can occur when disks are full.

Command:

df

Example:

$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root
                     11272464   4729432   5970412  45% /
tmpfs                   961928       272    961656   1% /dev/shm
/dev/sda1               495844     37433    432811   8% /boot
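To spot nearly full disks quickly, the df output can be filtered by its usage column. The sketch below expects the output of df -P, the POSIX format, which keeps each filesystem on a single line even when the device name is long. The function name full_filesystems and the default threshold of 90 percent are assumptions, and mount points that contain spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: read "df -P" output on stdin and print the mount point
# and usage of every filesystem at or above a percentage threshold (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # strip the percent sign before comparing
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Example usage: filesystems that are at least 80% full.
#   df -P | full_filesystems 80
```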

Command:

cat /etc/fstab

Example:

root@a2nn:~> cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Wed Mar 20 15:03:22 2013
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/vg_a2nn-lv_root /                       ext4    defaults        1 1
UUID=8bbdbae7-9cb8-4b66-af1c-4f904f047501 /boot                   ext4    defaults        1 2
/dev/mapper/vg_a2nn-lv_swap swap                    swap    defaults        0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
root@a2nn:~> 

Operating System Log Files

These files can help in determining if a machine was rebooted or shut down at a particular time. The log files can help determine why some HDP services were not working or not operational at a specific time.

  • /var/log/messages

    Contains global system messages, including messages that are logged during system start-up.

  • /var/log/audit/audit.log

    This file is only available if the auditd daemon has been started. To check its status, run /etc/init.d/auditd status. This file can be used to check which user executed a command at a particular time.

Hardware Information

The following commands provide information about the hardware components installed on the machine. This will help in isolating issues related to hardware.

The following command lists information about the PCI buses and devices in the system:

lspci

Example:

$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:02.0 VGA compatible controller: InnoTek Systemberatung GmbH VirtualBox Graphics Adapter
00:03.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE] (rev 10)
00:04.0 System peripheral: InnoTek Systemberatung GmbH VirtualBox Guest Service
00:05.0 Multimedia audio controller: Intel Corporation 82801AA AC'97 Audio Controller (rev 01)
00:06.0 USB controller: Apple Computer Inc. KeyLargo/Intrepid USB
00:07.0 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:08.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE] (rev 10)
00:0d.0 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 02)

The following command returns CPU information for the machine:

cat /proc/cpuinfo

Example:

$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
stepping	: 9
cpu MHz	: 2283.256
cache size	: 6144 KB

Network Information

This set of commands can be helpful when troubleshooting network issues. 

The following command displays the IP address assigned to each network interface and shows whether the interface is up. When dealing with issues involving communication between nodes, this is a good place to start.

ifconfig

Example:

$ ifconfig
eth0      Link encap:Ethernet  HWaddr 08:00:27:76:CD:33  
          inet addr:10.10.3.27  Bcast:10.10.3.255  Mask:255.255.254.0
          inet6 addr: fe80::a00:27ff:fe76:cd33/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1217765 errors:0 dropped:0 overruns:0 frame:0
          TX packets:336245 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:308949876 (294.6 MiB)  TX bytes:128725650 (122.7 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1609854 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1609854 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:619945138 (591.2 MiB)  TX bytes:619945138 (591.2 MiB)

virbr0    Link encap:Ethernet  HWaddr 52:54:00:EB:5E:B7  
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:175 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:10172 (9.9 KiB)

The following command returns a list of the ports used within the system. This is helpful in determining whether a specific port is already in use, which would prevent another application from binding to and listening on that port. This command can be useful when trying to isolate why certain HDP Master processes are not able to start up.

netstat -an

Example:

$ netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State      
tcp        0      0 10.10.10.157:51111          0.0.0.0:*                   LISTEN      
tcp        0      0 127.0.0.1:199               0.0.0.0:*                   LISTEN      
tcp        0      0 10.10.10.157:50090          0.0.0.0:*                   LISTEN      
tcp        0      0 0.0.0.0:8010                0.0.0.0:*                   LISTEN      
tcp        0      0 0.0.0.0:3306                0.0.0.0:*                   LISTEN      
tcp        0      0 0.0.0.0:8651                0.0.0.0:*                   LISTEN
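To check a single port rather than read the whole list, the netstat output can be filtered on the local address and state columns. The following is a minimal sketch. The function name port_listeners is hypothetical, and the port in the usage comment is taken from the example output above.

```shell
#!/bin/sh
# Hypothetical helper: read "netstat -an" output on stdin and print the lines
# for sockets listening on the given port. Returns 1 if nothing is listening.
port_listeners() {
    awk -v port="$1" '
        # The local address is column 4; anchor the match so 80 does not match 8010.
        $NF == "LISTEN" && $4 ~ (":" port "$") { print; found = 1 }
        END { exit !found }'
}

# Example usage: check whether anything already listens on port 50090.
#   netstat -an | port_listeners 50090
```

A non-zero exit status means the port is free on this node, so a port conflict is unlikely to be the reason a process fails to start.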

You can use the following command to start, stop, and check the status of the firewall service on a CentOS or RHEL operating system. The firewall service status represents another possible reason why communication cannot be established between nodes. 

service iptables [stop | status | start]

Example:

$ service iptables status
iptables: Firewall is not running.