Use the following commands to collect specific information from a Windows-based cluster. This data helps to isolate specific deployment issues.
Collect OS information: This data helps to determine if HDP is deployed on a supported operating system (OS).
Execute the following command in PowerShell as an Administrator user:
(Get-WmiObject -class Win32_OperatingSystem).Caption
This command provides information about the OS of your host machine. For example:
Microsoft Windows Server 2012 Standard
Execute the following command to determine the OS version of your host machine:
[System.Environment]::OSVersion.Version
Determine installed software: This data can be used to troubleshoot either performance issues or unexpected behavior for a specific node in your cluster. An example of unexpected behavior is a MapReduce job that runs for a longer duration than expected.
To see the list of installed software on a particular host machine, go to Control Panel -> All Control Panel Items -> Programs and Features.
Detect running processes: This data can be used to troubleshoot either performance issues or unexpected behavior for a specific node in your cluster.
You can either press
+ + on the affected host machine, or you can execute the following command in PowerShell as an Administrator user:
tasklist
Detect running Java processes: Use this command to verify the Hadoop processes running on a specific machine.
As $HADOOP_USER, execute the following commands on the affected host machine:
su $HADOOP_USER
jps
You should see the following output:
988 Jps
2816 -- process information unavailable
2648 -- process information unavailable
1768 -- process information unavailable
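When many processes are listed, the jps output can be split into named and unnamed PIDs with a short script. The following is a minimal Python sketch, assuming Python is available on the host, which this guide does not require. The helper names parse_jps and unnamed_pids are hypothetical, and the sample text is the output shown above.

```python
def parse_jps(output):
    """Map each PID in jps output to its process name.

    A PID maps to None when jps prints no usable name, for example
    "-- process information unavailable".
    """
    processes = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if not parts or not parts[0].isdigit():
            continue  # skip blank lines and anything that is not "<pid> <name>"
        name = parts[1].strip() if len(parts) > 1 else ""
        if not name or name.startswith("--"):
            name = None
        processes[int(parts[0])] = name
    return processes


def unnamed_pids(processes):
    """Return the sorted PIDs that still need to be mapped to a component by hand."""
    return sorted(pid for pid, name in processes.items() if name is None)


SAMPLE_JPS = """\
988 Jps
2816 -- process information unavailable
2648 -- process information unavailable
1768 -- process information unavailable
"""

if __name__ == "__main__":
    print(unnamed_pids(parse_jps(SAMPLE_JPS)))
```

The PIDs returned by unnamed_pids are the ones that still have to be matched by hand.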
Note that no actual name is given to any process. Ensure that you map the process IDs (pid) from the output of this command to the .wrapper file within the C:\hdp\hadoop-1.1.0-SNAPSHOT\bin directory.
Note: Ensure that you provide the complete path to the Java executable if the Java bin directory's location is not set within your PATH.
Detect Java heap allocation and usage: Use the following command to list Java heap information for a specific Java process. This data can be used to verify the heap settings and to analyze whether a particular Java process is reaching the threshold.
Execute the following command on the affected host machine:
jmap -heap $pid_of_Hadoop_process
For example, you should see output similar to the following:
C:\hdp\hadoop-1.1.0-SNAPSHOT>jmap -heap 2816
Attaching to process ID 2816, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.6-b01
using thread-local object allocation.
Mark Sweep Compact GC
Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 4294967296 (4096.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)
Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 10158080 (9.6875MB)
   used     = 4490248 (4.282234191894531MB)
   free     = 5667832 (5.405265808105469MB)
   44.203707787298384% used
Eden Space:
   capacity = 9043968 (8.625MB)
   used     = 4486304 (4.278472900390625MB)
   free     = 4557664 (4.346527099609375MB)
   49.60548290307971% used
From Space:
   capacity = 1114112 (1.0625MB)
   used     = 3944 (0.00376129150390625MB)
   free     = 1110168 (1.0587387084960938MB)
   0.35400390625% used
To Space:
   capacity = 1114112 (1.0625MB)
   used     = 0 (0.0MB)
   free     = 1114112 (1.0625MB)
   0.0% used
tenured generation:
   capacity = 55971840 (53.37890625MB)
   used     = 36822760 (35.116920471191406MB)
   free     = 19149080 (18.261985778808594MB)
   65.7880105424442% used
Perm Generation:
   capacity = 21757952 (20.75MB)
   used     = 20909696 (19.9410400390625MB)
   free     = 848256 (0.8089599609375MB)
   96.10139777861446% used
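To spot heap regions that are close to full without reading every block, the "Heap Usage" part of the jmap output can be summarized by a script. The following is a minimal Python sketch, assuming Python is available on the host and that the output keeps the one-value-per-line layout that jmap prints. The helper names and the 90% threshold are illustrative assumptions, not values from this guide.

```python
import re

# A region header line ("Eden Space:") followed by its capacity and used lines.
_REGION = re.compile(
    r"^\s*(?P<name>[A-Za-z][^:\n]*):\s*\n"
    r"\s*capacity\s*=\s*(?P<capacity>\d+)[^\n]*\n"
    r"\s*used\s*=\s*(?P<used>\d+)",
    re.MULTILINE,
)


def heap_usage(jmap_output):
    """Return {region name: percent used} computed from capacity and used bytes."""
    usage = {}
    for match in _REGION.finditer(jmap_output):
        capacity = int(match.group("capacity"))
        used = int(match.group("used"))
        usage[match.group("name").strip()] = (
            100.0 * used / capacity if capacity else 0.0
        )
    return usage


def over_threshold(usage, threshold=90.0):
    """Return the sorted names of regions whose usage is at or above threshold."""
    return sorted(name for name, pct in usage.items() if pct >= threshold)


SAMPLE_JMAP = """\
Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 10158080 (9.6875MB)
   used     = 4490248 (4.282234191894531MB)
   free     = 5667832 (5.405265808105469MB)
   44.203707787298384% used
Perm Generation:
   capacity = 21757952 (20.75MB)
   used     = 20909696 (19.9410400390625MB)
   free     = 848256 (0.8089599609375MB)
   96.10139777861446% used
"""

if __name__ == "__main__":
    print(over_threshold(heap_usage(SAMPLE_JMAP)))
```

With the sample above, only the Perm Generation region is reported, which matches the 96.1% figure in the jmap output.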
Show open files: Use Process Explorer to determine which processes hold a lock on a specific file. See Windows Sysinternals - Process Explorer for information on using Process Explorer.
For example, you can use Process Explorer to troubleshoot the file lock issues that prevent a particular process from starting as shown in the screenshot below:
Verify well-formed XML: Ensure that the Hadoop configuration files (for example, hdfs-site.xml) are well formed. You can use either Notepad++ or a third-party tool such as Oxygen or XML Spy to validate the configuration files. Use the following instructions:
Open the XML file to be validated in Notepad++ and select -> .
Resolve validation errors, if any.
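Well-formedness can also be checked from a script, which is convenient when several configuration files need checking. The following is a minimal Python sketch using the standard library XML parser, assuming Python is available on the host. The helper name check_well_formed is hypothetical and the sample property is only illustrative. The check covers XML syntax only; it does not verify that property names or values are meaningful to Hadoop.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)  # includes the line and column of the first error
    return None


GOOD = (
    "<configuration><property>"
    "<name>dfs.replication</name><value>3</value>"
    "</property></configuration>"
)
# The closing </property> tag is missing here.
BAD = "<configuration><property><name>dfs.replication</name></configuration>"

if __name__ == "__main__":
    for label, text in (("good", GOOD), ("bad", BAD)):
        print(label, check_well_formed(text) or "well formed")
```

To check a file on disk, read it in binary mode and pass the bytes to check_well_formed, so that the parser honors the encoding declared in the file.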
Detect AutoStart Programs: This information helps to isolate errors for a specific host machine.
For example, a port conflict between an auto-started process and the HDP processes might prevent one of the HDP components from launching.
Ideally, the cluster administrator should have the information on auto-start programs handy. Use the following command to launch the GUI on the affected host machine:
C:\Windows\System32\msconfig.exe
Click
tab. Ensure that no startup items are enabled on the affected host machine.
Collect a list of all mounts on the machine: This information determines the drives that are actually mounted or available on the host machine for use. To troubleshoot disk capacity issues, use this command to determine if the system is violating any storage limitations.
Execute the following command in PowerShell:
Get-Volume
You should see output similar to the following:
DriveLetter FileSystemLabel  FileSystem DriveType HealthStatus SizeRemaining     Size
----------- ---------------  ---------- --------- ------------ -------------     ----
            System Reserved  NTFS       Fixed     Healthy           108.7 MB   350 MB
C                            NTFS       Fixed     Healthy           10.74 GB 19.97 GB
D           HRM_SSS_X64FR... UDF        CD-ROM    Healthy                 0 B  3.44 GB
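For a scripted capacity check, the remaining space on a drive can be expressed as a percentage. The following is a minimal Python sketch, assuming Python is available on the host. It uses the standard library call shutil.disk_usage rather than parsing the Get-Volume table, and the helper name percent_free is hypothetical.

```python
import shutil


def percent_free(path):
    """Return the percentage of free space on the volume that contains path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # empty or pseudo file system; avoid dividing by zero
    return 100.0 * usage.free / usage.total


if __name__ == "__main__":
    # "." is the current drive; on a Windows host a root such as "C:\\" also works.
    print(round(percent_free("."), 1))
```

This guide does not state a minimum free-space value, so any alert threshold applied to the result has to come from your own storage requirements.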
Operating system messages: Use Event Viewer to detect messages from the system or an application.
Event Viewer can determine if a machine was rebooted or shut down at a particular time. Use the logs to isolate issues for HDP services that were non-operational for a specific time.
Go to Control Panel -> All Control Panel Items -> Administrative Tools and click the Event Viewer icon.
Hardware/system information: Use this information to isolate hardware issues on the affected host machine.
Go to Control Panel -> All Control Panel Items -> Administrative Tools and click the System Information icon.
Network information: Use the following commands to troubleshoot network issues.
ipconfig: This command provides the IP address, validates if the network interfaces are available, and also validates if an IP address is bound to the interfaces. To troubleshoot communication issues between the host machines in your cluster, execute the following command on the affected host machine:
ipconfig
You should see output similar to the following:
Windows IP Configuration
Ethernet adapter Ethernet 2:
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::d153:501e:5df0:f0b9%14
   IPv4 Address. . . . . . . . . . . : 192.168.56.103
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.56.100
Ethernet adapter Ethernet:
   Connection-specific DNS Suffix  . : test.tesst.com
   IPv4 Address. . . . . . . . . . . : 10.0.2.15
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 10.0.2.2
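When comparing addresses across several hosts, it can help to reduce the ipconfig output to one IPv4 list per adapter. The following is a minimal Python sketch, assuming Python is available on the host and that the output is in English with the layout ipconfig normally prints. The helper name ipv4_by_adapter is hypothetical.

```python
import re

_ADAPTER = re.compile(r"^(\S.*adapter .*):\s*$")   # unindented "... adapter <name>:" line
_IPV4 = re.compile(r"IPv4 Address[ .]*:\s*([\d.]+)")


def ipv4_by_adapter(ipconfig_output):
    """Return {adapter header: [IPv4 addresses]} from ipconfig output."""
    adapters = {}
    current = None
    for line in ipconfig_output.splitlines():
        header = _ADAPTER.match(line)
        if header:
            current = header.group(1)
            adapters[current] = []
            continue
        address = _IPV4.search(line)
        if address and current is not None:
            adapters[current].append(address.group(1))
    return adapters


SAMPLE_IPCONFIG = """\
Windows IP Configuration

Ethernet adapter Ethernet 2:

   Connection-specific DNS Suffix  . :
   IPv4 Address. . . . . . . . . . . : 192.168.56.103
   Subnet Mask . . . . . . . . . . . : 255.255.255.0

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : test.tesst.com
   IPv4 Address. . . . . . . . . . . : 10.0.2.15
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
"""

if __name__ == "__main__":
    for adapter, addresses in ipv4_by_adapter(SAMPLE_IPCONFIG).items():
        print(adapter, addresses)
```

An adapter that appears with an empty list has no IPv4 address bound, which is one of the conditions this command is meant to reveal.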
netstat -ano: This command provides a list of the ports in use on the system, along with the ID of the process that owns each one. Use this command to troubleshoot launch issues with HDP master processes. Execute the following command on the host machine to identify potential port conflicts:
netstat -ano
You should see output similar to the following:
  TCP    0.0.0.0:49154          0.0.0.0:0              LISTENING       752
  TCP    [::]:49154             [::]:0                 LISTENING       752
  UDP    0.0.0.0:500            *:*                                    752
  UDP    0.0.0.0:3544           *:*                                    752
  UDP    0.0.0.0:4500           *:*                                    752
  UDP    10.0.2.15:50461        *:*                                    752
  UDP    [::]:500               *:*                                    752
  UDP    [::]:4500              *:*                                    752
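To check whether a port needed by an HDP process is already taken, the netstat -ano output can be searched by local port. The following is a minimal Python sketch, assuming Python is available on the host. The helper names parse_netstat and pids_on_port are hypothetical, and port 50070 in the example is only an illustrative value to look up.

```python
def parse_netstat(output):
    """Parse netstat -ano lines into dicts with proto, port, state and pid."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] not in ("TCP", "UDP"):
            continue  # skip headers and blank lines
        try:
            port = int(fields[1].rsplit(":", 1)[1])  # also handles "[::]:49154"
            pid = int(fields[-1])
        except (IndexError, ValueError):
            continue
        # TCP rows carry a state column; UDP rows do not.
        state = fields[3] if len(fields) >= 5 else ""
        entries.append({"proto": fields[0], "port": port, "state": state, "pid": pid})
    return entries


def pids_on_port(entries, port):
    """Return the sorted PIDs that have the given local port open."""
    return sorted({e["pid"] for e in entries if e["port"] == port})


SAMPLE_NETSTAT = """\
  TCP    0.0.0.0:49154          0.0.0.0:0              LISTENING       752
  TCP    [::]:49154             [::]:0                 LISTENING       752
  UDP    0.0.0.0:500            *:*                                    752
"""

if __name__ == "__main__":
    entries = parse_netstat(SAMPLE_NETSTAT)
    print(pids_on_port(entries, 49154))
    print(pids_on_port(entries, 50070))
```

A non-empty result for a port that an HDP component is configured to use points at the PID to look up in the tasklist output.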
Verify whether the firewall is enabled on the host machine: Go to Control Panel -> All Control Panel Items -> Windows Firewall.
You should see the following GUI interface: