4. Resolving Cluster Deployment Problems

Try the recommended solution for each of the following problems:

 4.1. Problem: Trouble Starting Ambari on System Reboot

If you reboot your cluster, you must restart the Ambari Server and all the Ambari Agents manually.

 4.1.1. Solution

Log in to each machine in your cluster separately:

  1. On the Ambari Server host machine:

    ambari-server start

  2. On each host in your cluster:

    ambari-agent start
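
If passwordless SSH as root is available from one machine to every host, the Agent restarts in step 2 can be scripted rather than run by hand. The following is a minimal sketch under that assumption. The function name start_agents, the hosts file, and the DRY_RUN variable are hypothetical names introduced for illustration and are not part of Ambari.

```shell
# start_agents: run "ambari-agent start" on every host listed in a file,
# one hostname per line. Hypothetical helper, not an Ambari command.
# Set DRY_RUN=1 to print the commands instead of running them.
start_agents() {
    hosts_file=$1
    while read -r host; do
        # Skip blank lines and comment lines.
        case $host in
            ''|'#'*) continue ;;
        esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh $host ambari-agent start"
        else
            # -n keeps ssh from consuming the rest of the hosts file on stdin.
            ssh -n "$host" ambari-agent start
        fi
    done < "$hosts_file"
}
```

Running the function once with DRY_RUN=1 prints the commands without executing them, which is a cheap way to check the host list first.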

 4.2. Problem: Metrics and Host information display incorrectly in Ambari Web

Charts appear incorrectly or not at all, or Host health status is displayed incorrectly.

 4.2.1. Solution

The clocks on all the hosts in your cluster and on the machine from which you browse to Ambari Web must be in sync with each other. The easiest way to ensure this is to enable NTP.
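
To get a rough idea of whether two machines are out of sync, you can compare their clocks in epoch seconds. The sketch below is illustrative only. clock_skew is a hypothetical helper, and the SSH round trip adds some delay to the remote reading, so treat small differences as noise.

```shell
# clock_skew: print the absolute difference, in seconds, between two
# epoch timestamps. Hypothetical helper, not part of Ambari.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Example (assumes SSH access to the host named in $remote_host):
#   clock_skew "$(date +%s)" "$(ssh "$remote_host" date +%s)"
```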

 4.3. Problem: On SUSE 11 Ambari Agent crashes within the first 24 hours

SUSE 11 ships with Python version 2.6.0-8.12.2 which contains a known defect that causes this crash.

 4.3.1. Solution

Upgrade to Python version 2.6.8-0.15.1.

 4.4. Problem: Attempting to Start HBase REST server causes either REST server or Ambari Web to fail

Optionally, you can start the HBase REST server manually after the install process is complete. It can be started on any host that has the HBase Master or the Region Server installed. If you install the REST server on the same host as the Ambari Server, the HTTP ports will conflict.

 4.4.1. Solution

When starting the REST server, use the -p option to set a custom port. Use the following command to start the REST server:

    /usr/lib/hbase/bin/hbase-daemon.sh start rest -p <custom_port_number>

 4.5. Problem: Multiple Ambari Agent processes are running, causing re-register

On a cluster host, ps aux | grep ambari-agent shows more than one Agent process running. This causes Ambari Server to get incorrect IDs from the host and forces the Agent to restart and re-register.

 4.5.1. Solution

On the affected host, kill the processes and restart.

  1. Kill the Agent processes and remove the Agent PID files found here: /var/run/ambari-agent/ambari-agent.pid.

  2. Restart the Agent process:

    ambari-agent start
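
Before and after the cleanup, it can help to count the Agent processes rather than read the ps output by eye. The following is a minimal sketch, assuming the Agent processes match the string ambari-agent as in the ps command above. count_agent_procs is a hypothetical helper name.

```shell
# count_agent_procs: read "ps aux" style output on stdin and print the
# number of lines that mention ambari-agent, excluding grep commands.
# Hypothetical helper, not part of Ambari.
count_agent_procs() {
    grep 'ambari-agent' | grep -vc 'grep'
}

# Example:
#   ps aux | count_agent_procs
```

Note that the filter drops every line containing the word grep, so it is only a rough count.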

 4.6. Problem: Some graphs do not show a complete hour of data until the cluster has been running for an hour

When you start a cluster for the first time, some graphs, such as Services View > HDFS and Services View > MapReduce, do not plot a complete hour of data. Instead, they show data only for the length of time the service has been running. Other graphs display the run of a complete hour.

 4.6.1. Solution

Let the cluster run. After an hour all graphs will show a complete hour of data.

 4.7. Problem: Ambari stops MySQL database during deployment, causing Ambari Server to crash.

The Hive Service uses MySQL Server by default. If you choose the MySQL Server on the Ambari Server host as the managed server for Hive, Ambari stops this database during deployment and the Ambari Server crashes.

 4.7.1. Solution

If you plan to use the default MySQL Server setup for Hive and use MySQL Server for Ambari, make sure that the two MySQL Server instances are different.

If you plan to use the same MySQL Server for Hive and Ambari, make sure to choose the existing database option for Hive.

 4.8. Problem: Cluster Install Fails with Groupmod Error

The cluster fails to install with an error related to running groupmod. This can occur in environments where groups are managed in LDAP, and not on local Linux machines. You may see an error message similar to the following one:

Fail: Execution of 'groupmod hadoop' returned 10. groupmod: group 'hadoop' does not exist in /etc/group

 4.8.1. Solution

When installing the cluster using the Cluster Installer Wizard, at the Customize Services step, select the Misc tab and choose the Skip group modifications during install option.
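
To confirm that this is the cause before rerunning the install, you can check whether the group is defined locally. groupmod works on the local group file, while getent group hadoop also consults other configured sources such as LDAP. The sketch below uses a hypothetical helper name, group_is_local, and defaults to /etc/group.

```shell
# group_is_local: succeed (exit 0) if the named group has an entry in the
# given group file, which defaults to /etc/group.
# Hypothetical helper, not part of Ambari.
group_is_local() {
    grep -q "^$1:" "${2:-/etc/group}"
}

# Example:
#   if group_is_local hadoop; then
#       echo "hadoop is a local group"
#   else
#       echo "hadoop is not in /etc/group; compare with: getent group hadoop"
#   fi
```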

 4.9. Problem: Host registration fails during Agent bootstrap on SLES due to timeout.

When using SLES and performing host registration using SSH, the Agent bootstrap may fail due to timeout when running the setupAgent.py script. The host on which the timeout occurs will show the following process hanging:

    c6401.ambari.apache.org:/etc/ # ps -ef | grep zypper
    root 18318 18317 5 03:15 pts/1 00:00:00 zypper -q search -s --match-exact ambari-agent

 4.9.1. Solution

  1. If a registered repository prompts you to accept keys, zypper waits for user interaction, which can cause the hang and timeout. In this case, run zypper refresh and confirm that all repository keys are accepted so that the zypper command can work without user interaction.

  2. Alternatively, perform manual Agent setup and do not use SSH for host registration. This option does not require that Ambari call zypper without user interaction.

 4.10. Problem: Host Check Fails if Transparent Huge Pages (THP) is not disabled.

When installing Ambari on RHEL/CentOS 6 using the Cluster Installer Wizard at the Host Checks step, one or more host checks may fail if you have not disabled Transparent Huge Pages on all hosts.

Host Checks will warn you when a failure occurs.

 4.10.1. Solution

Disable THP on all hosts:

  1. Add the following command to your /etc/rc.local file:

    if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
       echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
    fi
    if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
       echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
    fi

  2. To confirm, reboot the host, then run the following command. The active setting is shown in square brackets:

    $ cat /sys/kernel/mm/transparent_hugepage/enabled
    always madvise [never]
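
To check the setting from a script rather than by eye, you can extract the bracketed value from the file. A minimal sketch follows. thp_mode is a hypothetical helper name, and the default path is the one used in the confirmation command above.

```shell
# thp_mode: print the active THP setting, which is the word shown in
# square brackets. Reads the file given as $1, or the standard sysfs
# path if no argument is given. Hypothetical helper, not part of Ambari.
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' \
        "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

# Example:
#   if [ "$(thp_mode)" != "never" ]; then
#       echo "THP is still enabled on $(hostname)"
#   fi
```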

