Resolving Cluster Deployment Problems
Try the recommended solution for each of the following problems.
Problem: Trouble Starting Ambari on System Reboot
If you reboot your cluster, you must restart the Ambari Server and all the Ambari Agents manually.
Solution:
Log in to each machine in your cluster separately:
On the Ambari Server host machine:
ambari-server start
On each host in your cluster:
ambari-agent start
Problem: Metrics and Host information display incorrectly in Ambari Web
Charts appear incorrectly or not at all, or Host health status is displayed incorrectly.
Solution:
The clocks on all the hosts in your cluster and on the machine from which you browse to Ambari Web must be synchronized with each other. The easiest way to ensure this is to enable NTP.
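As a rough way to spot clock drift before turning to NTP, the sketch below compares two epoch timestamps. The helper name clock_skew and the 5-second threshold are illustrative assumptions, not Ambari settings, and the remote timestamp is a placeholder for a value you would collect from another host (for example by running date +%s there).

```shell
#!/bin/sh
# Hypothetical helper: absolute difference, in seconds, between two epoch timestamps.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

local_time=$(date +%s)
# Placeholder: replace with a timestamp gathered from another cluster host.
remote_time=$local_time

skew=$(clock_skew "$local_time" "$remote_time")
# 5 seconds is an arbitrary illustrative threshold.
if [ "$skew" -gt 5 ]; then
    echo "Clock skew of ${skew}s detected; check NTP on both machines"
else
    echo "Clocks agree within ${skew}s"
fi
```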
Problem: On SUSE 11 Ambari Agent crashes within the first 24 hours
SUSE 11 ships with Python version 2.6.0-8.12.2, which contains a known defect that causes this crash.
Solution:
Upgrade to Python version 2.6.8-0.15.1.
Problem: Attempting to Start HBase REST server causes either REST server or Ambari Web to fail
Optionally, you can start the HBase REST server manually after the install process is complete. It can be started on any host that has the HBase Master or the Region Server installed. If you install the REST server on the same host as the Ambari server, the HTTP ports will conflict.
Solution
When starting the REST server, use the -p option to set a custom port. Use the following command to start the REST server:
/usr/lib/hbase/bin/hbase-daemon.sh start rest -p <custom_port_number>
Problem: Multiple Ambari Agent processes are running, causing re-register
On a cluster host, ps aux | grep ambari-agent shows more than one agent process running. This causes Ambari Server to get incorrect IDs from the host and forces the Agent to restart and re-register.
Solution
On the affected host, kill the processes and restart.
Kill the Agent processes and remove the Agent PID files found here:
/var/run/ambari-agent/ambari-agent.pid
Restart the Agent process:
ambari-agent start
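To check a host for this condition, the following sketch counts the lines of ps output that match the same pattern used above, leaving out the grep process itself. The helper name count_procs is an assumption made for illustration; adjust the pattern if your Agent processes appear under a different name.

```shell
#!/bin/sh
# Hypothetical helper: count lines on stdin that match pattern $1,
# ignoring any line that belongs to a grep process.
count_procs() {
    grep -- "$1" | grep -vc 'grep'
}

if command -v ps >/dev/null 2>&1; then
    count=$(ps aux | count_procs ambari-agent)
    if [ "$count" -gt 1 ]; then
        echo "Found $count matching processes; kill them, remove the PID file, and restart the Agent"
    else
        echo "Found $count matching process(es)"
    fi
fi
```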
Problem: Ambari stops MySQL database during deployment, causing Ambari Server to crash
The Hive Service uses MySQL Server by default. If the MySQL Server instance that holds the Ambari database on the Ambari Server host is also chosen as the managed server for Hive, Ambari stops this database during deployment and crashes.
Solution
If you plan to use the default MySQL Server setup for Hive and also use MySQL Server for Ambari, make sure that the two MySQL Server instances are different.
If you plan to use the same MySQL Server for Hive and Ambari, make sure to choose the existing database option for Hive.
Problem: Cluster Install Fails with Groupmod Error
The cluster fails to install with an error related to running groupmod. This can occur in environments where groups are managed in LDAP, and not on local Linux machines. You may see an error message similar to the following one:
Fail: Execution of 'groupmod hadoop' returned 10. groupmod: group 'hadoop' does not exist in /etc/group
Solution
When installing the cluster using the Cluster Installer Wizard, at the Customize Services step, select the Misc tab and choose the Skip group modifications during install option.
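Before installing, you can check whether a group is defined locally, which is where the error message above says groupmod looks. This is a minimal sketch; the helper name group_is_local is hypothetical, and the group name is matched as a simple pattern at the start of a line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if group $1 has an entry in the group file
# $2 (defaults to /etc/group).
group_is_local() {
    grep -q "^$1:" "${2:-/etc/group}"
}

if group_is_local hadoop; then
    echo "hadoop is defined in /etc/group"
else
    echo "hadoop is not in /etc/group; consider skipping group modifications during install"
fi
```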
Problem: Host registration fails during Agent bootstrap on SLES due to timeout
When using SLES and performing host registration using SSH, the Agent bootstrap may fail due to timeout when running the setupAgent.py script. The host on which the timeout occurs will show the following process hanging:
c6401.ambari.apache.org:/etc/ # ps -ef | grep zypper
root 18318 18317 5 03:15 pts/1 00:00:00 zypper -q search -s --match-exact ambari-agent
Solution
If a registered repository prompts you to accept keys through user interaction, you may see the hang and timeout. In this case, run zypper refresh and confirm that all repository keys are accepted, so that the zypper command can work without user interaction. Alternatively, perform manual Agent setup and do not use SSH for host registration. This option does not require that Ambari call zypper without user interaction.
Problem: Host Check Fails if Transparent Huge Pages (THP) is not disabled
When installing Ambari on RHEL/CentOS 6 using the Cluster Installer Wizard at the Host Checks step, one or more host checks may fail if you have not disabled Transparent Huge Pages on all hosts.
Host Checks will warn you when a failure occurs.
Solution
Disable THP. On all hosts, add the following commands to your /etc/rc.local file:
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
  echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
  echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
fi
To confirm, reboot the host then run the following command:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
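The active setting is the value shown in square brackets. If you want to check it from a script across many hosts, a sketch like the following extracts that value. The helper name thp_active_value is an assumption for illustration, and the path is the one used in the confirmation command above.

```shell
#!/bin/sh
# Hypothetical helper: print the bracketed (active) value from THP sysfs
# content supplied on stdin, for example "always madvise [never]".
thp_active_value() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    active=$(thp_active_value < "$thp_file")
    if [ "$active" = "never" ]; then
        echo "THP is disabled"
    else
        echo "THP is set to '$active'; it should be 'never'"
    fi
fi
```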
Problem: DataNode Fails to Install on RHEL/CentOS 7
During cluster install, DataNode fails to install with the following error:
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install snappy-devel' returned 1.
Error: Package: snappy-devel-1.0.5-1.el6.x86_64 (HDP-UTILS-1.1.0.20)
Requires: snappy(x86-64) = 1.0.5-1.el6
Installed: snappy-1.1.0-3.el7.x86_64 (@anaconda/7.1)
snappy(x86-64) = 1.1.0-3.el7
Available: snappy-1.0.5-1.el6.x86_64 (HDP-UTILS-1.1.0.20)
snappy(x86-64) = 1.0.5-1.el6
Solution:
Hadoop requires a snappy-devel package that depends on a lower version of snappy than the one already installed on the machine. Run the following on the host and retry.
yum remove snappy
yum install snappy-devel
Problem: When running Ambari Server as non-root, kadmin couldn't open log file
When you run Ambari Server as non-root and enable Kerberos, you will see the following error in ambari-server.log if kadmin fails to authenticate and Ambari cannot access kadmind.log.
STDERR: Couldn't open log file /var/log/kadmind.log: Permission denied
kadmin: GSS-API (or Kerberos) error while initializing kadmin interface
Solution:
To avoid this error, be sure the kadmind.log file has 644 permissions.
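A quick way to verify the permissions is sketched below. It assumes GNU stat (the -c option), as found on typical Linux hosts, and uses the log path from the error message above; the helper name has_mode_644 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if file $1 has octal mode 644.
# Relies on GNU stat's -c option.
has_mode_644() {
    [ "$(stat -c '%a' "$1" 2>/dev/null)" = "644" ]
}

log=/var/log/kadmind.log
if [ -e "$log" ]; then
    if has_mode_644 "$log"; then
        echo "$log already has 644 permissions"
    else
        echo "Fix with: chmod 644 $log"
    fi
fi
```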
Problem: Adding client-only services does not automatically install component dependencies.
When adding client-only services to a cluster (using Add Service), Ambari does not automatically install dependent client components with the newly added clients.
Solution:
On hosts where client components need to be installed, browse to Hosts and to the Host Details page. Click + Add and select the client components to install on that host.
Problem: Automatic Agent Registration with SSH fails for a non-root configuration
When using an Agent non-root configuration, if you attempt to register hosts automatically using SSH, the Agent registration will fail.
Solution:
The option to automatically register hosts with SSH is not supported when using an Agent non-root configuration. You must manually register the Agents.
Problem: Ambari Server will not start with “DB configs consistency check failed.”
Solution:
On Ambari Server start, Ambari runs a database consistency check looking for issues. If any issues are found, Ambari Server start aborts and the message “DB configs consistency check failed.” is printed to the console. You can force Ambari Server to start by skipping this check with the following option:
ambari-server start --skip-database-check
Refer to Start the Ambari Server for more information.
Problem: The Ambari Metrics Monitor fails to install on SLES 11.
During Ambari Metrics install on SLES 11, the Ambari Metrics Monitor fails to install due to a package dependency.
2015-05-28 17:38:39,869 - Error while executing command 'install':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 214, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 32, in install
    self.install_packages(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 289, in install_packages
    Package(name)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 43, in action_install
    self.install_package(package_name, self.resource.use_repos)
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/zypper.py", line 72, in install_package
    shell.checked_call(cmd, sudo=True, logoutput=self.get_logoutput())
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    return function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 82, in checked_call
    return _call(command, logoutput, True, cwd, env, preexec_fn, user, wait_for_finish, timeout, path, sudo, on_new_line)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 199, in _call
    raise Fail(err_msg)
Fail: Execution of '/usr/bin/zypper --quiet install --auto-agree-with-licenses --no-confirm ambari-metrics-monitor' returned 4. Problem: nothing provides python-devel needed by ambari-metrics-monitor-2.0.0-151.x86_64
Solution:
The Ambari Metrics Monitor requires the python-devel package, which is part of the SLES 11 SDK. Refer to https://www.novell.com/support/kb/doc.php?id=7015337 for information on installing the SDK. Perform the SDK install on all hosts.