Apache Ambari Troubleshooting

Resolving General Problems

Problem: When installing HDP 2.3.0 or 2.3.2, YARN ATS fails to start.

If you install an HDP cluster using HDP 2.3.0 or HDP 2.3.2, the YARN ATS server will fail to start with the following error in the yarn log:

2015-12-09 22:56:41,816 FATAL applicationhistoryservice.ApplicationHistoryServer
(ApplicationHistoryServer.java:launchAppHistoryServer(161)) - Error starting ApplicationHistoryServer
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException:
Class org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2227)

Solution:

Update the YARN configuration to use the LevelDB store:

  1. In Ambari Web, browse to Services > YARN > Configs.

  2. Filter for the yarn.timeline-service.store-class property and set its value to org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.

  3. Save the configuration change and restart YARN.
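To confirm that the new value reached a host after the restart, you can read the property back from the deployed configuration file. The following is a minimal sketch, not an Ambari tool: the function name hadoop_conf_value is hypothetical, the path /etc/hadoop/conf/yarn-site.xml is an assumption about where the configuration is deployed, and the parsing assumes the <name> and <value> elements sit on adjacent lines.

```shell
# Print the value of one property from a Hadoop-style XML configuration file.
# Assumption: the <name> and <value> elements are on adjacent lines.
hadoop_conf_value() {
  conf_file=$1
  prop=$2
  # -F matches the property name literally; -A1 also prints the following line.
  grep -F -A1 "<name>${prop}</name>" "$conf_file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Example call (the path is an assumption):
# hadoop_conf_value /etc/hadoop/conf/yarn-site.xml yarn.timeline-service.store-class
```

If the command prints the LeveldbTimelineStore class name, the host has picked up the change.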

Problem: After upgrading to Ambari 2.2, you receive File Does Not Exist alerts.

After upgrading to Ambari 2.2, you receive "DataNode Unmounted Data Dir" alerts stating that the /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist file does not exist. The hadoop-env/dfs.datanode.data.dir.mount.file configuration property is no longer customizable from Ambari: the original default value of /etc/hadoop/conf/dfs_data_dir_mount.hist has been replaced by the fixed location /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist. When the Ambari Agent is upgraded, Ambari automatically moves the file from /etc/hadoop/conf/dfs_data_dir_mount.hist to /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist. If you have not modified this configuration property, no action is required.

Solution:

If you had previously modified the hadoop-env/dfs.datanode.data.dir.mount.file value to a custom location, you must restart your DataNodes after upgrading to Ambari 2.2 so that the file is written to the new location.

Problem: During Enable Kerberos, the Check Kerberos operation fails.

When enabling Kerberos using the wizard, the Check Kerberos operation fails. In /var/log/ambari-server/ambari-server.log, you see a message such as:

02:45:44,490 WARN [qtp567239306-238] MITKerberosOperationHandler:384 - Failed to execute kadmin:

Solution 1:

Check that NTP is running and confirm that the clocks on your hosts and on the KDC are in sync. A time skew of as little as 5 minutes can cause Kerberos authentication to fail.
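A quick way to compare two clocks is to collect an epoch timestamp from each machine and check the difference against the 5-minute threshold. This is a minimal sketch: the function name clocks_in_sync, the MAX_SKEW variable, and the host kdc.example.com in the usage comment are hypothetical, and the comparison ignores the time taken to fetch the remote timestamp.

```shell
# Maximum tolerated clock difference in seconds (5 minutes, per the text above).
MAX_SKEW=300

# Succeeds (exit status 0) when two epoch timestamps differ by less than MAX_SKEW.
clocks_in_sync() {
  diff=$(( $1 - $2 ))
  # Take the absolute value of the difference.
  [ "$diff" -lt 0 ] && diff=$(( -diff ))
  [ "$diff" -lt "$MAX_SKEW" ]
}

# Example (hypothetical KDC host name):
# clocks_in_sync "$(date +%s)" "$(ssh kdc.example.com date +%s)" || echo "clock skew too large"
```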

Solution 2: (on RHEL/CentOS/Oracle Linux)

Check that the Kerberos Admin principal being used has the necessary KDC ACL rights as set in /var/kerberos/krb5kdc/kadm5.acl.

Problem: Hive developers may encounter an exception error message during Hive Service Check

MySQL is the default database used by the Hive metastore. Depending on several factors, such as the version and configuration of MySQL, a Hive developer may see an exception message similar to the following one:

An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes

Solution:

Administrators can resolve this issue by altering the Hive metastore database to use the Latin1 character set, as shown in the following example:

mysql> ALTER DATABASE <metastore.database.name> character set latin1;
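The arithmetic behind this fix can be sketched as follows. It relies on an explanation the text above does not state: MySQL's utf8 character set can use up to 3 bytes per character, while latin1 uses 1 byte per character. The 256-character indexed column is a hypothetical example, not a statement about the actual metastore schema, and the key_fits function is illustrative only.

```shell
# Index key limit reported in the error message, in bytes.
MAX_KEY_BYTES=767

# Succeeds when an indexed column of $1 characters, at $2 bytes per character,
# fits within MAX_KEY_BYTES.
key_fits() {
  [ $(( $1 * $2 )) -le "$MAX_KEY_BYTES" ]
}

# A hypothetical 256-character key: 768 bytes under utf8, 256 bytes under latin1.
key_fits 256 3 || echo "utf8 (3 bytes/char): 256-character key is too long"
key_fits 256 1 && echo "latin1 (1 byte/char): 256-character key fits"
```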

Problem: API calls for PUT, POST, DELETE respond with a "400 - Bad Request"

When attempting to perform a REST API call, you receive a 400 error response. REST API calls require the "X-Requested-By" header.

Solution:

Starting with Ambari 1.4.2, you must include the "X-Requested-By" header with the REST API calls.

For example, if using curl, include the -H "X-Requested-By: ambari" option:

curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-host>:8080/api/v1/hosts/host1
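To avoid forgetting the header, you can wrap curl in a small shell function that always adds it. This is a sketch under stated assumptions: the function name ambari_api and the AMBARI_URL, AMBARI_CRED, and CURL variables are hypothetical, and the defaults shown (localhost, admin:admin) should be replaced with your own values.

```shell
# Assumptions: base URL and credentials; override them in the environment.
AMBARI_URL=${AMBARI_URL:-http://localhost:8080}
AMBARI_CRED=${AMBARI_CRED:-admin:admin}
# The HTTP client is a variable so the wrapper can be dry-run with a stub.
CURL=${CURL:-curl}

# Usage: ambari_api METHOD PATH [extra curl options...]
# Always sends the X-Requested-By header that the REST API requires.
ambari_api() {
  method=$1
  path=$2
  shift 2
  "$CURL" -u "$AMBARI_CRED" -H "X-Requested-By: ambari" \
    -X "$method" "$@" "${AMBARI_URL}/api/v1${path}"
}

# Example: ambari_api DELETE /hosts/host1
```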

Problem: Ambari checks disk capacity on non-local disks, causing a high number of auto-mounted home directories

When Ambari runs its check of local disk capacity and usage for each Ambari Agent, it uses df by default rather than df -l, so remote mounts are checked as well as local disks. If you use NFS auto-mounted home directories, this can cause a large number of home directories to be mounted on each host, which delays shutdown and the disk capacity checks.

Solution:

On the Ambari Server, edit /etc/ambari-server/conf/ambari.properties and add the following property so that only locally mounted devices are checked:

agent.check.remote.mounts=false
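If you script this change, it helps to make the edit idempotent so that repeated runs leave exactly one entry for the property. The following is a minimal sketch; the function name set_property is hypothetical, and it assumes keys are written without surrounding spaces. Ambari Server generally reads ambari.properties at startup, so a restart (ambari-server restart) is likely needed for the change to take effect.

```shell
# Usage: set_property FILE KEY VALUE
# Removes any existing KEY=... line and appends KEY=VALUE, so repeated runs
# leave exactly one entry for KEY.
set_property() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp) || return 1
  # Keep every line whose key (the text before the first "=") differs from KEY.
  awk -F= -v k="$key" '$1 != k' "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (run on the Ambari Server host):
# set_property /etc/ambari-server/conf/ambari.properties agent.check.remote.mounts false
```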

Problem: Ambari Web shows Storm summary values as N/A in a Kerberized cluster

In a Kerberos-enabled cluster that includes Storm, the Summary values for Slots, Tasks, Executors, and Topologies in Ambari Web > Services > Storm show as "n/a". The Ambari Server log also includes the following ERROR:

24 Mar 2015 13:32:41,288 ERROR [pool-2-thread-362] AppCookieManager:122 -
SPNego authentication failed, cannot get hadoop.auth cookie for URL:
http://c6402.ambari.apache.org:8744/api/v1/topology/summary?field=topologies

Solution:

When Kerberos is enabled, the Storm API requires SPNEGO authentication. Refer to "Set Up Ambari for Kerberos" in the Ambari Security Guide to enable Ambari to authenticate against the Storm API via SPNEGO.

Problem: When running Ambari Server as non-root, kadmin cannot open its log file.

When Ambari Server runs as a non-root user and you enable Kerberos, the following error appears in ambari-server.log if kadmin fails to authenticate and Ambari cannot access kadmind.log:

STDERR: Couldn't open log file /var/log/kadmind.log: Permission denied 
kadmin: GSS-API (or Kerberos) error while initializing kadmin interface

Solution:

Make sure that the user Ambari Server is configured to run as has permission to write to kadmind.log.

Problem: After changing NameNode RPC port, Ambari shows both NameNodes as standby.

If you have enabled NameNode HA and then change the NameNode RPC ports (by customizing the dfs.namenode.servicerpc-address property), Ambari will show both NameNodes as standby.

Solution:

When modifying the NameNode RPC port (dfs.namenode.servicerpc-address) after enabling NameNode HA, you need to format ZKFC to make sure that the configuration data in ZooKeeper is refreshed. Run the following command to format the ZKFC znode:

su - <hdfs-user> -c 'hdfs zkfc -formatZK'
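Afterwards, you can check from the command line that one NameNode reports active and the other standby. This is a minimal sketch: the function name ha_states_ok and the NameNode IDs nn1 and nn2 are hypothetical (use the IDs configured for your nameservice), and the hdfs haadmin -getServiceState calls in the usage comment should be run as the HDFS user.

```shell
# Succeeds when exactly one of the two reported NameNode states is "active"
# and the other is "standby".
ha_states_ok() {
  case "$1,$2" in
    active,standby|standby,active) return 0 ;;
    *) return 1 ;;
  esac
}

# Example, with hypothetical NameNode IDs nn1 and nn2:
# ha_states_ok "$(hdfs haadmin -getServiceState nn1)" \
#              "$(hdfs haadmin -getServiceState nn2)" || echo "unexpected HA state"
```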