4. Resolving Cluster Upgrade Problems
Try the recommended solution for each of the following problems.
4.1. Problem: Versions tab does not show in Ambari Web.
After performing an upgrade from HDP 2.1 and restarting Ambari Server and the Agents, if you browse to Admin > Stack and Versions in Ambari Web, the Versions tab does not display.
4.1.1. Solution:
Give all the Agent hosts in the cluster a chance to connect to Ambari Server: wait until Ambari shows the Agent heartbeats as green for all hosts, then refresh your browser.
4.2. Problem: YARN Service Checks Fail and ResourceManager fails to start
When upgrading from HDP 2.2 -> 2.3, if your cluster contains
yarn.scheduler.capacity.root.accessible-node-labels.default.capacity
and
yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity
properties, these values must be valid before upgrading to HDP 2.3. During HDP upgrade, on
ResourceManager start, if these values are invalid, you will see the following error:
Illegal capacity of 0.0 for children of queue root for label=default
For more information, see Update YARN Configuration Properties for HDP 2.3.
4.2.1. Solution:
From Ambari Web, browse to Services > YARN > Configs. On the Advanced tab, delete the following properties from capacity-scheduler:
yarn.scheduler.capacity.root.accessible-node-labels.default.capacity
yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity
Also, be sure these properties are valid (or not included) in your Blueprints when you create clusters.
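Before upgrading, you can sanity-check these values in a local copy of the capacity-scheduler properties. The sketch below is illustrative only: it uses a temp file with made-up values in place of your real configuration, and flags the zero capacities that trigger the error above.

```shell
# Illustrative pre-upgrade check against a local copy of the
# capacity-scheduler properties (the file and values here are made up).
props=$(mktemp)
cat > "$props" <<'EOF'
yarn.scheduler.capacity.root.accessible-node-labels.default.capacity=0
yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity=100
EOF

invalid=0
while IFS='=' read -r key value; do
  case $key in
    *accessible-node-labels.default.capacity|*accessible-node-labels.default.maximum-capacity)
      # A capacity of 0 triggers "Illegal capacity of 0.0" on ResourceManager start.
      if [ "$value" = "0" ] || [ "$value" = "0.0" ]; then
        echo "INVALID: $key=$value"
        invalid=$((invalid + 1))
      fi ;;
  esac
done < "$props"
echo "invalid entries: $invalid"
rm -f "$props"
```

On a real cluster you would read the properties from Ambari (Services > YARN > Configs) rather than a temp file; deleting the properties, as described above, is the documented fix.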
4.3. Problem: After HDP 2.2 -> 2.3 Manual Upgrade, Ambari alerts service flags Kafka Broker down.
After performing a manual upgrade from HDP 2.2 to 2.3, Ambari displays alerts for Kafka Brokers.
4.3.1. Solution:
During the upgrade, Ambari adds listeners=PLAINTEXT://localhost:6667 to /etc/kafka/conf/server.properties. This causes Kafka to listen on localhost only, at port 6667, so the Ambari alerts service cannot reach the Kafka broker and flags it as down. These alerts can be disregarded: the value is not corrected until Ambari set-current is run (per the upgrade instructions).
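You can confirm whether a broker is affected by checking for the localhost-only entry. The sketch below simulates this with a temp file standing in for /etc/kafka/conf/server.properties:

```shell
# Simulate the entry Ambari adds during upgrade; on a real host you would
# grep /etc/kafka/conf/server.properties instead of this temp file.
conf=$(mktemp)
echo 'listeners=PLAINTEXT://localhost:6667' > "$conf"

if grep -q '^listeners=PLAINTEXT://localhost:6667' "$conf"; then
  status="localhost-only"   # the alerts service cannot reach a localhost-bound broker
else
  status="ok"
fi
echo "kafka listener binding: $status"
rm -f "$conf"
```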
4.4. Problem: Ranger Admin UI does not function after upgrading HDP from 2.2 to 2.3
Ranger upgrade patches may fail to complete during an upgrade of the HDP Stack from 2.2 to 2.3, causing the Ranger Admin UI to not function correctly after the upgrade.
4.4.1. Solution: Run the DB and Java patch scripts manually, then Retry Upgrading Ranger.
On the Rolling Upgrade dialog, stdout tab, review the Ambari log files to determine which patch caused a timeout. For example, the following image shows a timeout during SQL patch execution.
Based on your review, determine whether DB or Java patch scripts (or both) have failed to complete.
Log in to the Ranger Admin host.
On the Ranger Admin host, in /usr/hdp/2.3.x.y-z/ranger-admin/, run the following commands:
For a DB patch failure:
python db_setup.py
For a Java patch failure:
python db_setup.py -javapatch
Confirm that all patches complete successfully.
In Ambari Web, in the Rolling Upgrade dialog at the Ranger Paused step, click Retry.
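If both scripts must be run, it helps to stop at the first failure so you know which patch to investigate before clicking Retry. A minimal sketch of that pattern (the run_patch helper is hypothetical, and true/false stand in for the actual db_setup.py invocations):

```shell
# Hypothetical wrapper: run a patch command and report pass/fail by exit code.
run_patch() {
  if "$@"; then
    echo "patch succeeded: $*"
  else
    echo "patch FAILED: $*" >&2
    return 1
  fi
}

# 'true' and 'false' below stand in for the real commands:
#   python db_setup.py            (DB patches)
#   python db_setup.py -javapatch (Java patches)
run_patch true
run_patch false || db_status=$?
echo "exit status of failing patch: ${db_status:-0}"
```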
4.5. Problem: Rolling or Express Upgrade fails while stopping HBase RegionServers
When upgrading to HDP 2.3.2 on Ubuntu or Debian, and you are using custom service accounts, the upgrade (Rolling or Express) fails while stopping HBase RegionServers.
4.5.1. Solution:
If you perform an upgrade (Rolling or Express) to HDP 2.3.2 with custom service accounts configured, stopping the HBase RegionServers fails. This occurs because installing the HDP 2.3.2 packages changes the ownership of the HBase pid directory. To correct this, on each affected host, change the pid directory ownership to match your custom service accounts. For example, if your custom service account for HBase is "cstm-hbase", change the ownership as follows and proceed with the upgrade:
chown cstm-hbase:hbase -R /var/run/hbase/
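Because changing the ownership of /var/run/hbase requires root, the effect of the fix can be sketched with a temp directory and the current user standing in for the real path and the cstm-hbase account:

```shell
# Simulation of the ownership fix; on the real host you would run:
#   chown cstm-hbase:hbase -R /var/run/hbase/
pid_dir=$(mktemp -d)                       # stand-in for /var/run/hbase
touch "$pid_dir/hbase-regionserver.pid"    # stand-in pid file
chown -R "$(id -un)" "$pid_dir"            # current user stands in for cstm-hbase
owner=$(stat -c '%U' "$pid_dir/hbase-regionserver.pid")
echo "pid file owner: $owner"
rm -rf "$pid_dir"
```

After the real chown, the stat check above (against /var/run/hbase) is a quick way to confirm the new owner before resuming the upgrade.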
4.6. Problem: Atlas service fails to upgrade when performing a rolling upgrade on a secure, SLES11 cluster
When performing a rolling upgrade of a secure cluster running Atlas on SLES11, you may see the following error, indicating that Atlas has not upgraded successfully:
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh /usr/bin/hdp-select set all `ambari-python-wrap /usr/bin/hdp-select versions | grep ^2.5.0.0-1238 | tail -1`' returned 1. symlink target /usr/hdp/current/atlas-client for atlas already exists and it is not a symlink.
4.6.1. Solution:
For all environments, we recommend removing Atlas from the cluster before performing a rolling upgrade, as described here.
Specific upgrade steps for a secure HDP 2.4 cluster running Atlas on SLES 11 sp3:
1. Delete the Atlas service via the Ambari API.
2. Upgrade Ambari to version 2.4.0.
3. Install HDP-2.5.x bits.
4. Perform an Express Upgrade to upgrade the stack.
5. Reinstall Atlas.
6. Restart all impacted services using Services > Restart All.
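Step 1 refers to deleting the Atlas service through the Ambari REST API. The calls commonly used are a PUT to stop the service followed by a DELETE. The sketch below only assembles the commands without executing them; the host, cluster name, and credentials are placeholders, so verify the endpoints against your Ambari version before running anything.

```shell
# Assemble (but do not execute) the REST calls typically used to remove a
# service via the Ambari API. All values below are placeholders.
ambari="http://ambari.example.com:8080"
cluster="MyCluster"
auth="admin:admin"

# Stop the service first (Ambari requires a stopped service before deletion).
stop_cmd="curl -u $auth -H 'X-Requested-By: ambari' -X PUT \
  -d '{\"RequestInfo\":{\"context\":\"Stop ATLAS\"},\"Body\":{\"ServiceInfo\":{\"state\":\"INSTALLED\"}}}' \
  $ambari/api/v1/clusters/$cluster/services/ATLAS"

# Then delete it.
delete_cmd="curl -u $auth -H 'X-Requested-By: ambari' -X DELETE \
  $ambari/api/v1/clusters/$cluster/services/ATLAS"

echo "$stop_cmd"
echo "$delete_cmd"
```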