Troubleshooting Upgrades
Cloudera Runtime upgrade failure
"Access denied" in install or update wizard
"Access denied" in install or update wizard during database configuration for Activity Monitor or Reports Manager.
Possible Reasons
Hostname mapping or permissions are not set up correctly.
Possible Solutions
- For hostname configuration, see Configure Network Names.
- For permissions, make sure the values you enter into the wizard
match those you used when you configured the databases. The value
you enter into the wizard as the database hostname must match
the value you entered for the hostname (if any) when you configured the
database.
For example, if you had entered the following when you created the database
grant all on activity_monitor.* TO 'amon_user'@'myhost1.myco.com' IDENTIFIED BY 'amon_password';
the value you enter here for the database hostname must be
myhost1.myco.com
. If you did not specify a host, or used a wildcard to allow access from any host, you can enter either the fully qualified domain name (FQDN), orlocalhost
. For example, if you enteredgrant all on activity_monitor.* TO 'amon_user'@'%' IDENTIFIED BY 'amon_password';
the value you enter for the database hostname can be either the FQDN or
localhost
.
The Hive on Tez service and the RangerAdmin role of the Ranger service kept going down
Possible Reasons
Both issues were caused by their respective Java heap size configuration being very low.
Possible Solutions
- For Hive on Tez: Look up the
hiveserver2_java_heapsize
config key under Hive on Tez > Configuration . Later, change its value from 256 MiB to 2 GiB. - For Ranger: Look up the
ranger_admin_max_heap_size
config key under Ranger > Configuration. Later, change its value from 1 GiB to 4 GiB.
Both the services must be restarted.
Cluster hosts do not appear
Some cluster hosts do not appear when you click Find Hosts in install or update wizard.
Possible Reasons
You might have network connectivity problems.
Possible Solutions
- Make sure all cluster hosts have SSH port 22 open.
- Check other common causes of loss of connectivity such as firewalls and interference from SELinux.
Cannot start services after upgrade
You have upgraded the Cloudera Manager Server, but now cannot start services.
Possible Reasons
You might have mismatched versions of the Cloudera Manager Server and Agents.
Possible Solutions
Make sure you have upgraded the Cloudera Manager Agents on all hosts. (The previous version of the Agents will heartbeat with the new version of the Server, but you cannot start HDFS and MapReduce with this combination.)
HDFS DataNodes fail to start
After upgrading, HDFS DataNodes fail to start with exception:
Exception in secureMainjava.lang.RuntimeException: Cannot start datanode because the configured max locked memory size (dfs.datanode.max.locked.memory) of 4294967296 bytes is more than the datanode's available RLIMIT_MEMLOCK ulimit of 65536 bytes.
Possible Reasons
HDFS caching, which is enabled by default in CDH 5 and higher, requires new memlock functionality from Cloudera Manager Agents.
Possible Solutions
Do the following:
- Stop all CDH and managed services.
- On all hosts with Cloudera Manager Agents, hard-restart the
Agents. Before performing this step, ensure you understand the
semantics of the
hard_restart
command by reading Cloudera Manager Agents.- RHEL 7, SLES 12, Ubuntu 18.04 and higher
-
sudo systemctl stop cloudera-scm-supervisord.service sudo systemctl restart cloudera-scm-agent
- Start all services.
Cloudera services fail to start
Cloudera services fail to start.
Possible Reasons
Java might not be installed or might be installed at a custom location.
Possible Solutions
See Configuring a Custom Java Home Location for more information on resolving this issue.
Host Inspector Fails
If you see the following message in the Host Inspector:
There are mismatched versions across the system, which will cause failures. See below for details on which hosts are running what versions of components.
When looking at the results, some hosts report Supervisord vX.X.X, while others report X.X.X-cmY.Y.Y (where X and Y are version numbers). During the upgrade, an old file on the hosts may cause the Host Inspector to indicate mismatched Supervisord versions.
This issue occurs because these hosts have a file on them at /var/run/cloudera-scm-agent/supervisor/__STARTING_CM_VERSION__ that contains a string for the older version of Cloudera Manager.
- Remove or rename the
/var/run/cloudera-scm-agent/supervisor/__STARTING_CM_VERSION__
file - Perform a hard restart of the
agents:
sudo systemctl stop cloudera-scm-supervisord.service sudo systemctl start cloudera-scm-agent
- Run the Host inspector again. It should pass without the warning.