Solutions to Common Problems
The table below describes solutions to common cluster configuration problems.
Symptom | Reason | Solution |
---|---|---|
Cloudera Manager | ||
The Cloudera Manager service will not be running as it exited in an unusual
manner. Running The Cloudera Manager Server log file
|
Out of memory. | Examine the heap dump that the Cloudera Manager Server creates when it runs out
of memory. The heap dump file is created in the
/tmp directory, has file extension
.hprof and file permission of 600. Its owner
and group will be the owner and group of the Cloudera Manager
server process, normally
cloudera-scm:cloudera-scm . |
You are unable to start service on the Cloudera Manager server, that is,
service cloudera-scm-server start does not work
and there are errors in the log file located at
/var/log/cloudera-scm-server/cloudera-scm-server.log
|
The server has been disconnected from the database or the database has stopped responding or has shut down. | Go to /etc/cloudera-scm-server/db.properties and make sure the
database you are trying to connect to is listed there and has
been started. |
Logs include APPARENT DEADLOCK entries for c3p0. |
These deadlock messages are cause by the c3p0 process not making progress at the expected rate. This can indicate either that c3p0 is deadlocked or that its progress is slow enough to trigger these messages. In many cases, progress is occurring and these messages should not be seen as catastrophic. | There are a variety of ways to react to these log entries.
|
Starting Services | ||
After you click the Start button to start a service, the Finished
status does not display. This may not be merely a case of the status not getting displayed. It could be for a number of reasons such as network connectivity issues or subcommand failures. |
The host is disconnected from the Server, as will be indicated by missing heartbeats on the Hosts tab. |
|
Subcommands failed resulting in errors in the log file indicating that either the command timed out or the target port was already occupied |
|
|
After you click Start to start a service, the Finished status displays but there are error messages. The subcommands to start service components (such as JobTracker and one or more TaskTrackers) do not start. | A port specified in the Configuration tab of the service is already being used in your cluster. For example, the JobTracker port is in use by another process. | Enter an available port number in the port property (such as JobTracker port) in the Configuration tab of the service. |
There are incorrect directories specified in the Configuration tab of the service (such as the log directory). | Enter correct directories in the Configuration tab of the service. | |
Job is Failing | No space left on device. | One approach is to use a system monitoring tool such as Nagios to alert on the
disk space or quickly check disk space across all systems. If
you do not have Nagios or equivalent you can do the following to
determine the source of the space issue: In the JobTracker Web UI, drill down from the job, to the map or reduce, to the
task attempt details to see which TaskTracker the task
executed and failed on due to disk space. For example:
In the NameNode Web UI, inspect the % used column on the NameNode Live Nodes
page:
|
Send Test Alert and Diagnose SMTP Errors | ||
You have enabled sending alerts from the Cloudera Manager Admin Console,
however, Cloudera Manager does not seem to be sending any
alerts. Using the Send Test Alert link under shows success even though you do not receive an alert email. |
There is possibly a mismatch of protocol or port numbers between your mail server and the Alert Publisher. For example, if the Alert Publisher is sending alerts to SMTPS on port 465 and your mail servers are not configured for SMTPS, you wouldn't receive any alerts. | Use the following steps to make changes to the Alert Publisher configuration:
|