Troubleshooting Issues with Workloads
This section describes some potential issues data scientists might encounter once the application is running workloads.
404 error in Workbench after starting an engine
This typically occurs because a wildcard DNS subdomain was not set up before installation. While the application largely works without it, engine consoles are served on subdomains and will not be routed correctly unless a wildcard DNS entry pointing to the master host is configured. You might need to wait 30-60 minutes for the DNS entries to propagate. For instructions, see Set Up a Wildcard DNS Subdomain.
Engines cannot be scheduled due to lack of CPU or memory
A symptom of this is the following error message in the Workbench: "Unschedulable: No node in the cluster currently has enough CPU or memory to run the engine."
Either shut down some running sessions or jobs, or provision more hosts for Cloudera Data Science Workbench.
Workbench prompt flashes red and does not take input
The Workbench prompt flashing red indicates that the session is not currently ready to take input.
Cloudera Data Science Workbench does not currently support non-REPL interaction. One workaround is to skip the prompt using appropriate command-line arguments. Otherwise, consider using the terminal to answer interactive prompts.
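When a tool has no command-line flag to skip its prompt, you can often pre-supply the answer on standard input instead. A minimal sketch of this pattern (the prompting script below is hypothetical, standing in for any interactive CLI tool):

```python
import subprocess
import sys
import textwrap

# Hypothetical interactive script that blocks waiting for a confirmation.
script = textwrap.dedent("""
    answer = input("Proceed? [y/n] ")
    print("answered:", answer)
""")

# Feeding the answer on stdin avoids the interactive wait, similar to
# piping `yes` into a command in a terminal session.
result = subprocess.run(
    [sys.executable, "-c", script],
    input="y\n", capture_output=True, text=True,
)
print(result.stdout)
```

The prompt text and the supplied answer both appear in the captured output, confirming the script never waited for a live user.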
PySpark jobs fail due to HDFS permission errors
org.apache.hadoop.security.AccessControlException: Permission denied: user=alice, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
(Required for CDH 5 and CDH 6) To use Spark 2, each user must have their own home directory in HDFS (/user/<username>). If you sign in to Hue first, these directories are created for you automatically. Alternatively, a cluster administrator can create them:
hdfs dfs -mkdir /user/<username>
hdfs dfs -chown <username>:<username> /user/<username>
PySpark jobs fail due to Python version mismatch
Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions
One solution is to install the matching Python 2.7 version on all the cluster hosts. The recommended solution, however, is to install the Anaconda parcel on all CDH cluster hosts. Cloudera Data Science Workbench Python engines then use the version of Python included in the Anaconda parcel, which ensures that the Python versions of the driver and the workers always match. Any library paths in workloads sent from drivers to workers also match, because Anaconda is present in the same location across all hosts. Once the parcel has been installed, set the PYSPARK_PYTHON environment variable in the Cloudera Data Science Workbench Admin dashboard. Alternatively, you can use Cloudera Manager to set the path.
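As an illustration, if the Anaconda parcel is installed in the default parcel directory (verify the actual path on your cluster), the environment variable entry would look like:

```
PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
```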
Jobs fail due to incorrect JAVA_HOME on HDP
On HDP clusters with an incorrect JAVA_HOME, hdfs commands and jobs fail with an error similar to the following message:
ERROR: JAVA_HOME /usr/lib/jvm/java does not exist.
The JAVA_HOME path you configure for Cloudera Data Science Workbench in cdsw.conf must match the JAVA_HOME configured by hadoop-env.sh for the HDP cluster. After you update JAVA_HOME in cdsw.conf, you must restart Cloudera Data Science Workbench. For more information, see Changes to cdsw.conf.
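As an illustration, if hadoop-env.sh on the cluster sets JAVA_HOME to /usr/java/default (a placeholder here; use your cluster's actual value), the matching cdsw.conf entry would be:

```
JAVA_HOME="/usr/java/default"
```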
The user_events table is growing in size and affecting performance
The user_events table is used to monitor and audit user events. It can grow in size in long-running deployments, which can decrease performance. To clean the table manually:
SSH to the Cloudera Data Science Workbench Master host and log in as root.
Get the name of the database pod:
kubectl get pods -l role=db

The command returns information similar to the following example:
NAME                  READY   STATUS    RESTARTS   AGE
db-6d56584f76-phn2f   1/1     Running   0          4h46m
Enter the following command to log in to the database as the sense user:
kubectl exec -it <database pod> -- psql -U sense
Run the following query to get the number of rows in the user_events table:
select count(id) from user_events;
Delete the records older than 30 days by running the following query:
delete from user_events where created_at < NOW() - INTERVAL '30 days';
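To make the cutoff logic concrete, the same 30-day retention delete can be sketched against a miniature, hypothetical copy of the table using Python's built-in sqlite3 module (the production table lives in PostgreSQL; this is only an illustration of the date arithmetic):

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory stand-in for the real user_events table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_events (id INTEGER PRIMARY KEY, created_at TEXT)")

now = datetime.now()
rows = [
    (1, (now - timedelta(days=90)).isoformat()),  # old: should be deleted
    (2, (now - timedelta(days=40)).isoformat()),  # old: should be deleted
    (3, (now - timedelta(days=5)).isoformat()),   # recent: should be kept
]
conn.executemany("INSERT INTO user_events VALUES (?, ?)", rows)

# Equivalent of:
#   delete from user_events where created_at < NOW() - INTERVAL '30 days';
cutoff = (now - timedelta(days=30)).isoformat()
conn.execute("DELETE FROM user_events WHERE created_at < ?", (cutoff,))

remaining = conn.execute("SELECT count(id) FROM user_events").fetchone()[0]
print(remaining)  # 1
```

Only the row newer than the 30-day cutoff survives, mirroring what the PostgreSQL query does to the real table.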