Troubleshooting Issues with Workloads
This section describes some potential issues data scientists might encounter once the ML workspace is running workloads.
401 Error caused by incompatible RAZ version
org.apache.ranger.raz.hook.s3.RazS3ClientCredentialsException: Exception in Raz Server; Check the raz server logs for more details, HttpStatus: 401
Only RAZ versions 7.2.11 or higher work with CML. Upgrade the RAZ version to avoid this error.
Engines cannot be scheduled due to lack of CPU or memory
A symptom of this is the following error message in the Workbench: "Unschedulable: No node in the cluster currently has enough CPU or memory to run the engine."
Either shut down some running sessions or jobs or provision more hosts for Cloudera Machine Learning.
Workbench prompt flashes red and does not take input
The Workbench prompt flashing red indicates that the session is not currently ready to take input.
Cloudera Machine Learning does not currently support non-REPL interaction. One workaround is to skip the prompt using appropriate command-line arguments. Otherwise, consider using the terminal to answer interactive prompts.
PySpark jobs fail due to Python version mismatch
Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions
One solution is to install the matching Python 2.7 version on all the
cluster hosts. A better solution is to install the Anaconda parcel on all CDH cluster hosts.
Cloudera Machine Learning Python engines will use the version of Python included in the
Anaconda parcel which ensures Python versions between driver and workers will always match.
Any library paths in workloads sent from drivers to workers will also match because Anaconda
is present in the same location across all hosts. Once the parcel has been installed, set
PYSPARK_PYTHON environment variable in
the Cloudera Machine Learning Admin dashboard.