Troubleshooting issues with workloads
This section describes some potential issues data scientists might encounter once the Cloudera AI Workbench is running workloads.
401 Error caused by incompatible Data Lake version
org.apache.ranger.raz.hook.s3.RazS3ClientCredentialsException: Exception in Raz Server;
Check the raz server logs for more details, HttpStatus: 401
- Data Lake and Runtime (server) version is 7.2.11 or higher.
- Hadoop Runtime add-on (client) used in the Cloudera AI session is 7.2.11 or higher.
- Spark Runtime add-on version must be CDE 1.13 or higher.
Engines cannot be scheduled due to lack of CPU or memory
A symptom of this is the following error message in the Workbench: "Unschedulable: No node in the cluster currently has enough CPU or memory to run the engine."
Either shut down some running sessions or jobs or provision more hosts for Cloudera AI.
Workbench prompt flashes red and does not take input
The Workbench prompt flashing red indicates that the session is not currently ready to take input.
Cloudera AI does not currently support non-REPL interaction. One workaround is to skip the prompt using appropriate command-line arguments. Otherwise, consider using the terminal to answer interactive prompts.
PySpark jobs fail due to Python version mismatch
Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions
One solution is to install the matching Python 2.7 version on all the
cluster hosts. A better solution is to install the Anaconda parcel on all CDH cluster hosts.
Cloudera AI Python engines will use the version of Python included in the
Anaconda parcel which ensures Python versions between driver and workers will always match.
Any library paths in workloads sent from drivers to workers will also match because Anaconda
is present in the same location across all hosts. Once the parcel has been installed, set
the PYSPARK_PYTHON
environment variable in
the Cloudera AI Admin dashboard.