Troubleshooting Issues with Workloads

This section describes some potential issues data scientists might encounter once the ML workspace is running workloads.

401 Error caused by incompatible Data Lake version🔗

The following error might occur due to an incompatible Data Lake version:

org.apache.ranger.raz.hook.s3.RazS3ClientCredentialsException: Exception in Raz Server; 
Check the raz server logs for more details, HttpStatus: 401

To avoid this issue, ensure that:

Data Lake and Runtime (server) version is 7.2.11 or higher.
Hadoop Runtime add-on (client) used in the CML session is 7.2.11 or higher.
Spark Runtime add-on version must be CDE 1.13 or higher.

Engines cannot be scheduled due to lack of CPU or memory🔗

A symptom of this is the following error message in the Workbench: "Unschedulable: No node in the cluster currently has enough CPU or memory to run the engine."

Either shut down some running sessions or jobs or provision more hosts for Cloudera Machine Learning.

Workbench prompt flashes red and does not take input🔗

The Workbench prompt flashing red indicates that the session is not currently ready to take input.

Cloudera Machine Learning does not currently support non-REPL interaction. One workaround is to skip the prompt using appropriate command-line arguments. Otherwise, consider using the terminal to answer interactive prompts.

PySpark jobs fail due to Python version mismatch🔗

Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions

One solution is to install the matching Python 2.7 version on all the cluster hosts. A better solution is to install the Anaconda parcel on all CDH cluster hosts. Cloudera Machine Learning Python engines will use the version of Python included in the Anaconda parcel which ensures Python versions between driver and workers will always match. Any library paths in workloads sent from drivers to workers will also match because Anaconda is present in the same location across all hosts. Once the parcel has been installed, set the PYSPARK_PYTHON environment variable in the Cloudera Machine Learning Admin dashboard.

Troubleshooting Issues with Workloads

401 Error caused by incompatible Data Lake version🔗

Engines cannot be scheduled due to lack of CPU or memory🔗

Workbench prompt flashes red and does not take input🔗

PySpark jobs fail due to Python version mismatch🔗

We want your opinion

How can we improve this page?