Debugging Issues with Experiments
This topic lists some common issues to watch out for during an experiment's build and execution process.
Experiment spends too long in Scheduling/Build stage
If an experiment spends too long in a particular stage, check the resource consumption statistics for the cluster. When the cluster starts to run out of resources, experiments (and other workloads such as jobs and models) often wait in the queue for a long time before they can be executed.
Site administrators can track resource consumption by experiments (as well as jobs and sessions) in the site administration pages.
Experiment fails in the Build stage
During the build stage, Cloudera Machine Learning creates a new Docker image for the experiment. You can track progress for this stage on each experiment's Build page. The build logs on this page should help point you in the right direction. Common causes of failure at this stage include:
- Lack of execute permissions on the build script itself.
- Inability to reach the Python package index or R mirror when installing packages.
- Typo in the name of the build script (cdsw-build.sh). Note that the build process runs only a script called cdsw-build.sh, not any other bash scripts from your project.
- Using pip3 to install packages in cdsw-build.sh but selecting a Python 2 kernel when you actually launch the experiment, or vice versa.
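
Several of these causes can be checked locally before you launch an experiment. The following is a minimal sketch, not part of Cloudera Machine Learning: check_build_script is a hypothetical helper name, and it assumes you run it from a terminal with the project directory available. It checks that a file named exactly cdsw-build.sh exists, that it is executable, and which pip command it calls, so that you can compare that command with the kernel you plan to select.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a project's build script.
# Not a Cloudera tool; an illustrative sketch only.

# check_build_script DIR
# Prints one diagnostic line per finding.
# Returns 0 if cdsw-build.sh exists and is executable, 1 otherwise.
check_build_script() {
  local dir="${1:-.}"
  local script="$dir/cdsw-build.sh"
  local status=0
  local candidate

  if [ ! -f "$script" ]; then
    echo "missing: $script"
    # Other shell scripts in the directory may be misnamed build scripts.
    for candidate in "$dir"/*.sh; do
      if [ -e "$candidate" ]; then
        echo "found instead: $candidate"
      fi
    done
    return 1
  fi

  if [ ! -x "$script" ]; then
    echo "not executable: $script (fix with: chmod +x $script)"
    status=1
  fi

  # Report the pip command used so it can be matched to the kernel.
  if grep -qw 'pip3' "$script"; then
    echo "uses pip3: select a Python 3 kernel when launching"
  fi
  if grep -qw 'pip' "$script"; then
    echo "uses pip: confirm which Python version this pip installs for"
  fi

  return "$status"
}
```

To use it, source the file and run check_build_script with the project directory as its argument (for example, check_build_script . from the project root). The check does not test network access to the Python package index or R mirror; the build logs remain the place to diagnose that.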
Experiment fails in the Execute stage
Each experiment includes a Session page where you can track the output of the experiment as it executes. This is similar to the output you would see if you tested the experiment in the workbench console. Any runtime errors appear on the Session page just as they would in an interactive session.