Known ML Runtimes issues.

  • Code completion does not work in the Workbench editor when using Runtimes. It does work in the console on the right-hand side, and it works in both the editor and the console when using Legacy Engines.
  • Limitations for Spark support for ML Runtimes:
    • ML Runtimes running a Python 3.8 kernel do not support running Spark.
    • ML Runtimes on CDH 7.x do not support running Spark.

      Cloudera Bug: DSE-13916

  • ML Runtimes do not support the command-line interface (cdswctl command) in the current release.

    Cloudera Bug: DSE-13699

  • To use Spark with ML Runtimes on Cloudera Data Science Workbench, you must install py4j before using ML Runtimes for the first time. As part of the session, run the following:
    pip install py4j
  • Jupyter Notebook sessions in legacy engines engine:8 through engine:13 do not exit after IDLE_MAXIMUM_MINUTES of inactivity. They run until SESSION_MAXIMUM_MINUTES is reached (seven days by default).

    Workaround: You can change the configuration of your cluster to apply the fix for this issue. Change the editor command for Jupyter Notebook in every engine that uses it to the following:

    NOTEBOOK_TIMEOUT_SECONDS=$(python3 -c "print(${IDLE_MAXIMUM_MINUTES}*60)") /usr/local/bin/jupyter notebook --no-browser --ip= --port=${CDSW_APP_PORT} --NotebookApp.token= --NotebookApp.allow_remote_access=True --NotebookApp.quit_button=False --log-level=ERROR --NotebookApp.shutdown_no_activity_timeout=300 --MappingKernelManager.cull_idle_timeout=${NOTEBOOK_TIMEOUT_SECONDS} --TerminalManager.cull_inactive_timeout=${NOTEBOOK_TIMEOUT_SECONDS} --MappingKernelManager.cull_interval=60 --TerminalManager.cull_interval=60 --MappingKernelManager.cull_connected=True
    This does the following:
    • Kills each running notebook after IDLE_MAXIMUM_MINUTES of inactivity
    • Kills the CDSW/CML session in which Jupyter is running after 5 minutes with no notebooks

    Cloudera Bug: DSE-13741, DSE-6651
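
The timeout arithmetic in the Jupyter Notebook workaround can be checked with a short Python sketch. It mirrors the `python3 -c` expression that converts IDLE_MAXIMUM_MINUTES to seconds and shows how that value feeds the culling flags. The helper names `notebook_timeout_seconds` and `cull_flags` are hypothetical and used only for illustration; the flag names and the fixed values (300 and 60) are copied from the editor command above. This is not part of the Cloudera workaround itself.

```python
from typing import List


def notebook_timeout_seconds(idle_maximum_minutes: int) -> int:
    """Convert the idle limit from minutes to seconds (minutes * 60)."""
    return idle_maximum_minutes * 60


def cull_flags(idle_maximum_minutes: int) -> List[str]:
    """Build the timeout-related Jupyter flags used by the workaround.

    Hypothetical helper: it only assembles strings and does not start Jupyter.
    """
    timeout = notebook_timeout_seconds(idle_maximum_minutes)
    return [
        # Shut the notebook server down after 300 s (5 min) with no notebooks.
        "--NotebookApp.shutdown_no_activity_timeout=300",
        # Cull idle kernels and inactive terminals after the idle limit.
        f"--MappingKernelManager.cull_idle_timeout={timeout}",
        f"--TerminalManager.cull_inactive_timeout={timeout}",
        # Check for idle kernels and terminals every 60 s.
        "--MappingKernelManager.cull_interval=60",
        "--TerminalManager.cull_interval=60",
        # Cull kernels even if a browser tab is still connected.
        "--MappingKernelManager.cull_connected=True",
    ]


# Example: an idle limit of 60 minutes (an assumed value, not a documented default).
print(" ".join(cull_flags(60)))
```

Printing the flags for a given idle limit is a quick way to confirm the values that the editor command would pass to Jupyter before changing the engine configuration.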