Legacy Engines

Cloudera Data Science Workbench engines are responsible for running R, Python, and Scala code written by users and intermediating access to the CDH cluster.

You can think of an engine as a virtual machine, customized to have all the necessary dependencies to access the CDH cluster while keeping each project’s environment entirely isolated. To ensure that every engine has access to the parcels and client configuration managed by the Cloudera Manager Agent, a number of folders are mounted from the host into the container environment. This includes the parcel path -/opt/cloudera, client configuration, as well as the host’s JAVA_HOME.

Starting with the current CDSW release, Engines are deprecated. We recommend using ML Runtimes for all new projects from now on. You can also migrate existing Engine-based projects to ML Runtimes. Engines are still supported, but new features will only be available for ML Runtimes.

Known Issues

  • Apache Phoenix requires additional configuration to run commands successfully from within Cloudera Data Science Workbench engines (sessions, jobs, experiments, models).

    Workaround

    Explicitly set HBASE_CONF_PATH to a valid path before running Phoenix commands from engines.
    export HBASE_CONF_PATH=/usr/hdp/hbase/<hdp_version>/0/
  • Installing ipywidgets or a Jupyter notebook into a project can cause Python engines to stop responding due to an unexpected configuration. The issue can be resolved by deleting the installed libraries from the R engine terminal.