Mounting Additional Dependencies from the Host

As described previously, all Cloudera Data Science Workbench projects run within an engine.



By default, Cloudera Data Science Workbench automatically mounts the CDH parcel directory and client configuration for required services, such as HDFS, Spark, and YARN, into the engine. However, if users want to reference any additional files/folders on the host, site administrators need to explicitly load them into engine containers at runtime.

Limitation

If you use this option, you must self-manage the dependencies installed on the gateway hosts to ensure consistency across them all. Cloudera Data Science Workbench cannot manage or control the packages with this method. For example, you must manually ensure that version mismatches do not occur. Additionally, this method can lead to issues with experiment repeatability since CDSW does not control or keep track of the packages on the host. Contrast this with Installing Packages Directory Within Projects and Creating a Customized Engine, where Cloudera Data Science Workbench is aware of the packages and manages them.

For instructions on how to configure host mounts, see Configuring the Engine Environment. Note that the directories specified here will be available to all projects across the deployment.