Configuring the Engine Environment
This section describes some of the ways you can configure engine environments to meet the requirements of your projects.
Install Additional Packages
For information on how to install additional required packages and dependencies in your engine, see Installing Additional Packages.
For information on how environment variables can be used to configure engine environments in Cloudera Machine Learning, see Engine Environment Variables.
Configuring Host Mounts
By default, Cloudera Machine Learning automatically mounts the CDH parcel directory and the client configuration for required services such as HDFS, Spark, and YARN into each project's engine. However, if users want to reference additional files or folders on the host, site administrators must configure them as host mounts so that they are loaded into engine containers at runtime. Note that the directories specified as host mounts are available to all projects across the deployment.
To configure additional mounts, add the paths to be mounted from the host to the Mounts section.
By default, mount points are loaded into engine containers with read-only permissions. CDSW 1.4.2 (and higher) also includes a Write Access checkbox that you can use to enable read-write access for individual mounted directories. Note that these permissions apply to all projects across the deployment.
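From inside a running session, one quick way to confirm how a host mount was loaded is to probe the path directly. The following is a minimal sketch, not part of the product: the helper name `mount_access` is hypothetical, and the check reflects what the current user can do, so it also reports "read-only" when ordinary file permissions deny writing.

```python
import os

def mount_access(path):
    """Classify a directory as 'missing', 'read-only', or 'read-write'."""
    if not os.path.isdir(path):
        return "missing"
    # os.access reports whether the current user may write to the directory.
    # It returns False on a read-only mount, and also when file permissions
    # deny writing on a read-write mount.
    if os.access(path, os.W_OK):
        return "read-write"
    return "read-only"
```

For example, calling `mount_access("/etc/spark2-conf")` in a session would indicate whether that directory is visible in the engine and whether Write Access appears to be in effect for it.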
When adding host mounts, try to be as generic as possible without mounting common system files. For example, if you want to add several files under /etc/spark2-conf, you can simplify and mount the /etc/spark2-conf directory, but adding the parent /etc might prevent the engine from running.
As a general rule, do not mount full system directories that are already in use, such as /etc. This also helps avoid accidentally exposing confidential information in running sessions.
Do not add duplicate mount points. This will lead to sessions crashing in the workbench.
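The guidelines above (avoid full system directories, avoid duplicates) can be checked before the paths are entered. The following is a minimal sketch of such a pre-check, run separately from the product. The function name `check_mounts` and the contents of `SYSTEM_DIRS` are assumptions for illustration; only /etc is named in this section.

```python
import posixpath

# Assumed examples of system directories that should not be mounted whole.
# Only /etc is named in the text above; the rest are illustrative.
SYSTEM_DIRS = {"/", "/etc", "/usr", "/bin", "/lib", "/var"}

def check_mounts(paths):
    """Return a list of (path, problem) tuples for a proposed mount list."""
    problems = []
    seen = set()
    for raw in paths:
        # Normalize so that "/etc/" and "/etc" compare equal.
        path = posixpath.normpath(raw)
        if not posixpath.isabs(path):
            problems.append((raw, "not an absolute path"))
        elif path in SYSTEM_DIRS:
            problems.append((raw, "full system directory"))
        elif path in seen:
            problems.append((raw, "duplicate mount point"))
        seen.add(path)
    return problems
```

With the proposed list `["/etc/spark2-conf", "/etc", "/etc/spark2-conf/", "/data/shared"]`, where `/data/shared` is a hypothetical path, the function flags `/etc` as a full system directory and the second `/etc/spark2-conf/` entry as a duplicate, and accepts the other two.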