Once the code snapshot is available, Cloudera Data Science Workbench creates a new
Docker image with a copy of the snapshot. This new image is based on the project's
designated default engine image (configured at Project Settings > Engine). You can
customize the image environment by using environment variables and a build script
that specifies which packages should be included in the new image.
Environment Variables
Previously (CDSW 1.7.1 and lower), the environment variables set at the site admin level
and project level did not automatically get pulled into the builds created for models and
experiments. They needed to be explicitly coded into the cdsw-build.sh file. With CDSW
1.7.2 and higher, experiments and models will automatically inherit these admin and
project-level environment variables.
Note that custom mounts or environment variables configured in cdsw.conf (such as NO_PROXY,
HTTP(S)_PROXY, etc.) are still not passed to the container builds for experiments and
models (even though they are applied to sessions, jobs, and deployed
models/experiments).
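One possible workaround, sketched here rather than confirmed by this document, is to declare the missing settings at the top of cdsw-build.sh so that later commands in the script inherit them. The proxy host and the NO_PROXY list below are placeholders, not values taken from any real cdsw.conf.

```shell
#!/bin/bash
# cdsw-build.sh (sketch): re-declare proxy settings that cdsw.conf does not
# pass into model/experiment builds.
# ASSUMPTION: proxy.example.com:3128 is a placeholder; use your site's values.

# Keep any value the build environment already provides; otherwise fall back.
export HTTP_PROXY="${HTTP_PROXY:-http://proxy.example.com:3128}"
export HTTPS_PROXY="${HTTPS_PROXY:-$HTTP_PROXY}"
export NO_PROXY="${NO_PROXY:-localhost,127.0.0.1}"

# Commands run later in this script (for example pip3 or Rscript) inherit these.
echo "Using HTTP_PROXY=${HTTP_PROXY}"
```

Whether a given tool honors the upper-case or lower-case form of these variables depends on the tool, so you may need to export both forms.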
As part of the Docker build process, Cloudera Data Science Workbench
runs a build script called cdsw-build.sh. You can use this file to customize the image environment by specifying any
dependencies to be installed for the code to run successfully. One advantage to this
approach is that you now have the flexibility to use different tools and libraries in each
consecutive training run. Just modify the build script as per your requirements each time
you need to test a new library or even different versions of a library.
The following sections demonstrate how to specify dependencies in Python
and R projects so that they are included in the build process for models and experiments.
Python 3
For Python, create a requirements.txt file in your project with a list of packages that
must be installed. For example:
Figure 1. requirements.txt
beautifulsoup4==4.6.0
seaborn==0.7.1
Then, create a cdsw-build.sh file in your project and include the following
command to install the dependencies listed in requirements.txt.
Figure 2. cdsw-build.sh
pip3 install -r requirements.txt
Now, when cdsw-build.sh is run as part of the build process, it installs the
beautifulsoup4 and seaborn packages into the new image built for the experiment or model.
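If you prefer to create both files from a terminal, the commands below are one way to do it. This is a minimal sketch that assumes the terminal is open in the project root; PROJECT_DIR is a hypothetical variable introduced only for this example, and the commands overwrite any existing files with the same names.

```shell
# ASSUMPTION: the project root is the current directory unless PROJECT_DIR is set.
PROJECT_DIR="${PROJECT_DIR:-.}"

# Pinned package list, matching Figure 1.
cat > "${PROJECT_DIR}/requirements.txt" <<'EOF'
beautifulsoup4==4.6.0
seaborn==0.7.1
EOF

# Build script that installs the pinned packages, matching Figure 2.
cat > "${PROJECT_DIR}/cdsw-build.sh" <<'EOF'
pip3 install -r requirements.txt
EOF
```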
R
For R, create a script called install.R with the list of packages
that must be installed. For example:
Figure 3. install.R
install.packages(c("tidyr", "stringr"), repos = "https://cloud.r-project.org")
Then, create a cdsw-build.sh file in your project and include the following
command to run install.R.
Figure 4. cdsw-build.sh
Rscript install.R
Now, when cdsw-build.sh is run as part of the build process, it installs the
tidyr and stringr packages into the new image built for the experiment or model.
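A project that mixes Python and R could combine both steps in a single build script. The sketch below is one way to structure it, not a Cloudera-provided script: install_dependencies is a hypothetical helper, and the script assumes the build runs from the project root. Each step is skipped when its dependency file is absent.

```shell
#!/bin/bash
# cdsw-build.sh (sketch): install Python and/or R dependencies, depending on
# which dependency files the project contains.

# Hypothetical helper: takes the project directory as its only argument.
install_dependencies() {
  local project_dir="$1"

  # Python packages, if the project lists any.
  if [ -f "${project_dir}/requirements.txt" ]; then
    pip3 install -r "${project_dir}/requirements.txt" || return 1
  fi

  # R packages, if the project provides an install script.
  if [ -f "${project_dir}/install.R" ]; then
    Rscript "${project_dir}/install.R" || return 1
  fi
}

# ASSUMPTION: the build is executed from the project root.
install_dependencies "."
```

Returning a non-zero status when an install step fails lets the failure surface in the script's exit status instead of being silently ignored.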
If you do not specify a build script, the build process will still run
to completion, but the Docker image will not have any additional dependencies installed. At
the end of the build process, the built image is then pushed to an internal Docker registry
so that it can be made available to all the Cloudera Data Science Workbench hosts. This
push is largely transparent to the end user.