Once the code snapshot is available, Cloudera Machine Learning
creates a new Docker image with a copy of the snapshot.
The new image is based off the project's designated default engine image
(configured at Project
Settings > Engine). The
image environment can be customized by using environmental variables and a
build script that specifies which packages should be included in
the new image.
Environmental Variables🔗
Both models and experiments inherit environmental variables from their
parent project. Furthermore, in case of models, you can specify
environment variables for each model build. In case of conflicts, the
variables specified per-build will override any values inherited from
the project.
For more information, see Engine Environment
Variables.
Build Script - cdsw-build.sh🔗
As part of the Docker build process, Cloudera Machine Learning runs a
build script called cdsw-build.sh file. You can
use this file to customize the image environment by specifying any
dependencies to be installed for the code to run successfully. One
advantage to this approach is that you now have the flexibility to use
different tools and libraries in each consecutive training run. Just
modify the build script as per your requirements each time you need to
test a new library or even different versions of a library.
The following sections demonstrate how to specify dependencies in
Python and R projects so that they are included in the build process for
models and experiments.
Python
For Python, create a requirements.txt file in
your project with a list of packages that must be installed. For
example:Figure 1. requirements.txt
beautifulsoup4==4.6.0
seaborn==0.7.1
Then, create a cdsw-build.sh file in your
project and include the following command to install the
dependencies listed in requirements.txt. Figure 2. cdsw-build.sh
pip3 install -r requirements.txt
Now, when cdsw-build.sh is run as part of
the build process, it will install the
beautifulsoup4 and seaborn
packages to the new image built for the experiment/model.
R
For R, create a script called install.R with
the list of packages that must be installed. For example:Figure 3. install.R
Then, create a cdsw-build.sh file in your
project and include the following command to run
install.R.Figure 4. cdsw-build.sh
Rscript install.R
Now, when cdsw-build.sh is run as part of
the build process, it will install the tidyr
and stringr packages to the new image built for
the experiment/model.
If you do not specify a build script, the build process will still run
to completion, but the Docker image will not have any additional
dependencies installed. At the end of the build process, the built image
is then pushed to an internal Docker registry so that it can be made
available to all the Cloudera Machine Learning hosts. This push is
largely transparent to the end user.
We want your opinion
How can we improve this page?
What kind of feedback do you have?
This site uses cookies and related technologies, as described in our privacy policy, for purposes that may include site operation, analytics, enhanced user experience, or advertising. You may choose to consent to our use of these technologies, or