Build Image
Once the code snapshot is available, Cloudera Machine Learning creates a new Docker image with a copy of the snapshot.
The new image is based off the project's designated default engine image (configured at
). The image environment can be customized by using environmental variables and a build script that specifies which packages should be included in the new image.Environmental Variables
Both models and experiments inherit environmental variables from their parent project. Furthermore, in case of models, you can specify environment variables for each model build. In case of conflicts, the variables specified per-build will override any values inherited from the project.
For more information, see Engine Environment Variables.
Build Script - cdsw-build.sh
As part of the Docker build process, Cloudera Machine Learning runs a
build script called cdsw-build.sh
file. You can
use this file to customize the image environment by specifying any
dependencies to be installed for the code to run successfully. One
advantage to this approach is that you now have the flexibility to use
different tools and libraries in each consecutive training run. Just
modify the build script as per your requirements each time you need to
test a new library or even different versions of a library.
- Python
-
For Python, create a
requirements.txt
file in your project with a list of packages that must be installed. For example:Then, create acdsw-build.sh
file in your project and include the following command to install the dependencies listed inrequirements.txt
. Now, whencdsw-build.sh
is run as part of the build process, it will install thebeautifulsoup4
andseaborn
packages to the new image built for the experiment/model. - R
-
For R, create a script called
install.R
with the list of packages that must be installed. For example:Then, create acdsw-build.sh
file in your project and include the following command to runinstall.R
.Now, whencdsw-build.sh
is run as part of the build process, it will install thetidyr
andstringr
packages to the new image built for the experiment/model.
If you do not specify a build script, the build process will still run to completion, but the Docker image will not have any additional dependencies installed. At the end of the build process, the built image is then pushed to an internal Docker registry so that it can be made available to all the Cloudera Machine Learning hosts. This push is largely transparent to the end user.