Snapshot Code
When you first launch an experiment or model, Cloudera Machine Learning takes a Git snapshot of the project filesystem at that point in time. This Git server functions behind the scenes and is completely separate from any other Git version control system you might be using for the project as a whole.
However, this Git snapshot will recognize the .gitignore
file defined in the project. This means if there are any artifacts (files,
dependencies, etc.) larger than 50 MB stored directly in your project
filesystem, make sure to add those files or folders to
.gitignore
so that they are not recorded as part of the
snapshot. This ensures that the experiment/model environment is truly
isolated and does not inherit dependencies that have been previously
installed in the project workspace.
By default, each project is created with the following
.gitignore
file:
R node_modules *.pyc .* !.gitignore
Augment this file to include any extra dependencies you have installed in your project workspace to ensure a truly isolated workspace for each model/experiment.