Snapshot Code
When you first launch an experiment or model, Cloudera Data Science Workbench takes a Git snapshot of the project filesystem at that point in time. It is important to note that this Git server functions behind the scenes and is completely separate from any other Git version control system you might be using for the project as a whole.
However, this Git snapshot will recognize the .gitignore
file defined in
the project. This means if there are any artifacts (files, dependencies, etc.) larger than
50 MB stored directly in your project filesystem, make sure to add those files or folders to
.gitignore
so that they are not recorded as part of the snapshot. This
ensures that the experiment or model environment is truly isolated and does not inherit
dependencies that have been previously installed in the project workspace.
By default, each project is created with the following
.gitignore
file:
R
node_modules
*.pyc
.*
!.gitignore
Augment this file to include any extra dependencies you have installed in your project workspace to ensure a truly isolated workspace for each model or experiment.
Multiple .gitignore
files
A project can include multiple .gitignore
files. However, there can only be one .git
directory in the project, located at the
project root, /home/cdsw/.git
, otherwise
Experiment and Model deployment fails.
If you create a blank project, and then want to clone a repo into it,
clone a single project to the root of the workspace (/home/cdsw/.git
) to ensure that Experiments and Models work.