Collaborating Effectively with Git
Cloudera Data Science Workbench provides seamless access to Git projects. Whether you are working independently or as part of a team, you can leverage all of the benefits of version control and collaboration with Git from within Cloudera Data Science Workbench. Teams that already use Git for collaboration can continue to do so. Each team member will need to create a separate Cloudera Data Science Workbench project from the central Git repository.
For anything but simple projects, Cloudera recommends using Git to version control your projects. You should work on Cloudera Data Science Workbench the same way you would work locally, and for most data scientists and developers that means using Git.
Cloudera Data Science Workbench does not include significant UI support for Git, but instead allows you to use the full power of the command line. If you run an engine and open a terminal, you can run any Git command, including init, add, commit, branch, merge and rebase. Everything should work exactly as it does locally, except that you are running on a distributed edge host directly connected to your Apache Hadoop cluster.
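As a rough illustration of the commands listed above, the following sketch runs a small branch-and-merge cycle in a throwaway directory. The file name, branch names, and user identity are placeholders chosen for this example; they are not part of the product.

```shell
# Work in a throwaway directory so nothing in a real project is touched.
workdir="$(mktemp -d)"
cd "$workdir"

git init -q
# Identity is set for this repository only; the name and email are placeholders.
git config user.name "Example User"
git config user.email "user@example.com"

# Create a first commit and make sure the branch is called main.
echo "print('hello')" > analysis.py
git add analysis.py
git commit -q -m "Add analysis script"
git branch -M main

# Develop on a feature branch, then merge it back into main.
git checkout -q -b feature
echo "print('more work')" >> analysis.py
git commit -q -a -m "Extend analysis script"
git checkout -q main
git merge -q feature

# Show the resulting history.
git log --oneline
```

In a real session you would run the same commands from the project directory in the engine's terminal rather than from a temporary directory.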
Importing a Project From Git
When you create a project, you can optionally supply an HTTPS or SSH Git URL that points to a remote repository. The new project is a clone of that remote repository. You can commit, push and pull your code by running a console and opening a terminal.
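The commit, push, and pull cycle mentioned above might look like the following sketch. To keep it self-contained, a local bare repository stands in for the remote server; in a project created from a Git URL, origin would already point at that URL. All paths, file names, and the user identity are placeholders.

```shell
# A local bare repository stands in for the remote Git server (assumption for this sketch).
scratch="$(mktemp -d)"
git init -q --bare "$scratch/remote.git"

# Cloning mirrors what happens when a project is created from a Git URL.
git clone -q "$scratch/remote.git" "$scratch/project"
cd "$scratch/project"
git config user.name "Example User"
git config user.email "user@example.com"

# Commit a change locally.
echo "# Project notes" > README.md
git add README.md
git commit -q -m "Add README"

# Push the current branch to the remote, then pull to pick up any new commits.
branch="$(git rev-parse --abbrev-ref HEAD)"
git push -q origin "$branch"
git pull -q --ff-only origin "$branch"
```

The `--ff-only` flag is a cautious choice for this example: the pull stops instead of creating a merge commit if the local and remote histories have diverged.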
If you want to use SSH to clone the repo, add your personal SSH key to your Git server. For instructions, see Adding SSH Key to GitHub.
Linking an Existing Project to a Git Remote
If you did not create your project from a Git repository, you can link an existing project to a Git remote (for example, git@github.com:username/repo.git) so that you can push and pull your code.
To link to a Git remote:
- Launch a new session.
- Open a terminal.
- Enter the following commands:
Shell
git init
git add *
git commit -a -m 'Initial commit'
git remote add origin git@github.com:username/repo.git
You can run git status after git init to confirm that your .gitignore file excludes folders that contain libraries and other non-code artifacts.
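A minimal sketch of that check follows. The folder name libs/ and the file names are hypothetical; substitute the directories where your project actually keeps libraries and other artifacts.

```shell
# Throwaway project directory for illustration.
proj="$(mktemp -d)"
cd "$proj"
git init -q

# Hypothetical layout: one code file and one folder of installed libraries.
mkdir libs
touch libs/package.bin model.py

# Ignore the library folder (the name libs/ is an assumption for this example).
printf 'libs/\n' > .gitignore

# Ignored paths should not appear in the list of untracked files.
git status --short
```

If libs/ still shows up in the output, the pattern in .gitignore does not match the folder. Note also that in most shells the `*` in `git add *` does not expand to dotfiles, so you may need to run `git add .gitignore` explicitly to commit the ignore rules.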