Managing Projects in Cloudera Data Science Workbench

Projects form the heart of Cloudera Data Science Workbench. They hold all the code, configuration, and libraries needed to reproducibly run analyses. Each project is independent, ensuring users can work freely without interfering with one another or breaking existing workloads.

Creating a Project

To create a Cloudera Data Science Workbench project:
  1. Go to Cloudera Data Science Workbench and, on the left sidebar, click Projects.
  2. Click New Project.
  3. If you are a member of a team, select the account under which you want to create this project from the drop-down menu. If there is only one account on the deployment, this option is not shown.
  4. Enter a Project Name.
  5. Select one of the following Project Visibility options:
    • Private - Only project collaborators can view or edit the project.
    • Team - If the project is created under a team account, all members of the team can view the project. Only explicitly added collaborators can edit the project.
    • Public - All authenticated users of Cloudera Data Science Workbench can view the project. Collaborators can edit the project.
  6. Under Initial Setup, either create a blank project or select one of the following sources for your project files:
    • Template - Template projects contain example code that can help you get started with Cloudera Data Science Workbench. They are available in R, Python, PySpark, and Scala. Using a template project is not required, but it lets you start working in Cloudera Data Science Workbench right away.
    • Local - If you have an existing project on your local disk, use this option to upload it to Cloudera Data Science Workbench as a compressed file or as a folder.
    • Git - If you already use Git for version control and collaboration, you can continue to do so with Cloudera Data Science Workbench. Specifying a Git URL clones the project into Cloudera Data Science Workbench. If you use a Git SSH URL, your personal private SSH key is used to clone the repository; this is the recommended approach. However, before you can clone the project, you must add the public SSH key from your personal Cloudera Data Science Workbench account to the remote Git hosting service.
  7. Click Create Project. After the project is created, you can see your project files and the list of jobs defined in your project.
  8. (Optional) To work with team members on a project, use the instructions in the following section to add them as collaborators to the project.
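The Local option in step 6 uploads an existing project as a compressed file or folder. As a minimal sketch, assuming the upload accepts a .zip archive, the following Python uses only the standard library to compress a local project directory before uploading it. The function name package_project and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def package_project(project_dir: str, output_base: str) -> str:
    """Compress the contents of project_dir into a .zip and return its path.

    Hypothetical helper: root_dir makes the folder's *contents* sit at the
    archive root instead of inside an extra top-level folder.
    """
    root = Path(project_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")
    # make_archive appends ".zip" to output_base and returns the archive path.
    return shutil.make_archive(output_base, "zip", root_dir=str(root))
```

For example, package_project("my-analysis", "my-analysis-upload") would create my-analysis-upload.zip in the current directory. Keep the output path outside the project folder so the archive does not try to include itself.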

Adding Project Collaborators

If you want to work closely with colleagues on a particular project, use the following steps to add them to the project.
  1. Navigate to the project overview page.
  2. Click Team to open the Collaborators page.
  3. Search for collaborators by either name or email address and click Add.

    For a project created under your personal account, anyone who belongs to your organization can be added as a collaborator. For a project created under a team account, you can only add collaborators who already belong to the team. If you want to work on a project that requires collaborators from different teams, create a new team with the required members, then create a project under that team account. If your project was created from a Git repository, each collaborator must create the project from the same central Git repository.

    You can grant collaborators one of three levels of access:
    • Viewer - Read-only access to code, data, and results.
    • Contributor - Can view, edit, create, and delete files and environment variables, run jobs, and execute code in running jobs.
    • Admin - Complete access to all aspects of the project, including adding new collaborators and deleting the entire project.
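The three access levels are cumulative: each level can do everything the level below it can, plus more. The following Python sketch models that ordering as a hypothetical illustration only. It is not a Cloudera Data Science Workbench API, and the action names are made up to mirror the descriptions above.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical model of collaborator access levels, lowest to highest."""
    VIEWER = 1
    CONTRIBUTOR = 2
    ADMIN = 3


# Hypothetical action names mapped to the minimum level described above.
REQUIRED_LEVEL = {
    "view": AccessLevel.VIEWER,
    "edit_files": AccessLevel.CONTRIBUTOR,
    "edit_env_vars": AccessLevel.CONTRIBUTOR,
    "run_jobs": AccessLevel.CONTRIBUTOR,
    "add_collaborators": AccessLevel.ADMIN,
    "delete_project": AccessLevel.ADMIN,
}


def can(level: AccessLevel, action: str) -> bool:
    """Return True if a collaborator at `level` may perform `action`."""
    if action not in REQUIRED_LEVEL:
        raise ValueError(f"unknown action: {action!r}")
    # Levels are ordered integers, so a higher level inherits lower-level rights.
    return level >= REQUIRED_LEVEL[action]
```

Modeling the levels as an ordered enum keeps the "higher includes lower" rule in one comparison rather than repeating each right per level.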

For more information on collaborating effectively, see Sharing Projects and Analysis Results.

Modifying Project Settings

Project contributors and administrators can modify aspects of the project environment, such as the engine used to launch sessions and the environment variables, and can create SSH tunnels to access external resources. To make these changes:
  1. Switch context to the account where the project was created.
  2. Click Projects.
  3. From the list of projects, select the one you want to modify.
  4. Click Settings to open the Project Settings dashboard, which contains the following pages:
    • Options - Modify the project name and its privacy settings.
    • Engine - Cloudera Data Science Workbench ensures that your code always runs with the specific engine version you select here. For advanced use cases, projects can use custom Docker images: site administrators can whitelist images for use in projects, and project administrators can use this page to select which of these whitelisted images is installed for their projects. For an example, see Customizing Engine Images.
    • Environment - Add any environment variables that should be injected into all the engines running this project. For more details, see Project Environment Variables.
    • Tunnels - In some environments, external databases and data sources reside behind restrictive firewalls. Cloudera Data Science Workbench provides a convenient way to connect to such resources using your SSH key. For instructions, see SSH Tunnels.
    • Git - This page lists a webhook that you can add to your Git configuration so that your project files are updated with the latest changes from the remote repository.
    • Delete Project - Only project administrators can access this page. Deleting a project is irreversible: all files, data, sessions, and jobs are lost.
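Environment variables added for a project are injected into the engines running it, so code in a session or job can read them like any other process environment variable. A minimal Python sketch, assuming a hypothetical variable named DB_HOST has been defined for the project; the helper name get_project_setting is also hypothetical.

```python
import os
from typing import Optional


def get_project_setting(name: str, default: Optional[str] = None) -> str:
    """Read an environment variable injected into the engine.

    Returns `default` when the variable is unset; raises KeyError if it is
    unset and no default was given.
    """
    value = os.environ.get(name, default)
    if value is None:
        raise KeyError(f"environment variable {name!r} is not set for this project")
    return value


# Hypothetical usage inside a session or job:
# db_host = get_project_setting("DB_HOST", "localhost")
```

Failing loudly when a required variable is missing surfaces a misconfigured project immediately, rather than letting the job continue with an empty value.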

Managing Project Files

Cloudera Data Science Workbench allows you to move, rename, copy, and delete files within the scope of the project where they live. You can also upload new files to a project or download project files. Files can only be uploaded within the scope of a single project, so to use a script or data file in multiple projects, you must upload it manually to each of those projects.

To manage project files:
  1. Switch context to the account where the project was created.
  2. Click Projects.
  3. From the list of projects, click the project you want to modify. This takes you to the project overview.
  4. Click Files.
    Upload Files to a Project

    Click Upload. Select Files or Folder from the drop-down menu, and choose the files or folder you want to upload from your local filesystem.

    Download Project Files

    Click Download to download the entire project as a .zip file. To download only specific files, select the checkbox next to each file you want to download and click Download.

    You can also use the checkboxes to Move, Rename, or Delete files within the scope of this project.
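After downloading the entire project as a .zip file, you may want to check and unpack it locally. A minimal Python sketch using the standard zipfile module; the function name unpack_download and the paths are hypothetical.

```python
import zipfile
from typing import List


def unpack_download(archive: str, dest: str) -> List[str]:
    """Check a downloaded project archive, extract it into dest, and return
    the sorted names of the files (not directories) it contained."""
    with zipfile.ZipFile(archive) as zf:
        # testzip() reads every member and returns the first name whose
        # CRC check fails, or None if the archive is intact.
        corrupt = zf.testzip()
        if corrupt is not None:
            raise zipfile.BadZipFile(f"corrupt member in {archive}: {corrupt}")
        zf.extractall(dest)
        # Directory entries end with "/"; keep only regular files.
        return sorted(name for name in zf.namelist() if not name.endswith("/"))
```

For example, unpack_download("project.zip", "project-copy") would extract the archive into project-copy and return its file list.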