Cloudera Data Science Workbench User Guide
The Cloudera Data Science Workbench web application is typically hosted on the master node, at http://cdsw.<your_domain>.com. You can use the web application to create and run data science workloads on the associated CDH cluster, either individually or in teams.
Home Page
The web application's home page displays a list of your projects, a dashboard to track running sessions and jobs, and memory and CPU usage. From this page you can create new projects, create new teams, manage your account settings, access the CDH cluster, and more.
Demo - Watch this video for a quick walk-through of the Cloudera Data Science Workbench UI: CDSW Quickstart
User Contexts
Cloudera Data Science Workbench uses the notion of contexts to separate your personal account from any team accounts you belong to. Depending on the context you are in, you will be able to modify settings for either your personal account, or a team account, and see the projects created in each account. Shared personal projects will show up in your personal account context. Context changes in the UI are subtle, so if you're wondering where a project or setting lives, first make sure you are in the right context.
The application header will tell you which context you are currently in. You can switch to a different context by going to the drop-down menu in the upper right-hand corner of the page.
The rest of this topic features instructions for some common tasks a Cloudera Data Science Workbench user can be expected to perform.
Managing your Personal Account
To manage your personal account settings:
- Sign in to Cloudera Data Science Workbench.
- From the upper right drop-down menu, switch context to your personal account.
- Click Settings.
- Profile - You can modify your name, email, and bio on this page.
- Teams - This page lists the teams you are a part of and the role assigned to you for each team.
- SSH Keys - Your public SSH key resides here. SSH keys provide a useful way to access external resources such as databases or remote Git repositories. For instructions, see SSH Keys.
- Hadoop Authentication - Enter your Hadoop credentials here to authenticate yourself against the cluster KDC. For more information, see Hadoop Authentication with Kerberos for Cloudera Data Science Workbench.
Managing Team Accounts
Users who work together on more than one project and want to facilitate collaboration can create a Team. Teams allow streamlined administration of projects. Team projects are owned by the team, rather than an individual user. Team administrators can add or remove members at any time, assigning each member different permissions.
Creating a Team
To create a team:
- Click the plus sign (+) in the title bar, to the right of the Search field.
- Select Create Team.
- Enter a Team Name.
- Click Create Team.
- Add or invite team members. Team members can have one of the following privilege levels:
- Viewer - Cannot create new projects within the team, but can be added to existing team projects.
- Contributor - Can create new projects within the team, and can also be added to existing team projects.
- Admin - Has complete access to all team projects, and account and billing information.
- Click Done.
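The three privilege levels above can be summarized as a small capability table. The following Python sketch is purely illustrative: `TeamRole`, `Capabilities`, and `CAPABILITIES` are hypothetical names and are not part of any Cloudera Data Science Workbench API. It also assumes that Admins can create team projects, which is inferred from their complete access to all team projects rather than stated explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class TeamRole(Enum):
    """Hypothetical model of the team privilege levels described above."""
    VIEWER = "Viewer"
    CONTRIBUTOR = "Contributor"
    ADMIN = "Admin"


@dataclass(frozen=True)
class Capabilities:
    can_create_team_projects: bool
    can_be_added_to_team_projects: bool
    has_account_and_billing_access: bool


# Illustrative only; not a CDSW API. Admin project creation is an assumption
# inferred from "complete access to all team projects".
CAPABILITIES = {
    TeamRole.VIEWER: Capabilities(False, True, False),
    TeamRole.CONTRIBUTOR: Capabilities(True, True, False),
    TeamRole.ADMIN: Capabilities(True, True, True),
}


if __name__ == "__main__":
    # Print one line per privilege level with its capabilities.
    for role, caps in CAPABILITIES.items():
        print(f"{role.value}: {caps}")
```

Reading across a row shows the practical difference between levels: only the ability to create projects separates a Viewer from a Contributor, and only account and billing access separates a Contributor from an Admin.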
Modifying Team Account Settings
- From the upper right drop-down menu, switch context to the team account.
- Click Settings to open the Account Settings dashboard.
- Profile - Modify the team description on this page.
- Members - You can add new team members on this page, and modify privilege levels for existing members.
- SSH Keys - The team's public SSH key resides here. Team SSH keys provide a useful way to give an entire team access to external resources such as databases. For instructions, see SSH Keys. Generally, team SSH keys should not be used to authenticate against Git repositories. Use your personal key instead.
Next Steps
- Create a project in Cloudera Data Science Workbench. You can either create a new blank project or import an existing project. For instructions, see Managing Projects in Cloudera Data Science Workbench.
- Open the workbench and launch an engine to run your project. For help, see Using the Workbench Console in Cloudera Data Science Workbench.
- (Optional) For more mature data science projects that need to run on a recurring schedule, Cloudera Data Science Workbench allows you to create jobs and pipelines. For more details on what you can accomplish by scheduling jobs and pipelines, see Managing Jobs and Pipelines in Cloudera Data Science Workbench.