Using the Workbench Console in Cloudera Data Science Workbench

The workbench console provides an interactive environment tailored for data science. It currently supports R, Python, and Scala engines. You can use these engines in isolation, as you would on your laptop, or connect to your CDH cluster using Cloudera Distribution of Apache Spark 2 and other libraries.

The workbench includes four primary components:

  • An editor where you can edit your scripts.
  • A console where you can track the results of your analysis.
  • A command prompt where you can enter commands interactively.
  • A terminal where you can use a Bash shell.


Typically, you would use the following steps to run a project in the workbench:

  1. Launch a Session
  2. Execute Code
  3. Access the Terminal
  4. Stop a Session

Launch a Session

To launch a session:

  1. Navigate to a project's Overview page.
  2. Click Open Workbench.
  3. Use Select Engine Kernel to choose your language.
  4. Use Select Engine Profile to choose the number of CPU cores and the amount of memory.
  5. Click Launch Session.

The command prompt at the bottom right of your browser window turns green when the engine is ready. Sessions typically take between 10 and 20 seconds to start.

Execute Code

You can enter and execute code at the command prompt or in the editor. The editor is best for code you want to keep, while the command prompt is best for quick interactive exploration.

If you want to enter more than one line of code at the command prompt, use Shift-Enter to move to the next line. Press Enter to run your code. The output of your code, including plots, appears in the console.
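For example, a short Python snippet like the following could be typed at the command prompt, pressing Shift-Enter after each line and Enter after the last one; the printed result then appears in the console. The sample values are hypothetical placeholders for your own data.

```python
import statistics
# Hypothetical sample values; substitute your own data.
values = [3.1, 4.7, 2.2, 5.9, 4.0]
mean = statistics.mean(values)
# The formatted result is printed to the console.
print(f"mean = {mean:.2f}")
```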

If you created your project from a template, it already contains code files, which appear in the file list. You can open a file in the editor by double-clicking its name in the file list.

To run code in the editor:

  1. Select a code file in the list on the left.
  2. Highlight the code you want to run.
  3. Press Ctrl-Enter (Windows/Linux) or Command-Enter (OSX).

For substantive analysis, writing and running your code from the editor rather than the command prompt makes it easy to develop iteratively and save your work along the way.
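As an illustration, a script laid out in small, self-contained steps suits this workflow: you can highlight one step, run it with Ctrl-Enter, adjust it, and run it again. The file name, data, and steps below are hypothetical; the inline CSV text stands in for a data file in your project.

```python
# analysis.py: a hypothetical script laid out so each step can be
# highlighted and run on its own with Ctrl-Enter (Command-Enter on OSX).
import csv
import io

# Step 1: load data. Inline CSV text stands in for a real project file.
raw = """name,score
alice,82
bob,91
carol,77
"""
rows = list(csv.DictReader(io.StringIO(raw)))

# Step 2: transform the rows into a name -> score mapping.
scores = {row["name"]: int(row["score"]) for row in rows}

# Step 3: summarize; edit and rerun just this step as needed.
best = max(scores, key=scores.get)
print(f"best: {best} ({scores[best]})")
```

Because each step builds on variables left in the session by the previous one, you can rerun only the step you changed instead of the whole file.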

If you need more space for the editor, you can collapse the file list by double-clicking the border between the file list pane and the editor pane. You can hide the editor using the editor's View menu.

Code Autocomplete: The Python and R kernels support automatic code completion in both the editor and the command prompt. Press Tab once to display suggestions and twice to autocomplete.

Access the Terminal

Cloudera Data Science Workbench provides full terminal access to running engines from the web console. If you run klist, you should see your authenticated Kerberos principal. If you run hdfs dfs -ls, you will see the files stored in your HDFS home directory. Because the terminal is already authenticated, you do not need to handle Kerberos authentication yourself.
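The same commands can also be run from a Python session when you want their output in the console. The following is a minimal sketch using the standard subprocess module; run_cli is a hypothetical helper, and the sketch assumes that klist and hdfs are on the engine's PATH, as they are in the terminal.

```python
import shutil
import subprocess

def run_cli(args):
    """Run a command-line tool and return its stdout as text, or None if
    the tool is not on PATH (for example, outside a CDSW engine)."""
    if shutil.which(args[0]) is None:
        return None
    # check=False: a non-zero exit status does not raise; stderr is captured
    # but not returned.
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    return result.stdout

# Show the Kerberos principal, then list the HDFS home directory.
for cmd in (["klist"], ["hdfs", "dfs", "-ls"]):
    output = run_cli(cmd)
    print(output if output is not None else f"{cmd[0]} not found on PATH")
```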

Use the terminal to move files around, run Git commands, access the YARN and Hadoop CLIs, or install libraries that cannot be installed directly from the engine. To open the terminal, click Terminal Access above the session log pane on a running session's page.

All of your project files are in /home/cdsw. Any modifications you make to this folder will persist across runs, while modifications to other folders are discarded.
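To check from Python whether a path you are writing to will survive the end of a session, you can test whether it falls under /home/cdsw. This is a minimal sketch: persists is a hypothetical helper that encodes only the rule stated above, and relative paths are resolved against the current working directory.

```python
import os

# Project directory; per the rule above, only changes under it persist.
PROJECT_DIR = "/home/cdsw"

def persists(path):
    """Return True if `path` lies inside PROJECT_DIR after normalization."""
    full = os.path.abspath(path)  # also collapses ".." components
    # commonpath compares whole path components, so "/home/cdsw2" does not
    # count as being inside "/home/cdsw".
    return os.path.commonpath([full, PROJECT_DIR]) == PROJECT_DIR

print(persists("/home/cdsw/results/model.pkl"))
print(persists("/tmp/scratch.csv"))
```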

By default, the terminal does not provide root or sudo access to the container. To install packages that require root access, see Customizing Engine Images.

Stop a Session

When you are done with the session, click Stop in the menu bar above the console, or exit from code by typing the command for your engine's language:

  • R: quit()
  • Python: exit
  • Scala: quit()

Sessions automatically stop after an hour of inactivity.