Launch a Session to Run the Project

Cloudera Data Science Workbench provides the workbench, an interactive environment tailored for data science. It supports R, Python, and Scala engines; you will use one of them to run the template project.

To run the project code, open the workbench and launch a new session, then work through the following steps:

  1. Open the workbench and launch a session.
    1. Navigate to the new project's Overview page.
    2. Click Open Workbench.
    3. Launch a new session:
      1. Use Select Engine Kernel to choose the programming language that your project uses.
      2. Use Select Engine Profile to select the number of CPU cores and memory to be used.
      3. Click Launch Session.

        The command prompt at the bottom right of your browser window will turn green when the engine is ready. Sessions typically take between 10 and 20 seconds to start.

  2. Execute project code.
    You can enter and execute code using either the editor or the command prompt. The editor is best for code you want to keep, while the command prompt is best for quick interactive exploration.
    Editor - To run code in the editor:
    1. Select a script from the project files on the left sidebar.
    2. To run the whole script, click the run icon on the top navigation bar. To run only part of it, highlight the code you want to run and press Ctrl+Enter (Windows/Linux) or Cmd+Enter (macOS).
    Command Prompt - The command prompt works much like any other interactive prompt. Enter a command and press Enter to execute it. To enter more than one line of code, press Shift+Enter to move to the next line. The output of your code, including plots, appears in the console.

    Code Autocomplete - The Python and R kernels support automatic code completion in both the editor and the command prompt. Press Tab once to display suggestions and twice to autocomplete.
  3. Test terminal access.
    Cloudera Data Science Workbench provides terminal access to the running engines from the web console. You can use the terminal to move files around, run Git commands, and understand what resources are already available to you in the project environment.
    To access the terminal from a running session, click Terminal Access above the console pane. The terminal's default working directory is /home/cdsw, which is where all your project files are stored.


  4. Review the logs.
    The logs tab displays the engine logs for the running session and, if applicable, the Spark logs.
  5. Stop the session.
    When you are done with the session, click Stop in the menu bar above the console.
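
As a quick check that a newly launched session works, a short script like the following can be run from the editor or pasted into the command prompt. It is a minimal sketch that assumes a Python engine kernel was selected; the function name session_summary is hypothetical and not part of Cloudera Data Science Workbench. It uses only the Python standard library to report the interpreter version, the current working directory, and the files visible there.

```python
import os
import platform
import sys


def session_summary(project_dir=None):
    """Collect basic facts about the running engine and its working directory.

    session_summary is an illustrative helper, not a product API.
    """
    # Default to whatever directory the session started in.
    project_dir = project_dir or os.getcwd()
    return {
        "python_version": platform.python_version(),
        "executable": sys.executable,
        # os.cpu_count() may report the host's CPUs rather than the
        # cores granted by the selected engine profile.
        "cpu_count": os.cpu_count(),
        "project_dir": project_dir,
        "files": sorted(os.listdir(project_dir)),
    }


summary = session_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```

The output appears in the console alongside any other results from the session. Running the same script with python from the terminal should report the same directory contents if the terminal and the session share a working directory.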