Configure a Browser IDE as an Editor

When you use a browser IDE, changes that you make in the editor are propagated to the Cloudera Data Science Workbench project. For example, if you create a new .py file or modify an existing one with the third-party editor, the file is created or updated in the project. When you run code from a notebook, execution is pushed from the notebook to Cloudera Data Science Workbench.

Base Engine Image v8 (and later) comes preconfigured with Jupyter. Jupyter can be selected in place of the built-in Workbench editor when you launch a session, and no additional configuration is required.

You can configure additional IDEs to be available from the dropdown. You have two configuration options:

  • Project Level: You can configure an editor at the project level so that any session launched within that project can use the editor configured. Other projects across the deployment will not be able to use any editors configured in such a manner. For steps, see Configure a Browser IDE at the Project Level.
  • Engine Level: You can create a custom engine configured with the editor so that any project across the deployment that uses this custom engine can also use the editor. This might be the only option for browser IDEs that require root permission to install (such as RStudio) and therefore cannot be installed directly within the project. For steps, see Configure a Browser IDE at the Engine Level.

Cloudera recommends that you first test the browser IDE in a session before you install it to the project or build a custom engine with it. For steps, see Test a Browser IDE in a Session Before Installation.

Test a Browser IDE in a Session Before Installation

This process can be used to ensure that a browser IDE works as expected before you install it to a project or to a customized engine image. This process is not meant for browser IDEs that require root permission to install, such as RStudio.

These steps are only required if you want to use an editor that does not come pre-installed as part of the default engine image. Perform the following steps to configure an editor for your session:

  1. Ensure that your browser accepts pop-up windows and cookies from the Cloudera Data Science Workbench web UI.
  2. Open the Cloudera Data Science Workbench web UI.
  3. Go to your project and launch a session with the kernel of your choice and the Workbench editor. Alternatively, open an existing session.
  4. In the interactive command prompt or terminal for the session, install the editor you want to use. See the documentation for your editor for specific instructions.
    For example, to install Jupyter Lab:
    Python 2
    The following example command installs Jupyter Lab for Python 2:
    !pip install jupyterlab
    Python 3
    The following example command installs Jupyter Lab for Python 3:
    !pip3 install jupyterlab
  5. After the installation completes, enter the command to start the server for the editor on IP address 127.0.0.1 and on the port specified in the CDSW_APP_PORT environment variable.
    For example, the following command starts the server for Jupyter Lab on the port specified in the CDSW_APP_PORT environment variable:
    !/home/cdsw/.local/bin/jupyter-lab --no-browser --ip=127.0.0.1 --port=${CDSW_APP_PORT} --NotebookApp.token= --NotebookApp.allow_remote_access=True --log-level=ERROR
    
  6. Click the grid icon in the top-right corner.
    You should see the editor in the drop-down menu. If you select the editor, it opens in a new browser tab.
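
The start command in step 5 can also be generated by a small helper, which avoids retyping the flags and catches a missing port. This is a minimal sketch, assuming Jupyter Lab was installed to /home/cdsw/.local/bin as in the example above; the function name jupyter_lab_command is hypothetical and not part of Cloudera Data Science Workbench.

```shell
#!/bin/bash
# Hypothetical helper: prints the Jupyter Lab start command for this session.
# Assumes the install location used in the example above.
jupyter_lab_command() {
  local bin="/home/cdsw/.local/bin/jupyter-lab"
  # CDSW_APP_PORT is set inside a session; fail early if it is missing.
  if [ -z "${CDSW_APP_PORT:-}" ]; then
    echo "CDSW_APP_PORT is not set; run this inside a session" >&2
    return 1
  fi
  echo "${bin} --no-browser --ip=127.0.0.1 --port=${CDSW_APP_PORT}" \
       "--NotebookApp.token= --NotebookApp.allow_remote_access=True --log-level=ERROR"
}
```

In a session terminal, running eval "$(jupyter_lab_command)" would then start the server with the printed command.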

Configure a Browser IDE at the Project Level

Perform the following steps to configure an editor at the project level:

  1. (Recommended) Test a Browser IDE in a Session Before Installation
  2. Install the IDE of your choice to the project. For information about how to install additional packages to a project, see Installing Additional Packages.
  3. Open the Cloudera Data Science Workbench web UI.
  4. Go to the project you want to configure an editor for.
  5. Go to Settings > Editors and click New Editor.
  6. Complete the fields:
    • Name: Provide a name for the editor. This is the name that appears in the dropdown menu for Editors when you start a new session.
    • Command: Enter the command to start the server for the editor on the Cloudera Data Science Workbench public port specified in the CDSW_APP_PORT environment variable (default 8081).

      For example, the following command starts Jupyter Lab on the port specified by the CDSW_APP_PORT environment variable:

      /home/cdsw/.local/bin/jupyter-lab --no-browser --ip=127.0.0.1 --port=${CDSW_APP_PORT} --NotebookApp.token= --NotebookApp.allow_remote_access=True --log-level=ERROR
      
      This is the same command you used to start the IDE to test it in a session.
  7. Save the changes.
    When a user starts a new session, the editor you added is available in the list of editors. Browsers must be configured to accept cookies and allow pop-up windows from the Cloudera Data Science Workbench web UI.
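
Before saving the editor, it can help to confirm from a session terminal that the program named in the Command field exists and is executable. The following is a minimal sketch of such a check; the function name editor_command_ok is hypothetical, and the check only inspects the first word of the command, not its flags.

```shell
#!/bin/bash
# Hypothetical pre-flight check: succeeds only if the first word of an editor
# command is an executable file, so a typo is caught before saving the editor.
editor_command_ok() {
  local cmd="$1"
  local bin="${cmd%% *}"   # everything before the first space
  if [ -x "$bin" ] && [ ! -d "$bin" ]; then
    echo "ok: ${bin}"
  else
    echo "missing or not executable: ${bin}" >&2
    return 1
  fi
}
```

For example, editor_command_ok "/home/cdsw/.local/bin/jupyter-lab --no-browser" reports whether the Jupyter Lab binary from the earlier example is present in the project.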

Configure a Browser IDE at the Engine Level

You can make a browser IDE available to any project within a Cloudera Data Science Workbench deployment by creating a customized engine image, installing the editor to it, and then whitelisting the custom image for projects as needed. Additionally, browser IDEs that require root permission to install, such as RStudio, can only be used as part of a customized engine image.

When a user launches a session in a project that uses the customized engine, the editors configured for that engine are available for selection. The following steps describe how to build a customized engine image for RStudio:

  1. Create a Dockerfile for the new custom image. Note that the base engine image uses Ubuntu.
    The following sample Dockerfile is for RStudio:
    FROM docker.repository.cloudera.com/cdsw/engine:10
    
    WORKDIR /tmp
    
    # Delete the Cloudera repository that is inaccessible because of the paywall
    
    RUN rm /etc/apt/sources.list.d/*
    
    # The RUN commands that install an editor
    # For example: RUN apt-get install myeditor
    
    RUN apt-get update && \
      apt-get install -y --no-install-recommends \
        libapparmor1 \
        libclang-dev \
        lsb-release \
        psmisc \
        sudo && \
      apt-get clean && \
      apt-get autoremove -y && \
      rm -rf /var/lib/apt/lists/*
    
    RUN wget --quiet https://download2.rstudio.org/server/bionic/amd64/rstudio-server-1.2.5033-amd64.deb && \
        dpkg -i rstudio-server-1.2.5033-amd64.deb && \
        rm rstudio-server-1.2.5033-amd64.deb
    
    COPY rserver.conf /etc/rstudio/rserver.conf
    
    COPY rstudio-cdsw /usr/local/bin/rstudio-cdsw
    
    RUN chmod +x /usr/local/bin/rstudio-cdsw
  2. Create rserver.conf:
    # Must match CDSW_APP_PORT
    www-port=8090
    server-app-armor-enabled=0
    server-daemonize=0
    www-address=127.0.0.1
    auth-none=1
    auth-validate-users=0
    Make sure that the www-port property matches the port set in the CDSW_APP_PORT environment variable (default 8090). To confirm the value for your deployment, run echo $CDSW_APP_PORT in a session terminal.
  3. Create rstudio-cdsw:
    #!/bin/bash
    
    # This saves RStudio's user runtime information to /tmp, which ensures several
    # RStudio sessions can run in the same project simultaneously
    mkdir -p /tmp/rstudio/sessions/active
    mkdir -p /home/cdsw/.rstudio/sessions
    if [ -d /home/cdsw/.rstudio/sessions/active ]; then rm -rf /home/cdsw/.rstudio/sessions/active; fi
    ln -s /tmp/rstudio/sessions/active /home/cdsw/.rstudio/sessions/active
    
    # This ensures RStudio picks up the environment. This may not be necessary if
    # you are installing RStudio Professional. See
    # https://docs.rstudio.com/ide/server-pro/r-sessions.html#customizing-session-launches.
    # SPARK_DIST_CLASSPATH is treated as a special case to workaround a bug in R
    # with very long environment variables.
    env | grep -v ^SPARK_DIST_CLASSPATH >> /usr/local/lib/R/etc/Renviron.site
    echo "Sys.setenv(\"SPARK_DIST_CLASSPATH\"=\"${SPARK_DIST_CLASSPATH}\")" >> /usr/local/lib/R/etc/Rprofile.site
    
    # Now start RStudio
    /usr/sbin/rstudio-server start
  4. Build the Dockerfile:
    docker build -t <image-name>:<tag> . -f Dockerfile
    If you want to build your image on a Cloudera Data Science Workbench gateway host, you must add the --network=host option to the build command:
    docker build --network=host -t <image-name>:<tag> . -f Dockerfile
  5. Distribute the image:
    • Push the image to a public registry such as DockerHub.

      For instructions, refer to the Docker documentation: docker push.

    • Push the image to your company's Docker registry.

      When using this method, make sure to tag your image with the following schema:

      docker tag <image-name> <company-registry>/<user-name>/<image-name>:<tag>

      Once the image has been tagged properly, use the following command to push the image:

      docker push <company-registry>/<user-name>/<image-name>:<tag>
    • Distribute the image manually:
      1. Save the Docker image as a tarball on the host where it was built.
        docker image save -o ./<new_customized_engine>.tar <image-name>
      2. Distribute the image to all the Cloudera Data Science Workbench gateway hosts.
        scp ./<new_customized_engine>.tar root@<cdsw.your_company.com>:/tmp/
      3. Load the image on all the Cloudera Data Science Workbench gateway hosts.
        docker load --input /tmp/<new_customized_engine>.tar
      4. To verify that the image was successfully distributed and loaded, run:
        docker images
  6. Whitelist the image in Cloudera Data Science Workbench:
    1. Log in to the Cloudera Data Science Workbench web UI as a site administrator.
    2. Click Admin > Engines.
    3. Add <company-registry>/<user-name>/<image-name>:<tag> to the list of whitelisted engine images.
  7. Select the new engine for a project:
    1. Go to the project Settings page.
    2. Click Engines.
    3. Select the new engine from the dropdown list of available Docker images. This engine will now be used to launch sessions within this project.
  8. Configure project(s) to use RStudio. When this is done, you will be able to select RStudio from the dropdown list of editors on the Launch New Session page. There are two ways to do this: for an individual project, or for all projects that use this engine.

    Configure RStudio for an individual project

    1. Go to the project Settings > Editors.
    2. Click New Editor.
    3. Complete the fields:
      • Name: Provide a name for the editor. For example, RStudio. This is the name that appears in the dropdown menu for Editors when you start a new session.
      • Command: Enter the command to start the server for the editor.

        For example, the following command will start RStudio:

        /usr/local/bin/rstudio-cdsw
    4. Click Save.

    Configure RStudio for all projects that use this engine

    1. Log in to the Cloudera Data Science Workbench web UI as a site administrator.
    2. Click Admin > Engines.
    3. Under Engine Images, click the Edit button for the engine image that you whitelisted in a previous step.
    4. Click New Editor.
      • Name: Provide a name for the editor. For example, RStudio. This is the name that appears in the dropdown menu for Editors when you start a new session.
      • Command: Enter the command to start the server for the editor.

        For example, the following command will start RStudio:

        /usr/local/bin/rstudio-cdsw
    5. Click Save, then click Save again.
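
Because RStudio is only reachable when www-port in rserver.conf matches CDSW_APP_PORT, a quick check from a session started on the custom engine can catch a mismatch. This is a minimal sketch; the function names rserver_port and rserver_port_matches are hypothetical, and the check assumes the simple key=value layout shown in the rserver.conf example above.

```shell
#!/bin/bash
# Hypothetical helper: prints the www-port value from an rserver.conf file.
# If the key appears more than once, the last occurrence is used.
rserver_port() {
  sed -n 's/^www-port=//p' "$1" | tail -n 1
}

# Succeeds when the configured port equals CDSW_APP_PORT.
rserver_port_matches() {
  local conf="$1"
  local port
  port="$(rserver_port "$conf")"
  if [ -n "$port" ] && [ "$port" = "${CDSW_APP_PORT:-}" ]; then
    echo "www-port ${port} matches CDSW_APP_PORT"
  else
    echo "www-port '${port}' does not match CDSW_APP_PORT '${CDSW_APP_PORT:-}'" >&2
    return 1
  fi
}
```

Inside a session, rserver_port_matches /etc/rstudio/rserver.conf would check the file that the sample Dockerfile copies into the image.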

For more information about how to create a customized engine image and its limitations, see Customized Engine Images.
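
When the image is distributed manually to several gateway hosts, the copy and load commands from the distribution step repeat once per host. The following is a minimal sketch that only prints those commands for review; the function name distribution_commands is hypothetical, and running docker load over ssh as root is an assumption about how the hosts are accessed, not something Cloudera Data Science Workbench requires.

```shell
#!/bin/bash
# Hypothetical helper: prints the commands needed to copy a saved image tarball
# to each gateway host and load it there. Nothing is executed by this function.
# Usage: distribution_commands <tarball-name> <host> [<host> ...]
distribution_commands() {
  local tarball="$1"; shift
  local host
  for host in "$@"; do
    echo "scp ./${tarball} root@${host}:/tmp/"
    echo "ssh root@${host} docker load --input /tmp/${tarball}"
  done
}
```

After checking the printed commands, they can be run one by one or piped to bash on the host where the tarball was saved.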