Configure a Browser IDE at the Engine Level

You can make a browser IDE available to any project within a Cloudera Data Science Workbench deployment by creating a customized engine image, installing the editor to it, and then whitelisting the custom image for projects as needed. Additionally, browser IDEs that require root permission to install, such as RStudio, can only be used as part of a customized engine image.

When users launch a session, they can select the customized engine and any editor installed on it. The following steps describe how to build a customized engine image for RStudio:

  1. Create a Dockerfile for the new custom image. Note that the base engine image uses Ubuntu.
    The following sample Dockerfile is for RStudio:
    FROM docker.repository.cloudera.com/cdsw/engine:10
    
    WORKDIR /tmp
    
    #Delete the Cloudera repository that is inaccessible because of the paywall
    
    RUN rm /etc/apt/sources.list.d/*
    
    #The RUN commands that install an editor
    #For example: RUN apt-get install myeditor
    
    RUN apt-get update && \
      apt-get install -y --no-install-recommends \
        libapparmor1 \
        libclang-dev \
        lsb-release \
        psmisc \
        sudo && \
      apt-get clean && \
      apt-get autoremove -y && \
      rm -rf /var/lib/apt/lists/*
    
    RUN wget --quiet https://download2.rstudio.org/server/bionic/amd64/rstudio-server-1.2.5033-amd64.deb && \
        dpkg -i rstudio-server-1.2.5033-amd64.deb && \
        rm rstudio-server-1.2.5033-amd64.deb
    
    COPY rserver.conf /etc/rstudio/rserver.conf
    
    COPY rstudio-cdsw /usr/local/bin/rstudio-cdsw
    
    RUN chmod +x /usr/local/bin/rstudio-cdsw
  2. Create rserver.conf:
    # Must match CDSW_APP_PORT
    www-port=8090
    server-app-armor-enabled=0
    server-daemonize=0
    www-address=127.0.0.1
    auth-none=1
    auth-validate-users=0
    Make sure that the www-port property matches the port set in the CDSW_APP_PORT environment variable (default 8090).
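
The port check described above can be scripted before you build the image. The following is a minimal sketch, not part of the product: the function names get_www_port and check_www_port are hypothetical, and the script assumes rserver.conf uses plain key=value lines as in the sample.

```shell
# Hypothetical helper: print the value of the last www-port line in the given file.
get_www_port() {
  sed -n 's/^[[:space:]]*www-port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$1" | tail -n 1
}

# Hypothetical helper: succeed only when www-port equals CDSW_APP_PORT.
# Falls back to 8090, the default named in this guide, when the variable is unset.
check_www_port() {
  conf="$1"
  expected="${CDSW_APP_PORT:-8090}"
  actual="$(get_www_port "$conf")"
  if [ "$actual" = "$expected" ]; then
    echo "OK: www-port=$actual matches CDSW_APP_PORT"
  else
    echo "MISMATCH: www-port=${actual:-unset}, CDSW_APP_PORT=$expected" >&2
    return 1
  fi
}
```

For example, running check_www_port rserver.conf in the directory that contains the Dockerfile reports a mismatch before the file is copied into the image.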
  3. Create rstudio-cdsw:
    #!/bin/bash
    
    # This saves RStudio's user runtime information to /tmp, which ensures several
    # RStudio sessions can run in the same project simultaneously
    mkdir -p /tmp/rstudio/sessions/active
    mkdir -p /home/cdsw/.rstudio/sessions
    if [ -d /home/cdsw/.rstudio/sessions/active ]; then rm -rf /home/cdsw/.rstudio/sessions/active; fi
    ln -s /tmp/rstudio/sessions/active /home/cdsw/.rstudio/sessions/active
    
    # This ensures RStudio picks up the environment. This may not be necessary if
    # you are installing RStudio Professional. See
    # https://docs.rstudio.com/ide/server-pro/r-sessions.html#customizing-session-launches.
    # SPARK_DIST_CLASSPATH is treated as a special case to work around a bug in R
    # with very long environment variables.
    env | grep -v ^SPARK_DIST_CLASSPATH >> /usr/local/lib/R/etc/Renviron.site
    echo "Sys.setenv(\"SPARK_DIST_CLASSPATH\"=\"${SPARK_DIST_CLASSPATH}\")" >> /usr/local/lib/R/etc/Rprofile.site
    
    # Now start RStudio
    /usr/sbin/rstudio-server start
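
To see what the session-directory redirect in rstudio-cdsw does without starting an engine, the same logic can be tried in a scratch directory. This is an illustrative sketch only: the function name link_active_sessions and its two directory arguments are hypothetical stand-ins for /home/cdsw and /tmp.

```shell
# Hypothetical helper: recreate the "active sessions" symlink under arbitrary
# home and temp directories, mirroring the logic in the rstudio-cdsw script.
link_active_sessions() {
  home_dir="$1"
  tmp_dir="$2"
  mkdir -p "${tmp_dir}/rstudio/sessions/active"
  mkdir -p "${home_dir}/.rstudio/sessions"
  # Remove an existing directory or a symlink left over from an earlier run.
  if [ -d "${home_dir}/.rstudio/sessions/active" ] || [ -L "${home_dir}/.rstudio/sessions/active" ]; then
    rm -rf "${home_dir}/.rstudio/sessions/active"
  fi
  ln -s "${tmp_dir}/rstudio/sessions/active" "${home_dir}/.rstudio/sessions/active"
}
```

Calling the function twice with the same arguments leaves a single symlink that points at the temp directory, which is the state the startup script relies on each time a session launches.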
  4. Build the Dockerfile:
    docker build -t <image-name>:<tag> . -f Dockerfile
    If you want to build your image on a Cloudera Data Science Workbench gateway host, you must add the --network=host option to the build command:
    docker build --network=host -t <image-name>:<tag> . -f Dockerfile
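
The two build variants above differ only in the --network=host option. As a small sketch, a wrapper can print the right command for each case; the function name build_cmd and the "gateway" keyword are hypothetical, and the function only prints the command rather than running it.

```shell
# Hypothetical helper: print the docker build command for an image name and tag.
# Pass "gateway" as the third argument when building on a gateway host.
build_cmd() {
  image="$1"
  tag="$2"
  where="${3:-}"
  opts=""
  if [ "$where" = "gateway" ]; then
    opts="--network=host "
  fi
  echo "docker build ${opts}-t ${image}:${tag} -f Dockerfile ."
}
```

Review the printed command, then run it from the directory that contains the Dockerfile, rserver.conf, and rstudio-cdsw.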
  5. Distribute the image:
    • Push the image to a public registry such as DockerHub.

      For instructions, refer to the Docker documentation: docker push.

    • Push the image to your company's Docker registry.

      When using this method, make sure to tag your image with the following schema:

      docker tag <image-name> <company-registry>/<user-name>/<image-name>:<tag>

      Once the image has been tagged properly, use the following command to push the image:

      docker push <company-registry>/<user-name>/<image-name>:<tag>
    • Distribute the image manually:
      1. Save the Docker image as a tarball on the host where it was built:
        docker image save -o ./<new_customized_engine>.tar <image-name>
      2. Distribute the image to all the Cloudera Data Science Workbench gateway hosts.
        scp ./<new_customized_engine>.tar root@<cdsw.your_company.com>:/tmp/
      3. Load the image on all the Cloudera Data Science Workbench gateway hosts.
        docker load --input /tmp/<new_customized_engine>.tar
      4. To verify that the image was successfully distributed and loaded, run:
        docker images
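
When there are several gateway hosts, the manual copy-and-load steps above can be generated in a loop. The sketch below only prints the commands (a dry run). The function name distribute_cmds is hypothetical, and running docker load over ssh as root is an assumption; this guide does not say how you log in to each host.

```shell
# Hypothetical helper: print the scp and docker load commands for each host.
# Usage: distribute_cmds <tarball> <host> [<host> ...]
distribute_cmds() {
  tarball="$1"
  shift
  name="$(basename "$tarball")"
  for host in "$@"; do
    echo "scp ${tarball} root@${host}:/tmp/"
    # Assumption: the image is loaded remotely over ssh.
    echo "ssh root@${host} docker load --input /tmp/${name}"
  done
}
```

After running the printed commands, run docker images on each host to confirm that the image is listed.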
  6. Whitelist the image in Cloudera Data Science Workbench:
    1. Log in to the Cloudera Data Science Workbench web UI as a site administrator.
    2. Click Admin > Engines.
    3. Add <company-registry>/<user-name>/<image-name>:<tag> to the list of whitelisted engine images.
  7. Whitelist the new engine for a project:
    1. Go to the project Settings page.
    2. Click Engines.
    3. Select the new engine from the dropdown list of available Docker images. This engine will now be used to launch sessions within this project.
  8. Configure project(s) to use RStudio. When this is done, you will be able to select RStudio from the dropdown list of editors on the Launch New Session page. There are two ways to do this: for an individual project, or for all projects that use this engine.

    Configure RStudio for an individual project

    1. Go to the project Settings > Editors.
    2. Click New Editor.
    3. Complete the fields:
      • Name: Provide a name for the editor. For example, RStudio. This is the name that appears in the dropdown menu for Editors when you start a new session.
      • Command: Enter the command to start the server for the editor.

        For example, the following command will start RStudio:

        /usr/local/bin/rstudio-cdsw
    4. Click Save.

    Configure RStudio for all projects that use this engine

    1. Log in to the Cloudera Data Science Workbench web UI as a site administrator.
    2. Click Admin > Engines.
    3. Under Engine Images, click the Edit button for the engine image that you whitelisted in a previous step.
    4. Click New Editor and complete the fields:
      • Name: Provide a name for the editor. For example, RStudio. This is the name that appears in the dropdown menu for Editors when you start a new session.
      • Command: Enter the command to start the server for the editor.

        For example, the following command will start RStudio:

        /usr/local/bin/rstudio-cdsw
    5. Click Save, then click Save again.

For more information about creating a customized engine image, and the limitations that apply, see AWS Account Requirements.