Configure a Browser IDE at the Legacy Engine Level
You can make a browser IDE available to any project within a Cloudera Data Science Workbench deployment by creating a customized legacy engine image, installing the editor to it, and then whitelisting the custom image for projects as needed. Additionally, browser IDEs that require root permission to install, such as RStudio, can only be used as part of a customized legacy engine image.
When a user launches a session, they can select the customized legacy engine with the editors available. The following steps describe how to build a customized legacy engine image for RStudio:
-
Create a Dockerfile for the new custom image. Note that the base engine image uses
Ubuntu.
The following sample Dockerfile is for RStudio:
FROM docker.repository.cloudera.com/cdsw/engine:10

WORKDIR /tmp

#Delete the Cloudera repository that is inaccessible because of the paywall
RUN rm /etc/apt/sources.list.d/*

#The RUN commands that install an editor
#For example: RUN apt-get install myeditor
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    libapparmor1 \
    libclang-dev \
    lsb-release \
    psmisc \
    sudo && \
    apt-get clean && \
    apt-get autoremove && \
    rm -rf /var/lib/apt/lists/*

RUN wget --quiet https://download2.rstudio.org/server/bionic/amd64/rstudio-server-1.2.5033-amd64.deb && \
    dpkg -i rstudio-server-1.2.5033-amd64.deb && \
    rm rstudio-server-1.2.5033-amd64.deb

COPY rserver.conf /etc/rstudio/rserver.conf

COPY rstudio-cdsw /usr/local/bin/rstudio-cdsw

RUN chmod 777 /usr/local/bin/rstudio-cdsw
-
Create rserver.conf:
# Must match CDSW_APP_PORT
www-port=8090
server-app-armor-enabled=0
server-daemonize=0
www-address=127.0.0.1
auth-none=1
auth-validate-users=0
Make sure that the www-port property matches the port set in the CDSW_APP_PORT environment variable (default 8090).
-
Create rstudio-cdsw:
#!/bin/bash

# This saves RStudio's user runtime information to /tmp, which ensures several
# RStudio sessions can run in the same project simultaneously
mkdir -p /tmp/rstudio/sessions/active
mkdir -p /home/cdsw/.rstudio/sessions
if [ -d /home/cdsw/.rstudio/sessions/active ]; then rm -rf /home/cdsw/.rstudio/sessions/active; fi
ln -s /tmp/rstudio/sessions/active /home/cdsw/.rstudio/sessions/active

# This ensures RStudio picks up the environment. This may not be necessary if
# you are installing RStudio Professional. See
# https://docs.rstudio.com/ide/server-pro/r-sessions.html#customizing-session-launches.
# SPARK_DIST_CLASSPATH is treated as a special case to workaround a bug in R
# with very long environment variables.
env | grep -v ^SPARK_DIST_CLASSPATH >> /usr/local/lib/R/etc/Renviron.site
echo "Sys.setenv(\"SPARK_DIST_CLASSPATH\"=\"${SPARK_DIST_CLASSPATH}\")" >> /usr/local/lib/R/etc/Rprofile.site

# Now start RStudio
/usr/sbin/rstudio-server start
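Before building the image, it can help to confirm that the www-port value in rserver.conf agrees with CDSW_APP_PORT. The following is a minimal sketch and not part of the Cloudera procedure: the function names get_www_port and check_port are hypothetical, and the sketch assumes rserver.conf contains simple key=value lines like the sample in this procedure.

```shell
#!/bin/bash
# Hypothetical helper: print the value of the last www-port= line in a file.
get_www_port() {
  sed -n 's/^[[:space:]]*www-port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | tail -n 1
}

# Hypothetical helper: succeed only if www-port equals the expected port.
# The expected port is the second argument, else CDSW_APP_PORT, else 8090.
check_port() {
  local conf="$1"
  local expected="${2:-${CDSW_APP_PORT:-8090}}"
  local actual
  actual="$(get_www_port "$conf")"
  if [ "$actual" = "$expected" ]; then
    echo "OK: www-port=$actual matches the expected port"
  else
    echo "MISMATCH: www-port=${actual:-unset}, expected $expected" >&2
    return 1
  fi
}
```

For example, running check_port rserver.conf in the directory that contains the Dockerfile returns a non-zero status when the two ports differ.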
-
Build the Dockerfile:
docker build -t <image-name>:<tag> . -f Dockerfile
If you want to build your image on a Cloudera Data Science Workbench gateway host, you must add the --network=host option to the build command:
docker build --network=host -t <image-name>:<tag> . -f Dockerfile
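The two build variants can be wrapped in one small function. This is only a sketch: the function name build_engine_image, its third argument, and the DRY_RUN variable are conventions invented for this example, not Cloudera or Docker features. It assumes the Dockerfile is in the current directory.

```shell
#!/bin/bash
# Sketch: build the custom engine image, adding --network=host when the
# build runs on a Cloudera Data Science Workbench gateway host.
# Usage: build_engine_image <image-name> <tag> [yes|no]
build_engine_image() {
  local image="$1"
  local tag="$2"
  local on_gateway="${3:-no}"
  local network_opt=""

  if [ "$on_gateway" = "yes" ]; then
    network_opt="--network=host"
  fi

  # With DRY_RUN=1 (hypothetical switch) the command is printed, not run.
  # $network_opt is left unquoted on purpose so that it vanishes when empty.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo docker build $network_opt -t "${image}:${tag}" . -f Dockerfile
  else
    docker build $network_opt -t "${image}:${tag}" . -f Dockerfile
  fi
}
```

Setting DRY_RUN=1 first makes it possible to review the exact command before running the real build.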
-
Distribute the image using one of the following methods:
- Push the image to a public registry such as DockerHub.
For instructions, refer to the Docker documentation: docker push.
- Push the image to your company's Docker registry.
When using this method, make sure to tag your image with the following schema:
docker tag <image-name> <company-registry>/<user-name>/<image-name>:<tag>
Once the image has been tagged properly, use the following command to push the image:
docker push <company-registry>/<user-name>/<image-name>:<tag>
- Distribute the image manually:
- Save the Docker image as a tarball on the host where it was built:
docker image save -o ./<new_customized_engine>.tar <image-name>
- Distribute the image to all the Cloudera Data Science Workbench gateway hosts:
scp ./<new_customized_engine>.tar root@<cdsw.your_company.com>:/tmp/
- Load the image on all the Cloudera Data Science Workbench gateway hosts:
docker load --input /tmp/<new_customized_engine>.tar
- To verify that the image was successfully distributed and loaded, run:
docker images
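As an illustration of the registry naming schema and the manual distribution steps, the sketch below composes the fully qualified image reference and prints the copy and load commands for a list of gateway hosts. The function names, the example host names, and the use of ssh to run docker load remotely are assumptions of this sketch. The procedure itself only states that the image must be loaded on every gateway host.

```shell
#!/bin/bash
# Compose <company-registry>/<user-name>/<image-name>:<tag>.
registry_ref() {
  local registry="$1" user="$2" image="$3" tag="$4"
  echo "${registry}/${user}/${image}:${tag}"
}

# Print, without executing, the scp and docker load commands for each host.
# Usage: print_distribution_cmds <tarball> <host> [<host> ...]
print_distribution_cmds() {
  local tarball="$1"
  shift
  local host
  for host in "$@"; do
    echo "scp ./${tarball} root@${host}:/tmp/"
    # Assumption: docker load is run remotely over ssh as root.
    echo "ssh root@${host} docker load --input /tmp/${tarball}"
  done
}
```

For example, print_distribution_cmds engine.tar gw1.example.com gw2.example.com prints four commands that can be reviewed and then run by hand or piped to a shell.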
-
Whitelist the image in Cloudera Data Science Workbench:
- Log in to the Cloudera Data Science Workbench web UI as a site administrator.
- Click Admin > Engines.
-
Add <company-registry>/<user-name>/<image-name>:<tag> to the list of whitelisted engine images.
-
Add the new legacy engine to the trusted list for a project:
- Go to the project Settings page.
- Click Engines.
- Select the new legacy engine from the dropdown list of available Docker images. This engine will now be used to launch sessions within this project.
-
Configure project(s) to use RStudio. When this is done, you will be able to select
RStudio from the dropdown list of editors on the Launch New Session
page. There are two ways to do this: for an individual project, or for all projects that
use this engine.
Configure RStudio for an individual project
- Go to the project Settings > Editors.
- Click New Editor.
-
Complete the fields:
- Name: Provide a name for the editor. For example, RStudio. This is the name that appears in the dropdown menu for Editors when you start a new session.
- Command: Enter the command to start the server for the
editor.
For example, the following command will start RStudio:
/usr/local/bin/rstudio-cdsw
- Click Save.
Configure RStudio for all projects that use this engine
- Log in to the Cloudera Data Science Workbench web UI as a site administrator.
- Click Admin > Engines.
- Under Engine Images, click the Edit button for the engine image that you whitelisted in a previous step.
-
Click New Editor.
- Name: Provide a name for the editor. For example, RStudio. This is the name that appears in the dropdown menu for Editors when you start a new session.
- Command: Enter the command to start the server for the
editor.
For example, the following command will start RStudio:
/usr/local/bin/rstudio-cdsw
- Click Save, then click Save again.
For more information about how to create a customized engine image and limitations, see AWS Account Requirements