Installing Packages and Libraries

Cloudera Data Science Workbench engines are preloaded with a few common packages and libraries for R, Python, and Scala. However, a key feature of Cloudera Data Science Workbench is the ability of different projects to install and use libraries pinned to specific versions, just as you would on your local computer..

You can install additional libraries and packages from the workbench, either using the command prompt or terminal. You only need to install libraries and packages once per project. From then on, they are available to any new engine you spawn throughout the lifetime of the project.

To install a package:

  1. Launch a session in your favorite language.
  2. At the command prompt in the bottom right, enter the command to install:

R

# Install from CRAN 
install.packages("ggplot2") 

# Install using devtools 
install.packages('devtools') 
library(devtools) 
install_github("hadley/ggplot2") 

Python 2

# Installing from console using ! shell operator and pip:
!pip install beautifulsoup

# Installing from terminal
pip install beautifulsoup

Python 3

# Installing from console using ! shell operator and pip3:
!pip3 install beautifulsoup

# Installing from terminal
pip3 install beautifulsoup

Cloudera Data Science Workbench does not currently support customization of system packages that require root access. However, Cloudera Data Science Workbench site administrators and project administrators can add libraries and other dependencies to the Docker image in which their engines run. See Customizing Engine Images.