Installing Packages and Libraries

Cloudera Data Science Workbench engines are preloaded with a few common packages and libraries for R, Python, and Scala. However, a key feature of Cloudera Data Science Workbench is that each project can install and use libraries pinned to specific versions, just as you would on your local computer.

You can install additional libraries and packages from the workbench, using either the command prompt or the terminal. To install a package:

1. Launch a session in your favorite language.
2. At the command prompt in the bottom right, enter the command to install:

R

# Install from CRAN
install.packages("ggplot2") 

# Install using devtools 
install.packages('devtools') 
library(devtools) 
install_github("hadley/ggplot2")

Python 2

# Installing from console using ! shell operator and pip:
!pip install beautifulsoup4

# Installing from terminal
pip install beautifulsoup4

Python 3

# Installing from console using ! shell operator and pip3:
!pip3 install beautifulsoup4

# Installing from terminal
pip3 install beautifulsoup4
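When an install is scripted from Python code rather than typed at the prompt, one option is to call pip through the interpreter that is running the session, so the package lands in that interpreter's environment. The following is a minimal sketch for Python 3, not a Cloudera-documented method; the helper names pip_install_command and install are hypothetical.

```python
import subprocess
import sys


def pip_install_command(package, version=None):
    """Build a pip command bound to the current interpreter (hypothetical helper)."""
    requirement = f"{package}=={version}" if version else package
    # "python -m pip" runs the pip that belongs to sys.executable
    return [sys.executable, "-m", "pip", "install", requirement]


def install(package, version=None):
    """Run the install; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package, version))
```

For example, install("beautifulsoup4", "4.6.0") would request that exact version for the interpreter running the session.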

Generally, Cloudera recommends you install all packages locally into your project. This will ensure you have the exact versions you want and that these libraries will not be upgraded when Cloudera upgrades the base engine image. You only need to install libraries and packages once per project. From then on, they are available to any new engine you spawn throughout the lifetime of the project.

Specify the packages you want in a requirements.txt file that lives in your project, then install them using pip/pip3. For example, list the following packages in requirements.txt:

beautifulsoup4==4.6.0
seaborn==0.7.1

To install the packages, run:

!pip3 install -r requirements.txt
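To confirm that the installed versions match the pins, a short check can be run in a Python 3 session. This is a sketch only: it assumes Python 3.8 or later (for importlib.metadata), handles only simple name==version lines, and uses the hypothetical helper names parse_pins and find_mismatches.

```python
from importlib import metadata


def parse_pins(text):
    """Return {package: version} for each 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed)} for every pin that is not satisfied.

    The installed value is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Calling find_mismatches(parse_pins(open("requirements.txt").read())) returns an empty dictionary when every pinned package is installed at the pinned version.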

Cloudera Data Science Workbench does not currently support customization of system packages that require root access. However, Cloudera Data Science Workbench site administrators and project administrators can add libraries and other dependencies to the Docker image in which their engines run. See Customizing Engine Images.
