Installing Packages and Libraries
Cloudera Data Science Workbench engines are preloaded with a few common packages and libraries for R, Python, and Scala. However, a key feature of Cloudera Data Science Workbench is that each project can install and use its own libraries, pinned to specific versions, just as you would on your local computer.
You can install additional libraries and packages from the workbench, using either the command prompt or the terminal. To install a package:
- Launch a session in your favorite language.
- At the command prompt in the bottom right, enter the command to install:
R
# Install from CRAN
install.packages("ggplot2")
# Install using devtools
install.packages('devtools')
library(devtools)
install_github("hadley/ggplot2")
Python 2
# Installing from console using ! shell operator and pip:
!pip install beautifulsoup
# Installing from terminal
pip install beautifulsoup
Python 3
# Installing from console using ! shell operator and pip3:
!pip3 install beautifulsoup4
# Installing from terminal
pip3 install beautifulsoup4
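If an install has to happen from inside a Python script rather than at the prompt, one common approach is to call pip through the interpreter that is running the session, so the package lands in that interpreter's environment. The following is a minimal sketch and is not taken from the Cloudera documentation; `pip_command` and `install` are hypothetical helper names chosen for illustration.

```python
import subprocess
import sys


def pip_command(package, version=None):
    """Build a pip invocation for the interpreter running this session.

    Hypothetical helper: the name and signature are illustrative only.
    """
    requirement = f"{package}=={version}" if version else package
    # "sys.executable -m pip" avoids picking up a pip that belongs to
    # a different Python installation on the PATH.
    return [sys.executable, "-m", "pip", "install", requirement]


def install(package, version=None):
    """Run pip; raises subprocess.CalledProcessError if the install fails."""
    subprocess.check_call(pip_command(package, version))


# Show the command without running it; call install(...) to execute it.
print(" ".join(pip_command("beautifulsoup4", "4.6.0")))
```

Building the command separately from running it makes it easy to inspect or log exactly what will be installed before anything changes in the project.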
Generally, Cloudera recommends you install all packages locally into your project. This will ensure you have the exact versions you want and that these libraries will not be upgraded when Cloudera upgrades the base engine image. You only need to install libraries and packages once per project. From then on, they are available to any new engine you spawn throughout the lifetime of the project.
For Python projects, you can list the pinned packages in a requirements.txt file:
beautifulsoup4==4.6.0
seaborn==0.7.1
To install the packages, run:
!pip3 install -r requirements.txt
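After installing from a requirements file, it can be useful to confirm that the versions in the session match the pins. The sketch below is one possible way to do that, under two assumptions: the engine runs Python 3.8 or later (so `importlib.metadata` is available), and the file contains only simple `name==version` lines. `parse_pins` and `check_pins` are hypothetical names, not part of Cloudera Data Science Workbench.

```python
from importlib import metadata  # standard library in Python 3.8+ (assumption)


def parse_pins(text):
    """Return {name: version} for every 'name==version' line in text."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (wanted, installed)} for pins that do not match.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# Example with the pins shown above; an empty dict means everything matches.
example = "beautifulsoup4==4.6.0\nseaborn==0.7.1\n"
print(check_pins(parse_pins(example)))
```

To check a real project, read the text from the file first, for example `check_pins(parse_pins(open("requirements.txt").read()))`.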
Cloudera Data Science Workbench does not currently support customization of system packages that require root access. However, Cloudera Data Science Workbench site administrators and project administrators can add libraries and other dependencies to the Docker image in which their engines run. See Customizing Engine Images.