Data Visualization

Each language on Cloudera Data Science Workbench has a visualization system that you can use to create plots, including rich HTML visualizations.

Simple Plots

To create a simple plot, run a console in your favorite language and paste in the following code sample:

R

# A standard R plot 
plot(rnorm(1000)) 

# A ggplot2 plot 
library("ggplot2") 
qplot(hp, mpg, data=mtcars, color=am, 
facets=gear~cyl, size=I(3), 
xlab="Horsepower", ylab="Miles per Gallon")

Python

import matplotlib.pyplot as plt
import random
plt.plot([random.normalvariate(0, 1) for i in range(1000)])

For some libraries such as matplotlib, new plots are displayed as each subsequent command is executed. Therefore, when you run a series of commands, you will see incomplete plots for each intermediate command until the final command is executed. If this is not the desired behavior, an easy workaround is to put all the plotting commands in one Python function.
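A minimal sketch of that workaround, assuming matplotlib is installed in the engine. The function name `plot_normal_samples` and its parameters are illustrative only and are not part of any Cloudera Data Science Workbench API.

```python
import random

import matplotlib.pyplot as plt


def plot_normal_samples(n=1000, seed=None):
    """Draw n standard-normal samples and build the whole plot in one call."""
    rng = random.Random(seed)
    samples = [rng.normalvariate(0, 1) for _ in range(n)]

    # All plotting commands live inside this function, so they run
    # together as a single console command.
    fig, ax = plt.subplots()
    ax.plot(samples)
    ax.set_title("Normal random variates")
    ax.set_xlabel("Sample index")
    ax.set_ylabel("Value")
    return fig


fig = plot_normal_samples(n=1000, seed=42)
```

Because every plotting command runs inside one function call, the console should show only the finished figure rather than each intermediate state.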

Saved Images

You can also display saved images using commands in the following format:

R

library("cdsw") 

download.file("https://upload.wikimedia.org/wikipedia/commons/2/29/Minard.png", "/cdn/Minard.png") 
image("Minard.png")

Python

from urllib.request import urlretrieve
from IPython.display import Image
urlretrieve("http://upload.wikimedia.org/wikipedia/commons/2/29/Minard.png", "Minard.png")

Image(filename="Minard.png")

HTML Visualizations

Your code can generate and display HTML. To create an HTML widget, paste in the following:

R

library("cdsw") 
html('<svg><circle cx="50" cy="50" r="50" fill="red" /></svg>')

Python

from IPython.display import HTML
HTML('<svg><circle cx="50" cy="50" r="50" fill="red" /></svg>')
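Since the markup is just a string, it can also be built from data before it is displayed. The following is a small sketch of that idea; the helper name `circles_svg` and its layout rules are hypothetical and chosen only for illustration.

```python
def circles_svg(radii, gap=10, fill="red"):
    """Return an SVG string with one circle per radius, laid out left to right."""
    parts = []
    x = 0
    height = 2 * max(radii) if radii else 0
    for r in radii:
        # Center each circle horizontally after the previous one.
        cx = x + r
        parts.append(
            '<circle cx="%d" cy="%d" r="%d" fill="%s" />' % (cx, height // 2, r, fill)
        )
        x += 2 * r + gap
    width = max(x - gap, 0)
    return '<svg width="%d" height="%d">%s</svg>' % (width, height, "".join(parts))


svg_markup = circles_svg([10, 20, 30])
```

In a console, the resulting string can be passed to `HTML(svg_markup)` from `IPython.display`, in the same way as the literal markup in the Python sample.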

IFrame Visualizations

Most visualizations require more than basic HTML. Embedding HTML directly in your console also risks conflicts between different parts of your code. The most flexible way to embed a web resource is using an IFrame:

R

library("cdsw")
iframe(src="https://www.youtube.com/embed/8pHzROP1D-w", width="854px", height="510px")

Python

from IPython.display import HTML
HTML('<iframe width="854" height="510" src="https://www.youtube.com/embed/8pHzROP1D-w"></iframe>')

You can generate HTML files within your console and display them in IFrames using the /cdn folder. The /cdn folder persists and serves static assets generated by your engine runs. For instance, you can embed a full HTML file in an IFrame.

R

library("cdsw") 
f <- file("/cdn/index.html") 
html.content <- paste("<p>Here is a normal random variate:", rnorm(1), "</p>") 
writeLines(c(html.content), f) 
close(f) 
iframe("index.html")

Python

from IPython.display import HTML
import random

html_content = "<p>Here is a normal random variate: %f </p>" % random.normalvariate(0, 1)

# Write the page to /cdn, then embed it by its file name.
with open("/cdn/index.html", "w") as f:
    f.write(html_content)
HTML('<iframe src="index.html"></iframe>')

Cloudera Data Science Workbench uses this feature to support many rich plotting libraries such as htmlwidgets, bokeh, and plotly.
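The write-then-embed pattern from the samples in this section can be wrapped in a small helper. This is a sketch under stated assumptions: `write_cdn_page` is a hypothetical name, the `cdn_dir` parameter exists only so the function can be tried outside an engine, and the IFrame size is arbitrary.

```python
import os


def write_cdn_page(html_content, name="index.html", cdn_dir="/cdn"):
    """Write html_content to cdn_dir/name and return iframe markup that embeds it."""
    path = os.path.join(cdn_dir, name)
    with open(path, "w") as f:
        f.write(html_content)
    # The iframe refers to the file by name, as in the samples in this section.
    return '<iframe src="%s" width="640" height="480"></iframe>' % name


# Usage inside a console (assumes the /cdn folder exists):
#   from IPython.display import HTML
#   HTML(write_cdn_page("<p>Hello from the console</p>"))
```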

Documenting Your Analysis

Cloudera Data Science Workbench supports Markdown documentation of your code written in comments. This allows you to generate reports directly from valid Python and R code that runs anywhere, even outside Cloudera Data Science Workbench. To add documentation to your analysis, create comments in Markdown format:

R

# Heading
# -------
#
# This documentation is **important.**
#
# Inline math: $e^x$
#
# Display math: $$y = \Sigma x + \epsilon$$

print("Now the code!")

Python

# Heading
# -------
#
# This documentation is **important.**
#
# Inline math: $e^x$
#
# Display math: $$y = \Sigma x + \epsilon$$

print("Now the code!")
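To make the idea concrete, the sketch below separates a script into Markdown text (taken from top-level comment lines) and code. It is only an illustration of how commented scripts can double as documentation, not how Cloudera Data Science Workbench implements report generation; the function name `split_markdown_and_code` is hypothetical.

```python
def split_markdown_and_code(source):
    """Group consecutive lines into ("markdown", text) and ("code", text) segments."""
    segments = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no content of their own
        if line.startswith("#"):
            kind = "markdown"
            text = line[1:]
            if text.startswith(" "):
                text = text[1:]  # drop the single space after the comment marker
        else:
            kind, text = "code", line
        if segments and segments[-1][0] == kind:
            segments[-1][1].append(text)
        else:
            segments.append((kind, [text]))
    return [(kind, "\n".join(lines)) for kind, lines in segments]


script = '''# Heading
# -------
#
# This documentation is **important.**

print("Now the code!")
'''
segments = split_markdown_and_code(script)
```

Here `segments` holds one Markdown segment followed by one code segment, and the script itself remains valid Python that runs anywhere.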

Making Web Services Available

Every console has an environment variable called CDSW_PUBLIC_PORT. Applications can contact any service that listens on that port over the Internet at https://<console-id>.company.com. You can get the <console-id> from the environment variable CDSW_DASHBOARD_ID or from the string of random letters and numbers at the end of the console URL.

For example, services in console https://cdsw.company.com/user/project/consoles/xv30miihscnv947b can be reached at https://xv30miihscnv947b.company.com.
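The mapping from console URL to service address can be sketched as follows. Both helper names are hypothetical, and the domain is passed in explicitly because it depends on your deployment.

```python
def console_id_from_url(console_url):
    """Return the last path segment of a console URL, which is the console ID."""
    return console_url.rstrip("/").rsplit("/", 1)[-1]


def service_url(console_id, domain):
    """Build the public address of a service running in the given console."""
    return "https://%s.%s" % (console_id, domain)


console_id = console_id_from_url(
    "https://cdsw.company.com/user/project/consoles/xv30miihscnv947b"
)
url = service_url(console_id, "company.com")
```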

Example: A Shiny Application

Use the following steps to create a new Shiny application.

Create a new, blank project and run an R console. Use the following command to install Shiny to the project.

R

install.packages('shiny') 

Create files ui.R and server.R in the project, and copy and paste the contents of the example files provided by Shiny by RStudio:

R

# ui.R

shinyUI(bootstrapPage(

  selectInput(inputId = "n_breaks",
    label = "Number of bins in histogram (approximate):",
    choices = c(10, 20, 35, 50),
    selected = 20),

  checkboxInput(inputId = "individual_obs",
    label = strong("Show individual observations"),
    value = FALSE),

  checkboxInput(inputId = "density",
    label = strong("Show density estimate"),
    value = FALSE),

  plotOutput(outputId = "main_plot", height = "300px"),

  # Display this only if the density is shown
  conditionalPanel(condition = "input.density == true",
    sliderInput(inputId = "bw_adjust",
      label = "Bandwidth adjustment:",
      min = 0.2, max = 2, value = 1, step = 0.2)
  )
))

R

# server.R

shinyServer(function(input, output) {

 output$main_plot <- renderPlot({ 

   hist(faithful$eruptions, 
      probability = TRUE, 
      breaks = as.numeric(input$n_breaks), 
      xlab = "Duration (minutes)", 
      main = "Geyser eruption duration")

   if (input$individual_obs) {
      rug(faithful$eruptions)
   } 

   if (input$density) {
      dens <- density(faithful$eruptions, 
         adjust = input$bw_adjust) 
      lines(dens, col = "blue") 
   }
 }) 
})

Run the following code in the console to load the libraries you will need.

R

library('cdsw')
library('shiny') 
library('parallel')

Now start the Shiny server. Shiny blocks the R process it runs in, so use the parallel package to run it in a separate process.

R

mcparallel(runApp(host="0.0.0.0", port=8080, launch.browser=FALSE,
    appDir="/home/cdsw", display.mode="auto"))

Finally, create an IFrame widget in the console and point it at the Shiny server.

R

service.url <- paste("http://", Sys.getenv("CDSW_ENGINE_ID"), ".",
    Sys.getenv("CDSW_DOMAIN"), sep="")
Sys.sleep(5)
iframe(src=service.url, width="640px", height="480px")
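The same approach of running the server off the main process so that the console stays usable can be sketched in Python with the Python 3 standard library. This is an illustrative analogue of the Shiny steps, not a Cloudera-provided API: the function name `start_background_server` is hypothetical, and falling back to port 8080 when CDSW_PUBLIC_PORT is unset is an assumption borrowed from the Shiny example.

```python
import os
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler


def start_background_server(port=None, host="0.0.0.0"):
    """Serve the current directory over HTTP in a daemon thread and return the server."""
    if port is None:
        # Assumption: fall back to 8080 when CDSW_PUBLIC_PORT is not set.
        port = int(os.environ.get("CDSW_PUBLIC_PORT", "8080"))
    server = HTTPServer((host, port), SimpleHTTPRequestHandler)
    # serve_forever() blocks, so run it in a thread instead of the console.
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    return server


# Usage inside a console:
#   server = start_background_server()
#   ... embed the service URL in an IFrame, as in the R example ...
#   server.shutdown()  # stop serving when finished
```

Note that this sketch exposes the files in the current working directory to anyone who can reach the service URL, so start it from a directory that contains only what you intend to publish.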