Data Visualization
Each language on Cloudera Data Science Workbench has a visualization system that you can use to create plots, including rich HTML visualizations.
Simple Plots
To create a simple plot, run a console in your favorite language and paste in the following code sample:
R
# A standard R plot
plot(rnorm(1000))

# A ggplot2 plot
library("ggplot2")
qplot(hp, mpg, data=mtcars, color=am, facets=gear~cyl, size=I(3),
      xlab="Horsepower", ylab="Miles per Gallon")
Python
import matplotlib.pyplot as plt
import random

# range() works on both Python 2 and Python 3; xrange() exists only on Python 2
plt.plot([random.normalvariate(0,1) for i in range(1,1000)])
For some libraries such as matplotlib, new plots are displayed as each subsequent command is executed. Therefore, when you run a series of commands, you will see incomplete plots for each intermediate command until the final command is executed. If this is not the desired behavior, an easy workaround is to put all the plotting commands in one Python function.
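The workaround described above can be sketched as follows. This is a minimal illustration rather than a prescribed pattern: the function name `draw_noise_plot` and its parameters are hypothetical, and the sketch assumes matplotlib is installed in the engine. All plotting calls happen inside one function, so the console runs a single command and the figure is fully configured before anything is rendered.

```python
import random

import matplotlib.pyplot as plt


def draw_noise_plot(n_points=1000, seed=None):
    """Build a complete plot in one call so no intermediate state is shown.

    The name and parameters are illustrative, not part of any CDSW API.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    values = [rng.normalvariate(0, 1) for _ in range(n_points)]

    fig, ax = plt.subplots()
    ax.plot(values)
    ax.set_title("Normal random variates")
    ax.set_xlabel("Sample index")
    ax.set_ylabel("Value")
    return fig


# One console command: every plotting call above runs before the figure is drawn.
figure = draw_noise_plot(seed=42)
```

Returning the figure keeps the function reusable: the caller can save it with `figure.savefig(...)` or adjust it further in a later command.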
Saved Images
You can also display images, using a command in the following format:
R
library("cdsw")
download.file("https://upload.wikimedia.org/wikipedia/commons/2/29/Minard.png", "/cdn/Minard.png")
image("Minard.png")
Python
# Python 3 import; on Python 2 use "from urllib import urlretrieve" instead
from urllib.request import urlretrieve
from IPython.display import Image

urlretrieve("http://upload.wikimedia.org/wikipedia/commons/2/29/Minard.png", "Minard.png")
Image(filename="Minard.png")
HTML Visualizations
Your code can generate and display HTML. To create an HTML widget, paste in the following:
R
library("cdsw")
html('<svg><circle cx="50" cy="50" r="50" fill="red" /></svg>')
Python
from IPython.display import HTML
HTML('<svg><circle cx="50" cy="50" r="50" fill="red" /></svg>')
IFrame Visualizations
Most visualizations require more than basic HTML. Embedding HTML directly in your console also risks conflicts between different parts of your code. The most flexible way to embed a web resource is using an IFrame:
R
library("cdsw")
iframe(src="https://www.youtube.com/embed/8pHzROP1D-w", width="854px", height="510px")
Python
from IPython.display import HTML
HTML('<iframe width="854" height="510" src="https://www.youtube.com/embed/8pHzROP1D-w"></iframe>')
You can generate HTML files within your console and display them in IFrames using the /cdn folder. The /cdn folder persists and serves static assets generated by your engine runs. For instance, you can write a full HTML file to /cdn and embed it with an IFrame.
R
library("cdsw")
f <- file("/cdn/index.html")
html.content <- paste("<p>Here is a normal random variate:", rnorm(1), "</p>")
writeLines(c(html.content), f)
close(f)
iframe("index.html")
Python
from IPython.display import HTML
import random

html_content = "<p>Here is a normal random variate: %f </p>" % random.normalvariate(0,1)
# open() works on both Python 2 and Python 3; the with block closes the file
with open("/cdn/index.html", "w") as f:
    f.write(html_content)
HTML('<iframe src="index.html"></iframe>')
Cloudera Data Science Workbench uses this feature to support many rich plotting libraries such as htmlwidgets, bokeh, and plotly.
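The write-then-embed pattern above can be wrapped in two small helpers. The following is a sketch under stated assumptions, not part of Cloudera Data Science Workbench: the names `write_html_page` and `iframe_markup` are hypothetical, the code targets Python 3, and it assumes the /cdn folder behaves as described in this section. The target directory is a parameter so the helpers can also be tried outside a console.

```python
import os
from html import escape


def write_html_page(body_html, filename="index.html", static_dir="/cdn"):
    """Write a small HTML document into static_dir and return its path.

    static_dir defaults to /cdn, the static asset folder described above.
    """
    path = os.path.join(static_dir, filename)
    with open(path, "w") as f:
        f.write("<!DOCTYPE html>\n<html><body>\n%s\n</body></html>\n" % body_html)
    return path


def iframe_markup(src, width=640, height=480):
    """Return an <iframe> tag for src, with the attribute value escaped."""
    return '<iframe src="%s" width="%d" height="%d"></iframe>' % (
        escape(src, quote=True), width, height)


# Assumed usage in a console:
#   from IPython.display import HTML
#   write_html_page("<p>Generated report</p>")
#   HTML(iframe_markup("index.html"))
```

Building the tag in one place keeps quoting consistent and ensures the iframe element is always closed.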
Documenting Your Analysis
Cloudera Data Science Workbench supports Markdown documentation of your code written in comments. This allows you to generate reports directly from valid Python and R code that runs anywhere, even outside Cloudera Data Science Workbench. To add documentation to your analysis, create comments in Markdown format:
R
# Heading
# -------
#
# This documentation is **important.**
#
# Inline math: $e^x$
#
# Display math: $$y = \Sigma x + \epsilon$$

print("Now the code!")
Python
# Heading
# -------
#
# This documentation is **important.**
#
# Inline math: $e^x$
#
# Display math: $$y = \Sigma x + \epsilon$$

print("Now the code!")
Making Web Services Available
Every console has an environment variable called CDSW_PUBLIC_PORT. Applications can contact any service that listens on that port over the Internet at https://<console-id>.company.com. You can get the <console-id> from the environment variable CDSW_DASHBOARD_ID or from the string of random letters and numbers at the end of the console URL.
For example, services in console https://cdsw.company.com/user/project/consoles/xv30miihscnv947b can be reached at https://xv30miihscnv947b.company.com.
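The mapping from a console URL to a service address can be sketched in a few lines of Python 3. The helper names `console_id_from_url` and `service_url` are hypothetical, and the sketch assumes the console ID is always the last path segment of the console URL, as in the example above.

```python
from urllib.parse import urlparse


def console_id_from_url(console_url):
    """Return the last path segment of a console URL (the console ID)."""
    path = urlparse(console_url).path
    return path.rstrip("/").split("/")[-1]


def service_url(console_id, domain, scheme="https"):
    """Build the public address of a service running in the given console."""
    return "%s://%s.%s" % (scheme, console_id, domain)


console_id = console_id_from_url(
    "https://cdsw.company.com/user/project/consoles/xv30miihscnv947b")
print(service_url(console_id, "company.com"))
# prints https://xv30miihscnv947b.company.com
```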
Example: A Shiny Application
Create a new, blank project and run an R console. Use the following command to install Shiny to the project.
R
install.packages('shiny')
Create files ui.R and server.R in the project, and copy and paste the contents of the example files provided by Shiny by RStudio:
R
# ui.R

shinyUI(bootstrapPage(

  selectInput(inputId = "n_breaks",
      label = "Number of bins in histogram (approximate):",
      choices = c(10, 20, 35, 50),
      selected = 20),

  checkboxInput(inputId = "individual_obs",
      label = strong("Show individual observations"),
      value = FALSE),

  checkboxInput(inputId = "density",
      label = strong("Show density estimate"),
      value = FALSE),

  plotOutput(outputId = "main_plot", height = "300px"),

  # Display this only if the density is shown
  conditionalPanel(condition = "input.density == true",
    sliderInput(inputId = "bw_adjust",
        label = "Bandwidth adjustment:",
        min = 0.2, max = 2, value = 1, step = 0.2)
  )

))
R
# server.R

shinyServer(function(input, output) {

  output$main_plot <- renderPlot({

    hist(faithful$eruptions,
      probability = TRUE,
      breaks = as.numeric(input$n_breaks),
      xlab = "Duration (minutes)",
      main = "Geyser eruption duration")

    if (input$individual_obs) {
      rug(faithful$eruptions)
    }

    if (input$density) {
      dens <- density(faithful$eruptions,
          adjust = input$bw_adjust)
      lines(dens, col = "blue")
    }

  })
})
Run the following code in the console to load the libraries you will need.
R
library('cdsw')
library('shiny')
library('parallel')
Now start the Shiny server. Shiny blocks the R process it runs in, so use the parallel package to run it in a separate process.
R
mcparallel(runApp(host="0.0.0.0", port=8080, launch.browser=FALSE,
  appDir="/home/cdsw", display.mode="auto"))
Finally, create an IFrame widget in the console and point it at the Shiny server.
R
service.url <- paste("http://", Sys.getenv("CDSW_ENGINE_ID"), ".",
  Sys.getenv("CDSW_DOMAIN"), sep="")
Sys.sleep(5)
iframe(src=service.url, width="640px", height="480px")
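The same idea, running a blocking server outside the console's main flow, can be sketched in Python 3 with the standard library alone. This is an illustrative analogue of the Shiny example, not something the Shiny walkthrough itself provides: the function name `start_background_server` is hypothetical, a background thread stands in for the separate R process, and the fallback port 8080 is an assumption taken from the port used in the R example.

```python
import os
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler


def start_background_server(host="0.0.0.0", port=None):
    """Serve the current directory over HTTP without blocking the console."""
    if port is None:
        # CDSW_PUBLIC_PORT is the variable named in this section;
        # 8080 is an assumed fallback for running outside a console.
        port = int(os.environ.get("CDSW_PUBLIC_PORT", "8080"))
    server = HTTPServer((host, port), SimpleHTTPRequestHandler)
    # serve_forever() blocks, so run it in a daemon thread.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


# Assumed usage in a console:
#   server = start_background_server()
#   ... embed the service URL in an IFrame, as in the R example ...
#   server.shutdown()
#   server.server_close()
```

Returning the server object lets the caller stop it explicitly with `shutdown()` instead of relying on the console exiting.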