Data Visualization
Each language on Cloudera Data Science Workbench has a visualization system that you can use to create plots, including rich HTML visualizations.
Simple Plots
To create a simple plot, run a console in your favorite language and paste in the following code sample:
R
# A standard R plot
plot(rnorm(1000))
# A ggplot2 plot
library("ggplot2")
qplot(hp, mpg, data=mtcars, color=am, facets=gear~cyl, size=I(3), xlab="Horsepower", ylab="Miles per Gallon")
Python
import matplotlib.pyplot as plt
import random
plt.plot([random.normalvariate(0, 1) for i in range(1, 1000)])
Cloudera Data Science Workbench processes each line of code individually (unlike notebooks that process code per-cell). This means if your plot requires multiple commands, you will see incomplete plots in the workbench as each line is processed.
To get around this behavior, wrap all your plotting commands in one Python function. Cloudera Data Science Workbench will then process the function as a whole, and not as individual lines. You should then see your plots as expected.
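As a minimal sketch of that advice, assuming matplotlib is installed, the following function groups several plotting commands (the plot itself, a title, and axis labels) so they run as one unit instead of line by line. The name plot_normal_sample and the labels are illustrative only and are not part of any Cloudera API; assigning the returned figure to a variable simply keeps a handle on it for later changes.

```python
import random

import matplotlib.pyplot as plt


def plot_normal_sample(n=1000):
    # Hypothetical helper: every plotting command lives inside one function,
    # so the workbench evaluates them together rather than line by line.
    sample = [random.normalvariate(0, 1) for _ in range(n)]
    fig, ax = plt.subplots()
    ax.plot(sample)
    ax.set_title("Standard normal sample")
    ax.set_xlabel("Index")
    ax.set_ylabel("Value")
    return fig


fig = plot_normal_sample()
```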
Saved Images
You can also display saved images using commands like the following:
R
library("cdsw")
download.file("https://upload.wikimedia.org/wikipedia/commons/2/29/Minard.png", "/cdn/Minard.png")
image("Minard.png")
Python
from urllib.request import urlretrieve
from IPython.display import Image
urlretrieve("http://upload.wikimedia.org/wikipedia/commons/2/29/Minard.png", "Minard.png")
Image(filename="Minard.png")
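A saved image does not have to come from the web. As a sketch, assuming matplotlib is installed, your code can render a plot into a PNG file and then display that file with the same Image call used above. The helper save_histogram and the file name histogram.png are hypothetical.

```python
import random

import matplotlib.pyplot as plt


def save_histogram(path="histogram.png", n=1000):
    # Hypothetical helper: draw a sample and write its histogram to a PNG file.
    sample = [random.normalvariate(0, 1) for _ in range(n)]
    fig, ax = plt.subplots()
    ax.hist(sample, bins=30)
    fig.savefig(path)  # format is inferred from the .png extension
    plt.close(fig)  # release the figure once the file is written
    return path


path = save_histogram()
# In a session, display the file with:
#   from IPython.display import Image
#   Image(filename=path)
```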
HTML Visualizations
Your code can generate and display HTML. To create an HTML widget, paste in the following:
R
library("cdsw")
html('<svg><circle cx="50" cy="50" r="50" fill="red" /></svg>')
Python
from IPython.display import HTML
HTML('<svg><circle cx="50" cy="50" r="50" fill="red" /></svg>')
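Because the HTML is just a string, your code can also generate it from data. The sketch below builds a small SVG bar chart with only the standard library; svg_bar_chart is a hypothetical helper, it assumes non-negative values, and it escapes labels with html.escape so they cannot break the markup. Pass the result to HTML() in a session to render it.

```python
from html import escape


def svg_bar_chart(values, labels, bar_width=40, height=120, gap=10):
    """Return an SVG bar chart as an HTML string (hypothetical helper)."""
    peak = max(values) or 1  # avoid dividing by zero when every value is 0
    width = len(values) * (bar_width + gap)
    parts = [f'<svg width="{width}" height="{height + 20}">']
    for i, (value, label) in enumerate(zip(values, labels)):
        # Scale each bar relative to the largest value.
        bar_height = value / peak * height
        x = i * (bar_width + gap)
        y = height - bar_height
        parts.append(
            f'<rect x="{x}" y="{y:.1f}" width="{bar_width}" '
            f'height="{bar_height:.1f}" fill="steelblue" />'
        )
        # Centered label under each bar, escaped for safety.
        parts.append(
            f'<text x="{x + bar_width / 2}" y="{height + 15}" '
            f'text-anchor="middle">{escape(str(label))}</text>'
        )
    parts.append("</svg>")
    return "".join(parts)


chart = svg_bar_chart([3, 7, 5], ["a", "b", "c"])
# In a session: from IPython.display import HTML; HTML(chart)
```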
Scala
Cloudera Data Science Workbench allows you to build visualization libraries for Scala using jvm-repr. The following example demonstrates how to register a custom HTML representation with the "text/html" mimetype in Cloudera Data Science Workbench. This output will render as HTML in your workbench session.
import scala.collection.JavaConverters._
import jupyter.{Displayer, Displayers}
// HTML representation
case class HTML(html: String)
// Register a displayer to render HTML
Displayers.register(classOf[HTML], new Displayer[HTML] {
  override def display(html: HTML): java.util.Map[String, String] = {
    Map(
      "text/html" -> html.html
    ).asJava
  }
})
val helloHTML = HTML("<h1> <em> Hello World </em> </h1>")
display(helloHTML)
IFrame Visualizations
Most visualizations require more than basic HTML. Embedding HTML directly in your console also risks conflicts between different parts of your code. The most flexible way to embed a web resource is using an IFrame:
R
library("cdsw")
iframe(src="https://www.youtube.com/embed/8pHzROP1D-w", width="854px", height="510px")
Python
from IPython.display import HTML
HTML('<iframe width="854" height="510" src="https://www.youtube.com/embed/8pHzROP1D-w"></iframe>')
You can generate HTML files within your console and display them in IFrames using the /cdn folder. The /cdn folder persists and serves static assets generated by your engine runs. For example, you can write a complete HTML file to /cdn and embed it with an IFrame:
R
library("cdsw")
f <- file("/cdn/index.html")
html.content <- paste("<p>Here is a normal random variate:", rnorm(1), "</p>")
writeLines(c(html.content), f)
close(f)
iframe("index.html")
Python
from IPython.display import HTML
import random
html_content = "<p>Here is a normal random variate: %f </p>" % random.normalvariate(0, 1)
with open("/cdn/index.html", "w") as f:
    f.write(html_content)
HTML('<iframe src="index.html"></iframe>')
Cloudera Data Science Workbench uses this feature to support many rich plotting libraries such as htmlwidgets, Bokeh, and Plotly.
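The same /cdn approach works for full pages rather than a single paragraph. The sketch below, using only the standard library, writes a complete HTML document containing a small table of random variates and returns IFrame markup that points at it. The helper write_cdn_page and its parameters are hypothetical; cdn_dir defaults to /cdn as described above and is a parameter only so the function can be tried elsewhere. In a session, pass the returned string to HTML().

```python
import os
import random
from html import escape


def write_cdn_page(filename="index.html", cdn_dir="/cdn", n=5):
    """Write a complete HTML page into cdn_dir and return iframe markup for it."""
    # One table row per normal random variate.
    rows = "".join(
        f"<tr><td>{i}</td><td>{random.normalvariate(0, 1):.4f}</td></tr>"
        for i in range(1, n + 1)
    )
    page = (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        "<title>Normal random variates</title></head><body>"
        "<table><tr><th>#</th><th>Value</th></tr>"
        f"{rows}</table></body></html>"
    )
    with open(os.path.join(cdn_dir, filename), "w", encoding="utf-8") as f:
        f.write(page)
    # Relative src, matching the "index.html" IFrame usage above.
    return f'<iframe src="{escape(filename)}" width="400" height="250"></iframe>'


# In a session: from IPython.display import HTML; HTML(write_cdn_page())
```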
Grid Displays
Cloudera Data Science Workbench supports native grid displays of DataFrames across several languages.
Python
import pandas as pd
pd.DataFrame(data=[range(1,100)])
To display a PySpark DataFrame as a grid, convert it with df.toPandas(). This collects the entire DataFrame into local memory as a pandas DataFrame, so reduce large DataFrames first (for example, df.limit(100).toPandas()).
R
In R, DataFrames display as grids by default. For example, to view the Iris data set, run:
iris
Similar to PySpark, bringing sparklyr data into local R memory with as.data.frame also produces a grid display:
sparkly_df %>% as.data.frame
Scala
Calling the display() function on an existing DataFrame triggers a collect, much like df.show().
// Assumes the session provides spark and sc; the implicits enable toDF()
import spark.implicits._
val df = sc.parallelize(1 to 100).toDF()
display(df)
Documenting Your Analysis
Cloudera Data Science Workbench supports Markdown documentation of your code written in comments. This allows you to generate reports directly from valid Python and R code that runs anywhere, even outside Cloudera Data Science Workbench. To add documentation to your analysis, create comments in Markdown format:
R
# Heading
# -------
#
# This documentation is **important.**
#
# Inline math: $e^x$
#
# Display math: $$y = \Sigma x + \epsilon$$
print("Now the code!")
Python
# Heading
# -------
#
# This documentation is **important.**
#
# Inline math: $e^x$
#
# Display math: $$y = \Sigma x + \epsilon$$
print("Now the code!")