Grid Displays

Cloudera Data Science Workbench supports native grid displays of DataFrames across several languages.

Python

Using DataFrames with the pandas package requires per-session activation:

import pandas as pd
# Build a one-row DataFrame holding the integers 1 through 99
pd.DataFrame(data=[range(1, 100)])
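
The snippet above does not show the activation step itself. As a minimal sketch, assuming the activation refers to pandas' display.html.table_schema option (the option exists in pandas, but its role here is an assumption and is not confirmed by this page), a session might begin like this:

```python
import pandas as pd

# Assumption: "per-session activation" means turning on pandas' Table Schema
# output, which front ends that support it can render as an interactive grid.
pd.options.display.html.table_schema = True

# One row holding the integers 1 through 99.
df = pd.DataFrame(data=[list(range(1, 100))])

# As the last expression in a cell, the DataFrame is what gets displayed.
df
```

The option is set on the running Python process, so it would need to be set again in each new session.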

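For PySpark DataFrames, call df.toPandas() to bring the data into local memory as a pandas DataFrame, which then displays as a grid. Because toPandas() collects every row to the driver, reduce large DataFrames (for example, with limit()) before converting them.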
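
Since toPandas() pulls the whole DataFrame into local memory, it can help to cap the row count first. The helper below is a sketch of that idea; preview_as_pandas and max_rows are hypothetical names introduced for illustration, while limit() and toPandas() are standard PySpark DataFrame methods.

```python
def preview_as_pandas(spark_df, max_rows=1000):
    """Return at most max_rows rows of a PySpark DataFrame as a pandas DataFrame.

    preview_as_pandas and max_rows are hypothetical names, not part of any library.
    """
    # limit() is part of the Spark query plan, so at most max_rows rows
    # are collected to the driver when toPandas() runs.
    return spark_df.limit(max_rows).toPandas()
```

In a session that already has a PySpark DataFrame named df, preview_as_pandas(df, max_rows=500) would return a local pandas DataFrame of at most 500 rows.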

R

In R, DataFrames display as grids by default. For example, to view the Iris data set, enter:

iris

As with PySpark, bringing sparklyr data into local memory with as.data.frame produces a grid display:

sparkly_df %>% as.data.frame

Scala

Calling the display() function on an existing DataFrame triggers a collect, much like df.show():

val df = sc.parallelize(1 to 100).toDF()
display(df)