Grid Displays
Cloudera Data Science Workbench supports built-in grid displays of DataFrames across several languages.
Python
Using DataFrames with the pandas package requires per-session activation:
import pandas as pd
pd.DataFrame(data=[range(1,100)])
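The call above produces a single row with 99 columns, because the range is wrapped in a one-element list. As a minimal sketch of a more typical tabular layout, the following builds a DataFrame with one row per value. The column names `n` and `n_squared` are illustrative assumptions, not something the product requires.

```python
import pandas as pd

# Column names "n" and "n_squared" are illustrative only.
values = list(range(1, 100))
df = pd.DataFrame({"n": values, "n_squared": [v * v for v in values]})

# In an interactive session, leaving the DataFrame as the final
# expression is what should trigger the display.
df
```

This mirrors the original example in that the DataFrame is the last expression evaluated.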
For PySpark DataFrames, import pandas as shown above and run df.toPandas() on the PySpark DataFrame. This brings the DataFrame into local memory as a pandas DataFrame, which is then displayed as a grid.
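Because toPandas() collects every row into the local session, it may be worth capping the row count before converting a large DataFrame. The following is a minimal sketch under that assumption: the helper name `to_grid` and the default of 1000 rows are hypothetical, while `limit()` and `toPandas()` are standard PySpark DataFrame methods.

```python
def to_grid(spark_df, max_rows=1000):
    """Return at most max_rows rows of a PySpark DataFrame as a pandas DataFrame.

    to_grid and max_rows are illustrative names, not part of PySpark
    or Cloudera Data Science Workbench.
    """
    # limit() is applied on the Spark side, so toPandas() transfers
    # no more than max_rows rows into local memory.
    return spark_df.limit(max_rows).toPandas()
```

In a session that already has a PySpark DataFrame named df, evaluating to_grid(df) as the last expression would be expected to show the first rows as a grid, provided pandas has been imported in that session.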
R
In R, DataFrames will display as grids by default. For example, to view the Iris data set, you would just use:
iris
Similar to PySpark, bringing sparklyr data into local memory with as.data.frame will output a grid display.
sparkly_df %>% as.data.frame
Scala
Calling the display() function on an existing DataFrame will trigger a collect, much like df.show().
val df = sc.parallelize(1 to 100).toDF()
display(df)