Web Applications Embedded in Cloudera Data Science Workbench
This topic describes how Cloudera Data Science Workbench allows you to embed web applications for frameworks such as Spark 2, TensorFlow, Shiny, and so on within sessions.
Many data science libraries and processing frameworks include user interfaces to help track progress of your jobs and break down workflows. These are instrumental in debugging and using the platforms themselves. For example, Spark provides a Spark Web UI to monitor running applications and TensorFlow visualizations can be run on TensorBoard. Other web application frameworks such as Shiny and Flask are popular ways for data scientists to display additional interactive analysis in the languages they already know.
Cloudera Data Science Workbench allows you to access these web UIs directly from sessions and jobs. This feature is particularly helpful when you want to monitor and track progress for batch jobs. Even though jobs don't give you access to the interactive workbench console, you can still track long-running jobs through the UI. However, the UI is active only as long as the job or session is active. If your session times out after 60 minutes (the default timeout value), so will the UI.
Cloudera Data Science Workbench exposes web applications for Spark and other machine learning frameworks as described here.
Spark 2 Web UIs (CDSW_SPARK_PORT)
Spark 2 exposes one web UI for each Spark application driver running in Cloudera Data Science Workbench. The UI runs within the container, on the port specified by the environment variable CDSW_SPARK_PORT. By default, CDSW_SPARK_PORT is set to 20049. The web UI exists only as long as a SparkContext is active within a session. The port is freed when the SparkContext is shut down.
The Spark 2 web UI is available as a tab in the session, or alternatively in browsers at https://spark-<$CDSW_ENGINE_ID>.<$CDSW_DOMAIN>. For a running job, navigate to the Job Overview page and click the History tab. Click the running job and select Spark UI.
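The browser address above can be assembled inside a session from the two environment variables it names. The following is a minimal sketch in R, assuming CDSW_ENGINE_ID and CDSW_DOMAIN are set in the engine as the address pattern implies; the helper name spark_ui_url is hypothetical and not part of Cloudera Data Science Workbench.

```r
# Hypothetical helper: build the Spark UI address from the pattern
# https://spark-<$CDSW_ENGINE_ID>.<$CDSW_DOMAIN> described above.
spark_ui_url <- function(engine_id = Sys.getenv("CDSW_ENGINE_ID"),
                         domain = Sys.getenv("CDSW_DOMAIN")) {
  # Sys.getenv() returns "" for unset variables, so fail early.
  if (!nzchar(engine_id) || !nzchar(domain)) {
    stop("CDSW_ENGINE_ID and CDSW_DOMAIN must both be set")
  }
  paste0("https://spark-", engine_id, ".", domain)
}
```

Called with made-up values, spark_ui_url("abc123", "cdsw.example.com") returns "https://spark-abc123.cdsw.example.com". The address resolves only while a SparkContext is active.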
TensorBoard, Shiny, and others (CDSW_APP_PORT or CDSW_READONLY_PORT)
CDSW_APP_PORT and CDSW_READONLY_PORT are environment variables that point to general purpose public ports. Any HTTP services running in containers that bind to CDSW_APP_PORT or CDSW_READONLY_PORT are available in browsers at: http://<$CDSW_ENGINE_ID>.<$CDSW_DOMAIN>. Therefore, TensorBoard, Shiny, Flask or any other web framework accompanying a project can be accessed directly from within a session or job, as long as it is run on CDSW_APP_PORT or CDSW_READONLY_PORT.
CDSW_APP_PORT is meant for applications that grant some level of control to the project, such as access to the active session or terminal. CDSW_READONLY_PORT must be used for applications that grant read-only access to project results.
The host address should be 0.0.0.0.
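Because the port arrives as an environment variable, it has to be read and converted to a number before it is passed to a web framework. The following is a minimal sketch in R, assuming the variables hold plain integer strings; the helper name cdsw_port is hypothetical.

```r
# Hypothetical helper: read one of the CDSW port variables as an integer.
cdsw_port <- function(var = "CDSW_APP_PORT") {
  allowed <- c("CDSW_APP_PORT", "CDSW_READONLY_PORT")
  if (!var %in% allowed) {
    stop("expected one of: ", paste(allowed, collapse = ", "))
  }
  raw <- Sys.getenv(var)
  # An unset variable is "" and non-numeric text converts to NA;
  # both cases are rejected below.
  port <- suppressWarnings(as.integer(raw))
  if (is.na(port)) {
    stop(var, " is not set to a numeric port (got '", raw, "')")
  }
  port
}
```

The returned value can be supplied as the port argument of a framework's launch call, together with the host address.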
To access the UI while you are in an active session, click the grid icon in the upper-right corner of the Cloudera Data Science Workbench web application, and select the UI from the dropdown. For a job, navigate to the Job Overview page and click the History tab. Click a job run to open the session output for the job. You can then click the grid icon in the upper-right corner of the Cloudera Data Science Workbench web application to access the UI for this session.
Limitations with port availability
Only one port is exposed per access level, so you can run at most three web applications simultaneously:
- one on CDSW_APP_PORT, which can be used for applications that grant some level of control over the project to Contributors and Admins,
- one on CDSW_READONLY_PORT, which can be used for applications that only need to give read-only access to project collaborators, and
- one on the now-deprecated CDSW_PUBLIC_PORT, which is accessible by all users.
However, by default the editors feature (introduced in version 1.6) runs third-party browser-based editors on CDSW_APP_PORT. Therefore, in projects that already use a browser-based third-party editor, only two ports remain for running applications: CDSW_READONLY_PORT and CDSW_PUBLIC_PORT. Keep in mind the level of access you want to grant users when you select one of these ports for a web application.
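The port rules in this section amount to a small lookup from the intended audience to a port variable. The following is a minimal sketch in R; the function name choose_port_var and the access labels "control", "readonly", and "public" are illustrative assumptions, not Cloudera Data Science Workbench terms.

```r
# Hypothetical helper: pick the port variable for a desired access level.
choose_port_var <- function(access = c("control", "readonly", "public"),
                            editor_in_use = FALSE) {
  access <- match.arg(access)
  if (access == "control" && editor_in_use) {
    # A browser-based editor occupies CDSW_APP_PORT by default.
    stop("CDSW_APP_PORT is taken by the editor; choose another access level")
  }
  switch(access,
         control  = "CDSW_APP_PORT",       # some control, Contributors and Admins
         readonly = "CDSW_READONLY_PORT",  # read-only, project collaborators
         public   = "CDSW_PUBLIC_PORT")    # deprecated, accessible by all users
}
```

For example, choose_port_var("readonly") returns "CDSW_READONLY_PORT", which can then be passed to Sys.getenv() to obtain the port number.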
Example: A Shiny Application
This example demonstrates how to create and run a Shiny application and view the associated UI while in an active session.
Create a new, blank project and run an R console. Create two files, ui.R and server.R, in the project, and copy into them the contents of the following example files provided by Shiny by RStudio:
R
# ui.R
library(shiny)

# Define UI for application that draws a histogram
shinyUI(fluidPage(

  # Application title
  titlePanel("Hello Shiny!"),

  # Sidebar with a slider input for the number of bins
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins",
                  "Number of bins:",
                  min = 1,
                  max = 50,
                  value = 30)
    ),

    # Show a plot of the generated distribution
    mainPanel(
      plotOutput("distPlot")
    )
  )
))
R
# server.R
library(shiny)

# Define server logic required to draw a histogram
shinyServer(function(input, output) {

  # Expression that generates a histogram. The expression is
  # wrapped in a call to renderPlot to indicate that:
  #
  #  1) It is "reactive" and therefore should re-execute automatically
  #     when inputs change
  #  2) Its output type is a plot
  output$distPlot <- renderPlot({
    x    <- faithful[, 2]  # Old Faithful Geyser data
    bins <- seq(min(x), max(x), length.out = input$bins + 1)

    # draw the histogram with the specified number of bins
    hist(x, breaks = bins, col = 'darkgray', border = 'white')
  })
})
Run the following code in the interactive workbench prompt to install the Shiny package, load the library into the engine, and run the Shiny application.
R
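# Install the Shiny package and load it into the engine
install.packages('shiny')
library('shiny')

# Run the application in the project directory on the port read from the
# environment; launch.browser is a logical, so pass FALSE rather than "FALSE"
runApp(port = as.numeric(Sys.getenv("CDSW_PUBLIC_PORT")),
       host = "127.0.0.1",
       launch.browser = FALSE)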
Finally, click the grid icon in the upper-right corner of the Cloudera Data Science Workbench web application, and select the Shiny UI, Hello Shiny!, from the dropdown. The UI remains active as long as the session is running.