Accessing Web User Interfaces from Cloudera Data Science Workbench
This topic describes the ways in which Cloudera Data Science Workbench lets you access user interfaces for applications such as Cloudera Manager and Hue, as well as the transient per-session UIs for frameworks such as Spark 2, TensorFlow, and Shiny.
Cloudera Manager, Hue, and the Spark History Server
Cloudera Data Science Workbench gives you a way to access your CDH cluster's Cloudera Manager and Hue UIs from within the Cloudera Data Science Workbench application. In addition, Spark 2 provides a UI that displays information and logs for completed Spark applications, which is useful for debugging and performance monitoring. This UI, called the History Server, runs on the CDH cluster, on a configurable host and port.
To access these applications, click the grid icon in the upper right-hand corner of the Cloudera Data Science Workbench web application, and select the UI you want to visit from the dropdown.
Web UIs Embedded in Jobs and Sessions
Many data science libraries and processing frameworks include user interfaces that help you track the progress of your jobs and break down workflows. These are instrumental in debugging and using the platforms themselves. For example, Spark provides a Spark Web UI to monitor running applications, and TensorFlow visualizations can be run on TensorBoard. Other web application frameworks such as Shiny and Flask are popular ways for data scientists to display additional interactive analysis in the languages they already know.
Cloudera Data Science Workbench allows you to access these web UIs directly from sessions and jobs. This feature is particularly helpful when you want to monitor and track progress for batch jobs. Even though jobs don't give you access to the interactive workbench console, you can still track long-running jobs through the UI. Note, however, that the UI is active only as long as the job or session is active. When a session times out (after 60 minutes by default), its UI becomes unavailable as well.
- Spark 2 Web UIs (CDSW_SPARK_PORT)
Spark 2 exposes one web UI for each Spark application driver running in Cloudera Data Science Workbench. The UI runs within the container, on the port specified by the environment variable CDSW_SPARK_PORT. By default, CDSW_SPARK_PORT is set to 20049. The web UI exists only as long as a SparkContext is active within a session. The port is freed when the SparkContext is shut down.
Spark 2 web UIs are available in browsers at: https://spark-<$CDSW_ENGINE_ID>.<$CDSW_DOMAIN>. To access the UI while you are in an active session, click the grid icon in the upper right-hand corner of the Cloudera Data Science Workbench web application, and select Spark UI from the dropdown. For a job, navigate to the job overview page and click the History tab. Click a job run to open the session output for the job. You can then click the grid icon in the upper right-hand corner to access the Spark UI for this session.
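The URL pattern above can be assembled inside a session. The following is a minimal sketch in base R, assuming that CDSW_ENGINE_ID and CDSW_DOMAIN are exposed to the engine as environment variables, as the placeholders in the URL suggest. The function name spark_ui_url and the sample values are hypothetical and used only for illustration.

```r
# Build the Spark 2 web UI address for an engine, following the pattern
# https://spark-<$CDSW_ENGINE_ID>.<$CDSW_DOMAIN> described in the text.
# spark_ui_url is a hypothetical helper, not part of any CDSW library.
spark_ui_url <- function(engine_id = Sys.getenv("CDSW_ENGINE_ID"),
                         domain = Sys.getenv("CDSW_DOMAIN")) {
  # Sys.getenv() returns "" for unset variables, so fail early with a
  # clear message instead of returning a malformed URL.
  if (!nzchar(engine_id) || !nzchar(domain)) {
    stop("CDSW_ENGINE_ID and CDSW_DOMAIN must both be set")
  }
  sprintf("https://spark-%s.%s", engine_id, domain)
}

# Example with made-up values (hypothetical engine ID and domain):
spark_ui_url("abc123", "cdsw.example.com")
# "https://spark-abc123.cdsw.example.com"
```

Remember that the address only responds while a SparkContext is active in that session.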
- TensorBoard, Shiny, and others (CDSW_PUBLIC_PORT)
CDSW_PUBLIC_PORT is an environment variable that points to a general purpose public port. By default, CDSW_PUBLIC_PORT is set to port 8080. Any HTTP services running in containers that bind to CDSW_PUBLIC_PORT are available in browsers at: http://<$CDSW_ENGINE_ID>.<$CDSW_DOMAIN>. Therefore, TensorBoard, Shiny, Flask or any other web framework accompanying a project can be accessed directly from within a session or job, as long as it is run on CDSW_PUBLIC_PORT.
To access the UI while you are in an active session, click the grid icon in the upper right-hand corner of the Cloudera Data Science Workbench web application, and select the UI from the dropdown. For a job, navigate to the job overview page and click the History tab. Click a job run to open the session output for the job. You can then click the grid icon in the upper right-hand corner to access the UI for this session.
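As a rough illustration of how a project script might pick up the public port and work out where its service will be reachable, here is a short base R sketch. It assumes CDSW_PUBLIC_PORT is set by the engine and falls back to the documented default of 8080 otherwise. The helper names public_port and public_ui_url, and the sample engine ID and domain, are hypothetical.

```r
# Read the general-purpose public port from the environment.
# Falls back to the documented default (8080) if the variable is
# unset or does not parse as a number. Hypothetical helper.
public_port <- function(default = 8080L) {
  raw <- Sys.getenv("CDSW_PUBLIC_PORT", unset = "")
  port <- suppressWarnings(as.integer(raw))
  if (is.na(port)) default else port
}

# Build the browser address for a service bound to the public port,
# following the pattern http://<$CDSW_ENGINE_ID>.<$CDSW_DOMAIN>.
public_ui_url <- function(engine_id, domain) {
  sprintf("http://%s.%s", engine_id, domain)
}

# Example with made-up values (hypothetical engine ID and domain):
port <- public_port()
url <- public_ui_url("abc123", "cdsw.example.com")
message("Bind the web service to port ", port, "; it should be reachable at ", url)
```

Any web framework started from the project would then be told to listen on the value returned by public_port().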
Example: A Shiny Application
This example demonstrates how to create and run a Shiny application and view the associated UI while in an active session.
Create a new, blank project and run an R console. Create the files ui.R and server.R in the project, and copy into them the contents of the following example files, provided by Shiny by RStudio:
R
    # ui.R

    library(shiny)

    # Define UI for application that draws a histogram
    shinyUI(fluidPage(

      # Application title
      titlePanel("Hello Shiny!"),

      # Sidebar with a slider input for the number of bins
      sidebarLayout(
        sidebarPanel(
          sliderInput("bins",
                      "Number of bins:",
                      min = 1,
                      max = 50,
                      value = 30)
        ),

        # Show a plot of the generated distribution
        mainPanel(
          plotOutput("distPlot")
        )
      )
    ))

R
    # server.R

    library(shiny)

    # Define server logic required to draw a histogram
    shinyServer(function(input, output) {

      # Expression that generates a histogram. The expression is
      # wrapped in a call to renderPlot to indicate that:
      #
      #  1) It is "reactive" and therefore should re-execute automatically
      #     when inputs change
      #  2) Its output type is a plot
      output$distPlot <- renderPlot({
        x    <- faithful[, 2]  # Old Faithful Geyser data
        bins <- seq(min(x), max(x), length.out = input$bins + 1)

        # draw the histogram with the specified number of bins
        hist(x, breaks = bins, col = 'darkgray', border = 'white')
      })
    })

Run the following code in the interactive workbench prompt to install the Shiny package, load the library into the engine, and run the Shiny application.
R
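    install.packages('shiny')
    library('shiny')

    # Bind to the public port and the engine's IP address; do not try to
    # open a browser from inside the container.
    runApp(port = as.numeric(Sys.getenv("CDSW_PUBLIC_PORT")),
           host = Sys.getenv("CDSW_IP_ADDRESS"),
           launch.browser = FALSE)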
Finally, click the grid icon in the upper right-hand corner of the Cloudera Data Science Workbench web application, and select the Shiny UI, Hello Shiny!, from the dropdown. The UI remains active as long as the session is running.
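If the run command is executed outside a Cloudera Data Science Workbench session, the two environment variables are empty and the failure can be hard to read. One possible guard, sketched below in base R, collects the arguments first and stops with a clear message when either variable is missing. The helper name cdsw_run_args and the sample values are hypothetical. The variable names come from the example above.

```r
# Collect the arguments for shiny::runApp from the engine environment.
# cdsw_run_args is a hypothetical helper, not part of Shiny or CDSW.
cdsw_run_args <- function() {
  # as.numeric("") is NA, so an unset port is caught by the check below.
  port <- suppressWarnings(as.numeric(Sys.getenv("CDSW_PUBLIC_PORT")))
  host <- Sys.getenv("CDSW_IP_ADDRESS")
  if (is.na(port) || !nzchar(host)) {
    stop("CDSW_PUBLIC_PORT and CDSW_IP_ADDRESS must be set; run this inside a session")
  }
  list(port = port, host = host, launch.browser = FALSE)
}

# Inside a session with Shiny installed, the app could then be started with:
# do.call(shiny::runApp, cdsw_run_args())
```

Because the helper only reads the environment, it can be checked on its own before Shiny is installed.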