Monitoring ML Workspaces

This topic shows you how to monitor resource usage on your ML workspaces.

Cloudera Machine Learning (CML) uses Prometheus and Grafana to provide a dashboard for monitoring how ML workspaces consume CPU, memory, storage, and other resources. Prometheus is an internal data source that is automatically populated with resource consumption data for each workspace. Grafana is a monitoring dashboard that lets you create visualizations of the resource consumption data stored in Prometheus.
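Grafana reads this data by running PromQL queries against Prometheus. If your environment happens to expose a workspace's Prometheus server to you (this is not guaranteed, since it is described here as internal), the same data could be read through the standard Prometheus HTTP API. The following is a minimal Python sketch under stated assumptions: `PROMETHEUS_URL` and the namespace `my-workspace` are hypothetical placeholders, and the metric `container_cpu_usage_seconds_total` (a standard Kubernetes cAdvisor metric with a `pod` label) is assumed, not confirmed, to be collected for your workspace.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with a Prometheus URL you can actually reach, if any.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"

# Assumed metric and namespace: per-pod CPU usage (in cores) averaged over 5 minutes.
CPU_BY_POD = 'sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="my-workspace"}[5m]))'


def build_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus HTTP API instant-query URL (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: str) -> dict[str, float]:
    """Map each series' `pod` label to its sample value in an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    values = {}
    for series in payload["data"]["result"]:
        pod = series["metric"].get("pod", "<no pod label>")
        _timestamp, value = series["value"]  # value is sent as a string
        values[pod] = float(value)
    return values


def query(base_url: str, promql: str, timeout: float = 10.0) -> dict[str, float]:
    """Run an instant query over the network and return per-pod values."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline demo: parse a canned response instead of contacting a server.
    sample = (
        '{"status":"success","data":{"resultType":"vector","result":'
        '[{"metric":{"pod":"pod-a"},"value":[1700000000.0,"0.25"]}]}}'
    )
    print(parse_vector(sample))
    print(build_query_url(PROMETHEUS_URL, CPU_BY_POD))
```

Splitting URL construction and response parsing from the network call keeps the parsing logic testable offline; `query(PROMETHEUS_URL, CPU_BY_POD)` would perform the real request only if such an endpoint is reachable.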

Each ML workspace has its own Grafana dashboard.

Required Role: MLAdmin

Without the MLAdmin role, you cannot view the Workspace details page.

To open the Grafana dashboard for a workspace:
  1. Log in to the CDP web interface.
  2. Click ML Workspaces.
  3. For the workspace you want to monitor, click Actions > Open Grafana. Alternatively, click Actions > Overview, and then click Grafana Dashboard.

    CML provides a default Grafana dashboard, CML Monitoring, that includes panels for CPU usage, memory usage, running processes, autoscaling, and network I/O on the workspace. You can extend this dashboard or create additional panels for other metrics. For more information, see the Grafana documentation.
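As one way to add a panel, Grafana stores each dashboard as a JSON model, and each panel in it pairs a visualization type with one or more PromQL queries ("targets"). The following minimal Python sketch generates such a panel definition. The metric `container_cpu_usage_seconds_total`, the namespace value, and the data source UID are assumptions for illustration, not values confirmed for CML, and `cpu_panel` is a hypothetical helper.

```python
import json


def cpu_panel(namespace: str, panel_id: int, datasource_uid: str,
              title: str = "CPU usage by pod") -> dict:
    """Return a Grafana time series panel plotting per-pod CPU usage (in cores).

    `namespace` and `datasource_uid` are hypothetical; look up the real values
    in your own Grafana instance before using the output.
    """
    # Assumed cAdvisor metric; rate() over 5m gives CPU seconds per second (cores).
    expr = (
        "sum by (pod) (rate(container_cpu_usage_seconds_total"
        f'{{namespace="{namespace}"}}[5m]))'
    )
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        "datasource": {"type": "prometheus", "uid": datasource_uid},
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
        "targets": [
            # "{{pod}}" is Grafana's legend template syntax, not Python formatting.
            {"refId": "A", "expr": expr, "legendFormat": "{{pod}}"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(cpu_panel("my-workspace", 100, "my-prometheus-uid"), indent=2))
```

In many Grafana versions, the printed JSON can be added to the `panels` array under Dashboard settings > JSON Model. Working on a copy of CML Monitoring rather than the default dashboard itself avoids losing the original panels.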