Monitoring Cloudera AI Workbenches

This topic shows you how to monitor resource usage on your Cloudera AI Workbenches.

Cloudera AI uses Prometheus and Grafana to provide a dashboard where you can monitor how Cloudera AI Workbenches consume CPU, memory, storage, and other resources. Prometheus is an internal data source that is automatically populated with resource consumption data for each workbench. Grafana is a monitoring dashboard that lets you create visualizations of the resource consumption data stored in Prometheus.

Each Cloudera AI Workbench has its own Grafana dashboard.

Required Role: MLAdmin

You need the MLAdmin role to view the Workbench details page.

  1. Log in to the Cloudera AI web interface.
  2. Click Cloudera AI Workbenches.
  3. For the workbench you want to monitor, click Actions > Open Grafana.
Cloudera AI provides several default Grafana dashboards:
  • K8s Cluster: Shows cluster health, deployments, and pods
  • K8s Containers: Shows pod information, and CPU and memory usage
  • K8s Node: Shows node CPU and memory usage, disk usage, and network conditions
  • Models: Shows response times, requests per second, and CPU and memory usage for model replicas
You can also add new dashboards or create additional panels for other metrics. For more information, see the Grafana documentation.
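As an illustration of adding a panel, the following minimal sketch builds a Grafana dashboard definition containing one panel backed by a PromQL query. It assumes that the Grafana instance opened from Actions > Open Grafana follows Grafana's standard dashboard JSON model (as accepted by the POST /api/dashboards/db endpoint) and that its default data source is the workbench's Prometheus. The metric container_cpu_usage_seconds_total and the pod label come from the standard Kubernetes cAdvisor metrics and are assumptions here; check which metrics are actually available in Grafana's Explore view. The function name build_dashboard_payload and the dashboard and panel titles are hypothetical.

```python
import json

# Assumed PromQL: per-pod CPU usage from the standard cAdvisor metric.
# Metric and label names are assumptions; confirm them in Grafana's Explore view.
CPU_BY_POD = 'sum by (pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))'


def build_dashboard_payload(title: str, panel_title: str, promql: str) -> dict:
    """Build a minimal Grafana dashboard payload (hypothetical helper) with one panel.

    The shape follows Grafana's dashboard JSON model as sent to the
    POST /api/dashboards/db endpoint. No datasource is set on the panel,
    so Grafana falls back to its default datasource (assumed to be Prometheus).
    """
    panel = {
        "id": 1,
        "type": "timeseries",  # needs Grafana 7.4+; older versions use "graph"
        "title": panel_title,
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},  # half-width panel at top-left
        "targets": [{"refId": "A", "expr": promql}],  # the PromQL query to plot
    }
    dashboard = {
        "id": None,   # a null id tells Grafana to create a new dashboard
        "uid": None,  # a null uid lets Grafana generate one
        "title": title,
        "panels": [panel],
    }
    # overwrite=False makes Grafana reject the save if a dashboard with the
    # same title already exists in the target folder.
    return {"dashboard": dashboard, "overwrite": False}


if __name__ == "__main__":
    payload = build_dashboard_payload("Workbench CPU (custom)", "CPU by pod", CPU_BY_POD)
    print(json.dumps(payload, indent=2))
```

You could paste the "dashboard" object into Grafana's dashboard import dialog, or send the whole payload to the dashboard API if your deployment exposes it with suitable credentials. Leaving the datasource unset keeps the sketch independent of data source names, which may differ between workbenches.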