Monitor Cloudera AI Workbench and workload performance using Cloudera Observability
Cloudera Observability is integrated with Cloudera AI to provide detailed information about the resources used by the Cloudera AI service at the infrastructure and workload levels.
From the Cloudera AI Workbench Summary dashboard, you can monitor multiple Cloudera AI Workbenches at the Cloudera AI service level. From the Cloudera AI Workbench dashboard, you can monitor, optimize, and troubleshoot Cloudera AI workloads such as sessions, jobs, models, and applications, categorized by user, team, and project.
Cloudera Observability Essential (free) and Premium (paid) are now available for Cloudera AI (CAI) with a revamped OpenTelemetry (OTel) deployment architecture. This framework provides a scalable data collection solution by integrating OTel agents to monitor AI workloads, workbenches, and infrastructure metrics. You can use these insights to optimize workloads and troubleshoot failures across AI workloads and Cloudera AI Workbenches.
How to enable the Cloudera AI feature in Cloudera Observability
- Enable outbound traffic. For more information, see AWS outbound network access destinations.
- For workbenches created with or upgraded to Cloudera AI version 2.0.55-h1000-b5 or higher, the Cloudera Observability components are installed automatically.
For information on creating a new workbench, see Provisioning Cloudera AI Workbenches in the Cloudera AI documentation.
- Cloudera Observability introduces a dedicated 'obs-infra' node group that isolates observability components from Cloudera AI worker nodes. Nodes in this node group are configured with 4 CPU cores and 8 GB of memory, and they incur an additional infrastructure cost.
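The minimum-version requirement above can be sketched as a simple comparison. The helper below is illustrative only, not part of Cloudera AI or any Cloudera CLI, and assumes version strings follow the major.minor.patch-hotfix-build pattern shown (for example, 2.0.55-h1000-b5):

```python
# Illustrative sketch: check whether a workbench version string meets the
# minimum required for automatic Cloudera Observability installation.
# This helper is hypothetical and not part of any Cloudera API.

def parse_version(version: str) -> tuple:
    """Split a version like '2.0.55-h1000-b5' into comparable numeric tuples."""
    base, _, suffix = version.partition("-")
    parts = tuple(int(p) for p in base.split("."))
    # Keep only the numeric portions of the hotfix/build suffix,
    # e.g. 'h1000-b5' -> (1000, 5); an absent suffix yields ().
    extras = tuple(int("".join(ch for ch in seg if ch.isdigit()) or 0)
                   for seg in suffix.split("-") if seg)
    return parts, extras

MINIMUM = parse_version("2.0.55-h1000-b5")

def observability_auto_installed(version: str) -> bool:
    """True if this workbench version installs the components automatically."""
    return parse_version(version) >= MINIMUM

print(observability_auto_installed("2.0.55-h1000-b5"))  # True
print(observability_auto_installed("2.0.54-h999-b3"))   # False
```

Python compares the resulting tuples element by element, so the major.minor.patch part is checked before the hotfix/build suffix.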
