AI Workloads
Cloudera Observability provides a centralized dashboard to monitor and manage resources across your workbench AI workloads.
Cloudera AI workloads include monitoring capabilities that provide granular visibility into resource consumption. On the AI workloads list page, you can view both allocated and consumed metrics for CPU, memory, GPU, and GPU memory: allocated metrics appear in the table columns, while consumption metrics appear as tooltips in the workload display. For each metric, the parent record displays the sum of all child record values plus the parent-level value. The interface distinguishes allocated from utilized values for each resource type so that you can evaluate the efficiency of your running workloads.
The GPU Memory and GPU utilization percentage metrics are displayed as instantaneous values, while the Memory metric is displayed as the peak value captured by the consumption processor.
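To illustrate the difference between the two reporting styles, a consumption processor that reports peak memory but instantaneous GPU utilization might reduce its periodic samples as in the following sketch. This is a hypothetical example; the sample format and the `reduce_metrics` function are not part of any Cloudera API.

```python
# Hypothetical sketch contrasting peak vs. instantaneous metric reduction.
# The sample format and function names are illustrative only.

def reduce_metrics(samples: list[dict]) -> dict:
    """Reduce periodic readings like {"memory_mb": ..., "gpu_util_pct": ...}."""
    return {
        # Memory is reported as the peak value seen across all samples.
        "memory_mb": max(s["memory_mb"] for s in samples),
        # GPU utilization is reported as the latest (instantaneous) reading.
        "gpu_util_pct": samples[-1]["gpu_util_pct"],
    }

samples = [
    {"memory_mb": 512, "gpu_util_pct": 40},
    {"memory_mb": 900, "gpu_util_pct": 75},
    {"memory_mb": 700, "gpu_util_pct": 20},
]
print(reduce_metrics(samples))  # {'memory_mb': 900, 'gpu_util_pct': 20}
```

Note that peak memory (900 MB) was observed in an earlier sample, while the GPU value reflects only the most recent reading.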
Workloads
Cloudera AI offloads specific tasks to child workloads, such as workers and Spark executors. To view associated child workloads, click Expand on the parent record. This option is displayed automatically for any parent workload with associated child workloads.
The display aggregates CPU core values into the parent record to provide a holistic view of the total resources used by the parent and its associated child processes. For models, the parent record tracks multiple replicas; because no actual workload runs on the parent record itself, monitor the individual child workloads for execution-specific metrics.
To ensure accurate resource tracking, parent-level totals are formatted differently by workload type. Because a model parent runs no workload of its own, its cell shows only the aggregate of its replicas: with three replicas of one core each, the cell value displays as Total: 3. For an application, job, or session with two cores of its own and three child processes of one core each, the cell value displays as 2 (Total: 5), distinguishing the parent's own usage from the aggregate child consumption.
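The cell-value arithmetic above can be sketched as follows. The function name and data shapes are illustrative assumptions for this example, not part of any Cloudera API.

```python
# Hypothetical sketch of how the CPU Cores cell value for a parent
# workload row could be derived. Names are illustrative only.

def cpu_cell_value(workload_type: str, parent_cores: int,
                   child_cores: list[int]) -> str:
    """Format the CPU Cores cell for a parent workload record."""
    if workload_type == "Model":
        # Model parents run no workload themselves: show only the
        # aggregate of the replicas.
        return f"Total: {sum(child_cores)}"
    # Applications, jobs, and sessions show their own usage plus the
    # combined total of parent and children.
    total = parent_cores + sum(child_cores)
    return f"{parent_cores} (Total: {total})"

# Three model replicas with one core each:
print(cpu_cell_value("Model", 0, [1, 1, 1]))    # Total: 3
# A session with two cores and three one-core child processes:
print(cpu_cell_value("Session", 2, [1, 1, 1]))  # 2 (Total: 5)
```

The two branches mirror the two display formats described above: an aggregate-only value for models, and an own-usage value with a combined total for other workload types.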
The Cloudera AI Workloads table provides detailed status for each active process.
| Column | Description |
|---|---|
| Name | Name of the AI workload. |
| Execution ID | The execution ID is a unique identifier for the execution instance of the workload. An information icon beside this ID offers additional details. You can copy the complete execution CRN to the clipboard. |
| Workbench | Displays the specific workbench where the workload is deployed. |
| User | Identifies the individual account associated with the workload. |
| Project | Lists the project name to which the workload belongs. |
| Type | Workload type: Application, Job, Session, or Model. |
| Team | Team name (if the Cloudera AI workload belongs to a particular team project). |
| CPU Cores | Number of CPU cores. |
| Memory | Total Memory allocated. |
| GPU | Number of GPUs allocated. |
| GPU Memory | Total GPU Memory allocated. |
| Duration | Workload running time. |
| Start Time | Displays the workload start time in Indian Standard Time (IST). |
How workload performance metrics are represented across all workbenches
The Cloudera AI Real Time AI Workloads page lists all workbenches running under a specific Cloudera AI service and provides detailed information for each, including:
- Running AI Workloads
- Top 5 Users by AI Workloads Count
