AI Workloads
Cloudera Observability provides a centralized dashboard to monitor and manage resources across your workbench AI workloads.
Cloudera AI workloads include monitoring capabilities that provide granular visibility into resource consumption. On the AI workloads list page, you can view both allocated and consumed metrics for CPU, memory, GPU, and GPU memory: allocated metrics appear in the table columns, while consumption metrics appear as tooltips in the workload display. For each metric, the parent record displays the sum of all child record values plus the parent-level value. The interface distinguishes allocated from utilized values for each resource type so that you can evaluate the efficiency of your running workloads.
The GPU Memory and GPU utilization percentage metrics are displayed as instantaneous values, while the Memory metric is displayed as the peak value captured by the consumption processor.
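To illustrate the difference between the two reporting styles, a consumption processor that reports peak memory but instantaneous GPU utilization might reduce its periodic samples as in the following sketch. This is a hypothetical example; the sample format and the `reduce_metrics` function are not part of any Cloudera API.

```python
# Hypothetical sketch contrasting peak vs. instantaneous metric reduction.
# The sample format and function names are illustrative only.

def reduce_metrics(samples: list[dict]) -> dict:
    """Reduce periodic readings like {"memory_mb": ..., "gpu_util_pct": ...}."""
    return {
        # Memory is reported as the peak value seen across all samples.
        "memory_mb": max(s["memory_mb"] for s in samples),
        # GPU utilization is reported as the latest (instantaneous) reading.
        "gpu_util_pct": samples[-1]["gpu_util_pct"],
    }

samples = [
    {"memory_mb": 512, "gpu_util_pct": 40},
    {"memory_mb": 900, "gpu_util_pct": 75},
    {"memory_mb": 700, "gpu_util_pct": 20},
]
print(reduce_metrics(samples))  # {'memory_mb': 900, 'gpu_util_pct': 20}
```

Note that peak memory (900 MB) was observed in an earlier sample, while the GPU value reflects only the most recent reading.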
Workloads
Cloudera AI offloads specific tasks to child workloads, such as workers and Spark executors. To view associated child workloads, click Expand on the parent record. This option is displayed automatically for any parent workload with associated child workloads.
The display aggregates CPU core values into the parent record to provide a holistic view of the total resources used by the parent and its associated child processes. For models, the parent record tracks multiple replicas; because no actual workload runs on the parent record itself, monitor the individual child workloads for execution-specific metrics.
To ensure accurate resource tracking, parent-level totals are formatted differently by workload type. Because a model parent runs no workload of its own, its cell shows only the aggregate of its replicas: with three replicas of one core each, the cell value displays as Total: 3. For an application, job, or session with two cores of its own and three child processes of one core each, the cell value displays as 2 (Total: 5), distinguishing the parent's own usage from the aggregate child consumption.
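The cell-value arithmetic above can be sketched as follows. The function name and data shapes are illustrative assumptions for this example, not part of any Cloudera API.

```python
# Hypothetical sketch of how the CPU Cores cell value for a parent
# workload row could be derived. Names are illustrative only.

def cpu_cell_value(workload_type: str, parent_cores: int,
                   child_cores: list[int]) -> str:
    """Format the CPU Cores cell for a parent workload record."""
    if workload_type == "Model":
        # Model parents run no workload themselves: show only the
        # aggregate of the replicas.
        return f"Total: {sum(child_cores)}"
    # Applications, jobs, and sessions show their own usage plus the
    # combined total of parent and children.
    total = parent_cores + sum(child_cores)
    return f"{parent_cores} (Total: {total})"

# Three model replicas with one core each:
print(cpu_cell_value("Model", 0, [1, 1, 1]))    # Total: 3
# A session with two cores and three one-core child processes:
print(cpu_cell_value("Session", 2, [1, 1, 1]))  # 2 (Total: 5)
```

The two branches mirror the two display formats described above: an aggregate-only value for models, and an own-usage value with a combined total for other workload types.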
The Cloudera AI Workloads table provides detailed status for each active process.
| Column | Description |
|---|---|
| Name | Name of the AI workload. |
| Execution ID | The execution ID is a unique identifier for the execution instance of the workload. An information icon beside this ID offers additional details. You can copy the complete execution CRN to the clipboard. |
| Workbench | Displays the specific workbench where the workload is deployed. |
| User | Identifies the individual account associated with the workload. |
| Project | Lists the project name to which the workload belongs. |
| Type | Workload type: Application, Job, Session, or Model. |
| Team | Team name (if the Cloudera AI workload belongs to a particular team project). |
| CPU Cores | Number of CPU cores. |
| Memory | Total Memory allocated. |
| GPU | Number of GPUs allocated. |
| GPU Memory | Total GPU Memory allocated. |
| Duration | Workload running time. |
| Start Time | Displays the workload start time in Indian Standard Time (IST). |
How workload performance metrics are represented across all workbenches
The Cloudera AI Real Time AI Workloads page lists all workbenches running under a specific Cloudera AI service and provides detailed information for each, including:
- Running AI Workloads
- Top 5 Users by AI Workloads Count
