Performance metrics of AI workloads by category

You can monitor workload performance by category (job, session, model, and application) and understand why workloads fail, particularly when a failure is caused by resource exhaustion. To identify resource constraints, analyze CPU, memory, and GPU usage.

Tracking key metrics such as duration, resource utilization (CPU allocated, GPU allocated, and memory allocated), usage analysis, and execution trends helps you identify bottlenecks and improve overall efficiency.
  • Jobs: Monitor long-running jobs.
  • Sessions: Monitor long-running sessions.
  • Models: Monitor all active models currently deployed on your workbench. Use the insights gained from monitoring to prioritize which deployments to optimize.
  • Applications: Monitor long-running web applications.

Charts on workload category dashboard

Each group below lists the chart names first, followed by the metrics displayed on those charts.
Charts: Jobs, Sessions, Models, Applications
  • Shows detailed resource consumption by each job, session, model, and application.
  • Filter top jobs, sessions, models, and applications by selecting the following options:
    • Duration (in seconds, minutes, and hours)
    • CPU Allocated (in cores)
    • GPU Allocated (in cores)
    • Memory Allocated (in mebibytes (MiB) and gibibytes (GiB))
    • GPU Memory Allocated
  • Hovering over the data displays the name, execution ID, user, and selected filter category name.
Charts: Job Usage Analysis, Session Usage Analysis, Model Usage Analysis, Application Usage Analysis
  • Categorized into three types: Users, Teams, and Projects.
  • Filter usage by CPU Allocated (in cores), Memory Allocated (in Gigabytes), GPU Allocated (in cores), and GPU Memory Allocated (in Gigabytes).
  • Hovering over the data shows the selected category name and usage information.
  • Click the usage link to navigate to the AI Workloads page. For more information, see Cloudera AI workload metrics and status details.
Charts: Job Execution Trends, Session Usage Trends, Model Usage Trends, Application Usage Trends
  • Displays the total number of AI workloads and the number of failed workloads within the workbench, based on the selected date filter.
  • Click the Cloudera AI workload number link to navigate to the AI Workloads page. For more information, see Cloudera AI workload metrics and status details.
Charts: Job Duration, Session Duration, Model Duration, Application Duration
  • Presents median duration values for jobs, sessions, models, and applications, measured in hours, minutes, seconds, and milliseconds.
    • The median value at the top of the chart is the median duration of all workloads displayed on the bar chart.
    • Each bar on the chart represents the number of workloads completed within the corresponding time range.
  • Click the median value link to navigate to the AI Workloads page. For more information, see Cloudera AI workload metrics and status details.
Charts: Job Resource Efficiency Analysis, Session Resource Efficiency Analysis, Model Resource Efficiency Analysis, Application Resource Efficiency Analysis
  • Categorized into four types: AI Workloads, Users, Teams, and Projects.
  • Filter usage by CPU Wastage (in cores) and Memory Wastage (in Gigabytes).
  • Hovering over the data shows the selected category name and usage information.
  • Click the usage link to navigate to the AI Workloads page.
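
To make the Duration charts described above more concrete, the following is a minimal Python sketch of how a single median value and per-range workload counts relate to the same set of durations. The sample durations, the bucket boundaries, and the summarize function are all hypothetical and chosen only for illustration; the time ranges and calculations used by the dashboard itself may differ.

```python
from statistics import median

# Hypothetical workload durations in seconds (illustrative, not real data).
durations_s = [42, 95, 310, 600, 1250, 4000, 7300]

# Hypothetical time ranges in seconds: (label, inclusive lower, exclusive upper).
# The ranges shown on the actual dashboard may be different.
buckets = [
    ("< 1 min", 0, 60),
    ("1-10 min", 60, 600),
    ("10-60 min", 600, 3600),
    (">= 1 h", 3600, float("inf")),
]


def summarize(durations):
    """Return the overall median and the number of workloads per time range."""
    counts = {label: 0 for label, _, _ in buckets}
    for d in durations:
        for label, lower, upper in buckets:
            if lower <= d < upper:
                counts[label] += 1
                break
    return median(durations), counts


overall_median, counts = summarize(durations_s)
print("Median duration (s):", overall_median)
for label, n in counts.items():
    print(f"{label}: {n} workload(s)")
```

In this sketch the median corresponds to the single value shown at the top of the chart, and each count corresponds to one bar. A median that sits in a long time range while most bars are short can point to a few long-running workloads worth inspecting on the AI Workloads page.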