Resource utilization and performance metrics for Cloudera AI Workbench
Understand how to access Cloudera AI Workbench and explore detailed information on Cloudera AI workloads, including infrastructure analysis for nodes, pods, namespaces, and more.
How to access the Cloudera AI Workbench
- Navigate to the Cloudera AI data service, and select the Cloudera AI Workbench from the workbench list.
- Navigate to the Cloudera AI summary page. In the Workbench Usage Analysis section, from the active workbenches list, click the workbench name link.
How to select a time range
By default, the dashboard displays workload data for the last 24 hours. You can select a different time range from the time range list. All charts and tables on the dashboard are updated to reflect the workload data for the selected period. For information about the time-range list options, see Specifying a time range.
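The range picker drives everything on the page, but if you want to pull the same windows of data programmatically, the sketch below shows the general idea: a selected range maps to a start timestamp, an end timestamp, and a sampling step for a metrics range query. The Prometheus backend, the `PROM_URL` value, and the example query are assumptions for illustration only, not a documented Cloudera AI interface.

```python
# A minimal sketch, assuming the workbench metrics are stored in a
# Prometheus-compatible backend reachable at PROM_URL (hypothetical, not a
# documented Cloudera AI endpoint).
import time
import requests

PROM_URL = "http://prometheus.example.internal:9090"  # assumption

# Dashboard-style time-range options mapped to seconds.
TIME_RANGES = {"1h": 3600, "6h": 6 * 3600, "24h": 24 * 3600, "7d": 7 * 86400}

def query_range(promql: str, time_range: str = "24h", step: str = "5m") -> list:
    """Fetch a time series for the selected range, mirroring the range picker."""
    end = time.time()
    start = end - TIME_RANGES[time_range]
    resp = requests.get(
        f"{PROM_URL}/api/v1/query_range",
        params={"query": promql, "start": start, "end": end, "step": step},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]

# Example: per-node CPU busyness over the default 24-hour window.
series = query_range(
    '100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))'
)
```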
How the workload performance metrics within the selected workbench are represented
- Total Cloudera AI WORKLOADS: A bar chart shows the total count of Cloudera AI workloads within the selected workbench, categorized as jobs, sessions, applications, and models. Each category is depicted as a horizontal bar that aggregates its workload count, with the highest value at the top.
- FAILED Cloudera AI WORKLOADS: A bar chart shows the count of failed Cloudera AI workloads against the total count within the selected workbench, categorized as jobs, sessions, applications, and models. Each category is depicted as a horizontal bar, and the category with the highest count of failed workloads appears at the top.
- AVERAGE SYSTEM BUSYNESS: A line chart displays the average resource consumption for the selected workbench as a percentage. This metric indicates whether resources are over-allocated or under-allocated for the selected workbench.
- Cloudera AI Workloads Execution Trends: A trend chart shows total and failed workloads over the same time range as the Average System Busyness chart.
- Usage Analysis: A bar chart displays an analysis of individual workloads, projects, users, and teams within the workbench, based on busyness metrics.
  - Cloudera AI workloads: Lists the top 25 workloads.
  - Projects: Lists the projects to which the workloads belong.
  - Users: Lists the users running workloads, based on namespaces.
  - Teams: Lists the team if the workload is part of a project that belongs to a team.
You can filter workloads, projects, users, and teams by CPU Allocated, GPU Allocated, and Memory Allocated to identify those with the highest CPU, GPU, and memory usage.
These insights help administrators understand which users and projects consume the most resources and system time, so they can monitor and manage projects accordingly.
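To make the relationship between busyness and the Usage Analysis rankings concrete, here is a small illustrative sketch: it computes a busyness percentage from used versus allocated CPU and ranks workloads by CPU Allocated. The data structure and values are hypothetical; the dashboard derives the real numbers from the workbench's collected metrics.

```python
# A minimal sketch with hypothetical data; the dashboard derives these values
# from the workbench's collected metrics.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    project: str
    user: str
    cpu_allocated: float   # vCPUs requested
    cpu_used: float        # average vCPUs actually consumed

workloads = [
    Workload("train-job-1", "churn-model", "alice", cpu_allocated=8, cpu_used=6.1),
    Workload("notebook-2", "churn-model", "bob", cpu_allocated=4, cpu_used=0.4),
    Workload("api-model-3", "fraud-api", "carol", cpu_allocated=2, cpu_used=1.7),
]

# Busyness: resources actually used as a percentage of resources allocated.
# A consistently low value suggests over-allocation; a value near 100%
# suggests under-allocation.
busyness = 100 * sum(w.cpu_used for w in workloads) / sum(w.cpu_allocated for w in workloads)
print(f"Average busyness: {busyness:.1f}%")

# Usage Analysis-style ranking: workloads sorted by CPU Allocated (top 25).
for w in sorted(workloads, key=lambda w: w.cpu_allocated, reverse=True)[:25]:
    print(w.name, w.project, w.user, w.cpu_allocated)
```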
How to analyze resource usage by nodes
- Total: Shows aggregated usage across the nodes, depicting total used and allocated values, with scale-up and scale-down actions indicated.
- Top 5 busiest: Displays the five nodes that consumed the most resources based on the busyness metrics, calculated from actual usage values, not percentages (see the sketch after this list).
- Bottom 5 busiest: Displays the five least busy nodes, also calculated from actual usage values, not percentages.
- Custom 5: Allows you to filter a custom set of nodes. Select up to five nodes and click Apply.
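If you need a comparable ranking outside the dashboard, the sketch below reads actual node usage (not percentages) from the standard Kubernetes metrics API and sorts it into top-five and bottom-five lists. It assumes the metrics API (metrics.k8s.io) is available and that your kubeconfig grants read access; it is not a Cloudera AI-specific interface.

```python
# A minimal sketch, assuming the cluster exposes the standard Kubernetes
# metrics API (metrics.k8s.io) and your kubeconfig grants read access.
from kubernetes import client, config

def node_cpu_usage_millicores() -> dict:
    """Return actual CPU usage per node in millicores (not a percentage)."""
    config.load_kube_config()
    metrics = client.CustomObjectsApi().list_cluster_custom_object(
        group="metrics.k8s.io", version="v1beta1", plural="nodes"
    )
    usage = {}
    for item in metrics["items"]:
        cpu = item["usage"]["cpu"]  # e.g. "3559751n" (nanocores) or "250m"
        if cpu.endswith("n"):
            millicores = int(cpu[:-1]) / 1_000_000
        elif cpu.endswith("m"):
            millicores = float(cpu[:-1])
        else:
            millicores = float(cpu) * 1000  # plain cores
        usage[item["metadata"]["name"]] = millicores
    return usage

ranked = sorted(node_cpu_usage_millicores().items(), key=lambda kv: kv[1], reverse=True)
print("Top 5 busiest:", ranked[:5])
print("Bottom 5 busiest:", ranked[-5:])
```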
Resource utilization within the workbench
Evaluate how effectively computational resources are utilized within the workbench. Illustrative metric queries for these charts are sketched after the list below.
- CPU: Provides a historical view of CPU usage at individual workbench granularity. Hover over the chart to view CPU usage as a percentage and actual CPU usage compared to available CPU.
- GPU: Provides a historical view of GPU usage at the node level. Hover over the chart to see how many GPU cores are used compared to the allocated GPU cores.
- Memory: Provides a historical view of memory usage within the workbench. Hover over the chart to view memory usage as a percentage and memory used compared to available memory.
- GPU Memory: Provides a historical view of GPU memory usage at the node level. Hover over the chart to view GPU memory usage in bytes and actual GPU memory used compared to the allocated GPU memory.
- Network: Offers a historical view of network activity, indicating the bytes received and transmitted over the network interface during processing. Hover over the chart to view, in list form, the number of bytes received and transmitted by the selected workbench.
- Storage: Displays IOPS and throughput. Hover over the chart to view reads and writes in bytes per second.
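For reference, the illustrative expressions below show how such utilization series are commonly written against node-level exporters. The metric names assume a node_exporter / DCGM-exporter style monitoring stack, which is an assumption rather than a documented Cloudera AI detail; the `query_range` helper is the one sketched in the time-range section above.

```python
# Illustrative PromQL for the chart categories above, assuming a node_exporter /
# DCGM-exporter style metrics stack (an assumption, not a documented Cloudera AI
# implementation detail). Reuses the query_range() helper sketched earlier.
UTILIZATION_QUERIES = {
    # CPU busyness per node, as a percentage.
    "cpu": '100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))',
    # Memory used (total minus available), in bytes.
    "memory": 'node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes',
    # GPU memory used per GPU, converted from MiB to bytes (DCGM metric name is an assumption).
    "gpu_memory": 'DCGM_FI_DEV_FB_USED * 1024 * 1024',
    # Network bytes received and transmitted per second.
    "network_rx": 'sum by (instance) (rate(node_network_receive_bytes_total[5m]))',
    "network_tx": 'sum by (instance) (rate(node_network_transmit_bytes_total[5m]))',
    # Storage IOPS: read and write operations completed per second.
    "storage_iops": 'sum by (instance) (rate(node_disk_reads_completed_total[5m]) '
                    '+ rate(node_disk_writes_completed_total[5m]))',
}

for name, promql in UTILIZATION_QUERIES.items():
    result = query_range(promql)  # defined in the earlier sketch
    print(name, len(result), "series")
```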