Summary of all Cloudera AI Workbenches in Cloudera Observability
You can monitor a summary of all workbenches and track usage trends for both workloads and infrastructure in Cloudera Observability. This view helps you decide which workbench to investigate further by identifying potential issues based on cluster activity, peak and low times, deviations, and other indicators.
How to select a time range
By default, the summary page displays workload data for the last 24 hours. You can select a different period from the time-range list. All charts and tables on the summary dashboard are updated to reflect the workload data for the selected period. For information about the time-range list options, see Specifying a time range.
How the workload performance metrics across all workbenches are represented
The Cloudera AI summary page provides detailed information for all workbenches, listing multiple workbenches running under the specific Cloudera AI service.
- Total Cloudera AI WORKLOADS: A bar chart shows the total number of Cloudera AI workloads across all workbenches in the Cloudera AI environment, categorized by jobs, sessions, applications, and models. Each category is represented by a horizontal bar showing its summed value, and the category with the highest value appears at the top.
- FAILED Cloudera AI WORKLOADS: A bar chart shows the number of failed Cloudera AI workloads alongside the total number of workloads for the jobs, sessions, applications, and models categories. The category with the highest number of failed workloads appears at the top.
- AVERAGE SYSTEM BUSYNESS: A line chart displays the average consumption of resources across all your workbenches at the Cloudera AI level, expressed as a percentage. This metric reflects concurrent processes, CPU utilization, memory usage, network traffic, and storage access, and indicates whether resources are over-allocated or under-allocated to your workbenches. Hover over a point on the chart to view the average percentage of system busyness.
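The ordering used by the two workload charts above can be illustrated with a minimal sketch. It assumes a hypothetical list of workload records (workbench, category, status); the record layout, the sample values, and the `count_by_category` helper are illustrative only and are not part of any Cloudera Observability API.

```python
from collections import Counter

# Hypothetical workload records: (workbench, category, status).
# The names and values are made up for illustration.
workloads = [
    ("wb-a", "jobs", "succeeded"),
    ("wb-a", "sessions", "failed"),
    ("wb-b", "jobs", "failed"),
    ("wb-b", "jobs", "succeeded"),
    ("wb-b", "models", "succeeded"),
]


def count_by_category(records, only_failed=False):
    """Sum workloads per category across all workbenches, highest count first."""
    counts = Counter(
        category
        for _workbench, category, status in records
        if not only_failed or status == "failed"
    )
    # most_common() returns (category, count) pairs sorted by count, descending.
    return counts.most_common()


total_counts = count_by_category(workloads)
failed_counts = count_by_category(workloads, only_failed=True)

print(total_counts)
print(failed_counts)
```

In this sketch each category's value is summed across workbenches before sorting, which mirrors how the charts place the highest value at the top.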
How to analyze workbench usage
- Top 5 Allocated: Displays the top five allocated workbenches, calculated based on the busyness usage, not the percentage. If the number of active workbenches is fewer than five based on the busyness metrics, only those workbenches are listed.
- Bottom 5 Allocated: Displays the bottom five workbenches, calculated based on the busyness usage, not the percentage.
- Custom 5: Allows you to filter by a custom set of workbenches. Select up to five workbenches and click Apply.
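The Top 5 and Bottom 5 selections can be sketched as a simple ranking. The example below assumes a hypothetical mapping from workbench name to an absolute busyness usage value; the units, the sample numbers, and the `rank_workbenches` helper are assumptions for illustration and do not describe how Cloudera Observability computes the metric internally.

```python
def rank_workbenches(usage_by_workbench, n=5):
    """Return (top_n, bottom_n) lists of (name, usage) pairs.

    Ranking uses the absolute usage value, not a percentage.
    If fewer than n workbenches exist, only those are returned.
    """
    ranked = sorted(
        usage_by_workbench.items(), key=lambda item: item[1], reverse=True
    )
    top = ranked[:n]
    bottom = ranked[-n:][::-1]  # lowest usage first
    return top, bottom


# Hypothetical busyness usage per workbench (made-up values).
usage = {
    "wb-a": 12.0,
    "wb-b": 48.5,
    "wb-c": 3.2,
    "wb-d": 27.1,
    "wb-e": 9.9,
    "wb-f": 30.0,
}

top5, bottom5 = rank_workbenches(usage)
print([name for name, _ in top5])
print([name for name, _ in bottom5])
```

Because the lists are produced by slicing, an environment with only three active workbenches yields three entries in each list, matching the behavior described for Top 5 Allocated.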
Resource utilization across workbenches
- CPU: Provides a historical overview of CPU usage at the workbench level. Hover over the chart to see CPU usage as a percentage and the actual CPU used compared to the available CPU.
- GPU: Provides a historical overview of GPU usage at the workbench level. Hover over the chart to see how many GPU cores are used compared to the allocated GPU cores.
- Memory: Provides a historical overview of memory usage at the workbench level. Hover over the chart to see memory usage as a percentage and the actual memory used compared to the available memory.
- GPU Memory: Provides a historical overview of GPU memory usage at the workbench level. Hover over the chart to see GPU memory usage in bytes and the actual GPU memory used compared to the allocated GPU memory.
- Network: Provides a historical overview of network activity, measured in mebibytes (MiB) and gibibytes (GiB). Hover over the chart to see a list of the bytes received and the bytes transmitted by all workbenches.
- Storage: Displays IOPS and Throughput. Hover over the chart to view information on reads and writes in bytes per second.
- IOPS: The IOPS metric shows how many read and write operations a storage device can perform per second. A single operation is performed on one block. A Hard Disk Drive (HDD) normally has 512 B or 4 KB blocks, whereas modern Solid State Drives (SSDs) expose storage memory in pages joined in blocks that can reach 512 KB in size.
- Throughput: Storage throughput (data transfer rate) measures the amount of data transferred to and from the storage device per second. Normally, throughput is measured in megabytes per second (MB/s). Throughput is closely related to IOPS and block size: it is approximately the IOPS value multiplied by the block size.
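The relationship between IOPS, block size, and throughput can be shown with a short worked sketch. It assumes the common approximation throughput = IOPS × block size, treats 4 KB as 4096 bytes and 1 MB as 1,000,000 bytes, and uses made-up IOPS figures; the `throughput_mb_per_s` helper is hypothetical and is not part of the product.

```python
def throughput_mb_per_s(iops, block_size_bytes):
    """Approximate throughput in MB/s as IOPS multiplied by block size.

    Assumes every operation transfers exactly one block and that
    1 MB = 1,000,000 bytes.
    """
    return iops * block_size_bytes / 1_000_000


# Same hypothetical IOPS figure with the two HDD block sizes mentioned above.
small_blocks = throughput_mb_per_s(10_000, 512)   # 512 B blocks
large_blocks = throughput_mb_per_s(10_000, 4096)  # 4 KB blocks (4096 bytes assumed)

print(f"512 B blocks: {small_blocks:.2f} MB/s")
print(f"4 KB blocks:  {large_blocks:.2f} MB/s")
```

The same IOPS value therefore corresponds to very different throughput depending on block size, which is why the two metrics are best read together.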