Actionable insights for optimizing ML workloads

The Cloudera Observability ML workload details page provides insights into your ML workloads that help you ensure optimal performance. By understanding how a workload behaves, you can fine-tune and debug it to improve future runs.

The metric data is divided and displayed in the following tabs:
  • Performance trends: Analyze execution trends to identify deviations from the optimal path.

    For more information, see Identify performance trends for ML workload.

  • Resource usage by nodes: Monitor CPU, memory, GPU, and GPU memory usage to ensure that nodes are cost-effective and can handle the workload.

    For more information, see Resource usage by nodes.

  • Identify and address workload performance problems: Use health issues to establish performance baselines, which help you identify and address performance problems and compare the performance of your workloads.

    For more information, see Identifying and addressing performance problems of your ML workloads.

How to access the ML workload details page

You can access the ML workload details page in either of the following ways:
  • From the ML summary page: click a workload link in the Jobs, Sessions, Models, or Applications chart widget.
  • From the ML workload list page: in the Name column, select the ML workload.

The ML workload details page is displayed.