Identifying and addressing performance problems of your AI workloads

Identify inefficient phases of your workloads for optimization and performance tuning.

Describes how to compare any two runs of an AI workload using the Comparison tool.

  1. Verify that you are logged in to the Cloudera Observability web UI and that you selected an environment from the Analytics Environments page.
    1. Log in to Cloudera in a supported browser.
      The Cloudera web interface landing page opens.
    2. From the Your Enterprise Data Cloud landing page, select the Observability tile.
      The Cloudera Observability landing page opens to the main navigation panel.
    3. From the Cloudera Observability Environments page, select the environment required for analysis.

      The Environment navigation panel opens.

  2. From the Environment Name column on the Environments page, locate and click the name of the environment whose workload diagnostic information requires analysis and troubleshooting.
    The Environment navigation panel opens and lists, in a hierarchy, the selected environment and the services it hosts.
  3. Depending on the environment selected, verify that the Cluster Summary page is displayed for the cluster that requires analysis.
  4. If not already expanded, from the Environment navigation panel, expand the Cloudera AI environment, and then select the Cloudera AI Workbench.
  5. Optional: From the time-range list, select a time period that meets your requirements.
  6. Click the workload links in the Jobs, Sessions, Models, and Applications chart widgets.
    The Cloudera AI Workload list page opens.
  7. From the Name column, select the Cloudera AI workload.
    The Cloudera AI workload details page opens for the selected Cloudera AI workload category (Jobs, Sessions, Models, or Applications).
  8. To measure the current performance of a workload against the average performance of previous runs, select the Baseline tab.
    The Baseline tab captures and presents metrics for each execution, where one execution represents a single job run per day. These metrics include CPU utilization (allocated versus used), memory usage (allocated versus used), the volume of data processed (read from and written to disk), and execution duration. If the same job runs on subsequent days and its metrics deviate significantly, you can analyze the discrepancies and determine potential causes.

    For information on Baseline metrics, see Baseline health checks.
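
The comparison that the Baseline tab presents, a current run measured against the average of previous runs, can be illustrated with a small offline sketch. This is not Cloudera Observability's implementation or API. The metric names, the sample values, and the 25% threshold are all assumptions chosen for illustration, and the actual health checks may weigh deviations differently.

```python
from statistics import mean

# Hypothetical metric names; the fields shown on the Baseline tab may differ.
METRICS = (
    "cpu_used_cores",
    "memory_used_gb",
    "data_read_gb",
    "data_written_gb",
    "duration_s",
)


def baseline_deviation(previous_runs, current_run, threshold_pct=25.0):
    """Compare one run against the average of earlier runs, metric by metric.

    threshold_pct is an assumed cutoff for calling a deviation significant.
    """
    report = {}
    for metric in METRICS:
        baseline = mean(run[metric] for run in previous_runs)
        current = current_run[metric]
        if baseline == 0:
            # Avoid dividing by zero when earlier runs recorded no usage.
            deviation_pct = 0.0 if current == 0 else float("inf")
        else:
            deviation_pct = (current - baseline) / baseline * 100.0
        report[metric] = {
            "baseline": baseline,
            "current": current,
            "deviation_pct": deviation_pct,
            "flagged": abs(deviation_pct) > threshold_pct,
        }
    return report


# Made-up sample data: two earlier daily runs and today's run of the same job.
previous_runs = [
    {"cpu_used_cores": 2.0, "memory_used_gb": 4.0, "data_read_gb": 10.0,
     "data_written_gb": 2.0, "duration_s": 600.0},
    {"cpu_used_cores": 2.0, "memory_used_gb": 4.0, "data_read_gb": 10.0,
     "data_written_gb": 2.0, "duration_s": 600.0},
]
current_run = {"cpu_used_cores": 2.0, "memory_used_gb": 4.0, "data_read_gb": 10.0,
               "data_written_gb": 2.0, "duration_s": 900.0}

report = baseline_deviation(previous_runs, current_run)
for metric, row in report.items():
    marker = "  <-- investigate" if row["flagged"] else ""
    print(f"{metric}: {row['deviation_pct']:+.1f}% vs baseline{marker}")
```

In this sample, only the execution duration departs from the baseline, which is the kind of discrepancy you would then investigate for potential causes.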