Identifying and addressing performance problems of your ML workloads

Identify inefficient phases of your workloads for optimization and performance tuning.

Describes how to compare any two runs of an ML workload using the Comparison tool.

  1. Verify that you are logged in to the Cloudera Observability web UI and that you selected an environment from the Analytics Environments page.
    1. In a supported browser, log in to the Cloudera web interface.
      The Cloudera web interface landing page opens.
    2. From the Your Enterprise Data Cloud landing page, select the Observability tile.
      The Cloudera Observability landing page opens to the main navigation panel.
    3. From the Cloudera Observability Environments page, select the environment required for analysis.

      The Environment navigation panel opens.

  2. From the Environment Name column on the Environments page, locate and click the environment name whose workload diagnostic information requires analysis and troubleshooting.
    The Environment navigation panel opens, which hierarchically lists the environment and the services hosted on it.
  3. Verify that the Cluster Summary page is displayed for the cluster that requires analysis.
  4. If not already expanded, from the Environment navigation panel, expand the Machine Learning environment, and then select the ML workspace.
  5. Optional: From the time-range list, select a time period that meets your requirements.
  6. Click a workload link in the Jobs, Sessions, Models, or Applications chart widget.
    The ML Workload list page opens.
  7. From the Name column, select the ML workload.
    The ML workload details page opens for the selected ML workload category (Jobs, Sessions, Models, or Applications).
  8. To measure the current performance of a workload against the average performance of previous runs, select the Baseline tab.
    The Baseline tab captures and presents metrics for each execution, where one execution represents a single job run per day. These metrics include CPU utilization (allocated), memory usage (allocated), and execution duration. If the same job is executed on subsequent days and significant deviations in these metrics are observed, you can analyze the discrepancies and determine their potential causes.

    For information on Baseline metrics, see Baseline health checks.
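Conceptually, the Baseline tab's deviation check amounts to comparing each run's metrics against the baseline average. The following sketch illustrates that idea only; the function, metric names, and threshold are hypothetical and do not correspond to any Cloudera Observability API.

```python
# Illustrative sketch: flag metrics that deviate significantly from the
# baseline average, similar in spirit to what the Baseline tab surfaces.
# All names, values, and the 25% threshold are hypothetical.

def flag_deviations(baseline_avg, current_run, threshold=0.25):
    """Return metrics whose relative deviation from the baseline
    average exceeds the threshold (25% by default)."""
    flagged = {}
    for metric, avg in baseline_avg.items():
        current = current_run.get(metric)
        if current is None or avg == 0:
            continue
        deviation = (current - avg) / avg
        if abs(deviation) > threshold:
            flagged[metric] = round(deviation, 2)
    return flagged

# Example: today's run takes 50% longer than the baseline average.
baseline = {"cpu_allocated": 4.0, "memory_gb": 16.0, "duration_s": 600.0}
today = {"cpu_allocated": 4.0, "memory_gb": 17.0, "duration_s": 900.0}

print(flag_deviations(baseline, today))  # → {'duration_s': 0.5}
```

A run that only drifts slightly (here, memory at about 6% above baseline) is not flagged, which is why a threshold matters: it separates routine variation from discrepancies worth investigating.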

  9. To troubleshoot performance-related issues between two different runs of the same workload, do the following:
    1. From the workload details page, select the Trends tab.
    2. Scroll down and, from the table, select the check boxes adjacent to the workload job runs that you require, such as the latest run and a run from a week ago, and then click Compare.
      The Execution Comparison page opens, displaying more details about the selected workload.
    3. From the Details section, select the Basics tab and review the details of both selected workload runs.
    4. Select the Metrics tab and compare statistical differences between the selected workload runs. For example, you can identify differences in the workload run durations.
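The kind of side-by-side comparison the Metrics tab presents can be sketched as computing per-metric deltas between two runs. This is an illustrative example only; the function, metric names, and values below are hypothetical, not part of Cloudera Observability.

```python
# Illustrative sketch: compute absolute and percentage differences
# between two runs of the same workload (run_b relative to run_a).
# Metric names and values are hypothetical.

def compare_runs(run_a, run_b):
    """Return per-metric differences for metrics present in both runs."""
    diffs = {}
    for metric in sorted(set(run_a) & set(run_b)):
        a, b = run_a[metric], run_b[metric]
        pct = ((b - a) / a * 100) if a else float("inf")
        diffs[metric] = {"a": a, "b": b, "delta": b - a, "pct": round(pct, 1)}
    return diffs

# Example: the latest run is 50% slower than last week's run.
last_week = {"duration_s": 620, "cpu_allocated": 4}
latest = {"duration_s": 930, "cpu_allocated": 4}

for metric, d in compare_runs(last_week, latest).items():
    print(f"{metric}: {d['a']} -> {d['b']} ({d['pct']:+.1f}%)")
```

Comparing a recent run against an older one in this way makes it easy to spot which metric regressed, such as a jump in run duration while allocated CPU stayed constant.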