Identifying and addressing performance problems of your Cloudera AI workloads
Identify inefficient phases of your workloads for optimization and performance tuning.
Describes how to compare any two runs of a Cloudera AI workload using the Comparison tool.
-
Verify that you are logged in to the Cloudera Observability web UI and that you
have selected an environment from the Analytics
Environments page.
-
Log in to Cloudera in a supported
browser.
The Cloudera web interface landing page opens.
-
From the Your Enterprise Data Cloud landing
page, select the Observability tile.
The Cloudera Observability landing page opens to the main navigation panel.
-
From the Environment Name column on the Cloudera Observability
Environments page, locate and click the name of the
environment whose workload diagnostic information requires analysis and
troubleshooting.
The Environment navigation panel opens, which hierarchically lists the environment and the services hosted on it.
- Depending on the environment selected, verify that the Cluster Summary page is displayed for the cluster that requires analysis.
- If not already expanded, from the Environment navigation panel, expand the Cloudera AI environment, and then select the Cloudera AI Workbench.
- Optional: From the time-range list, select a time period that meets your requirements.
-
Click a workload link in the Jobs,
Sessions, Models, or
Applications chart widget.
The Cloudera AI Workload list page opens.
-
From the Name column, select the required Cloudera AI workload.
The Cloudera AI workload details page opens for the selected Cloudera AI workload category (Jobs, Sessions, Models, or Applications).
-
To measure the current performance of a workload against the average
performance of previous runs, select the Baseline
tab.
The Baseline tab captures and presents metrics for each execution, where one execution represents a single job run per day. These metrics include CPU utilization (allocated), memory usage (allocated), and execution duration. If the same job is executed on subsequent days and significant deviations in metrics are observed, you can analyze the discrepancies and determine potential causes.
For information on Baseline metrics, see Baseline health checks.
-
To troubleshoot performance-related issues between two different runs of the
same workload, do the following:
- From the workload details page, select the Trends tab.
-
Scroll down and, from the table, select the check boxes adjacent to the
workload job runs that you want to compare, such as the latest run and a run
from a week ago, and then click Compare.
The Execution Comparison page opens, displaying more details about the selected workload.
- From the Details section, select the Basics tab and review the details of both selected workload runs.
- Select the Metrics tab and compare statistical differences between the selected workload runs. For example, you can identify differences in the workload run durations.