Steps for comparing two different runs of the same job, which is especially useful
when you notice unexpected changes, such as when you have a job that consistently completes
within a specific amount of time and then it starts taking longer. Comparing two runs of the
same job enables you to analyze the performance and differences so that you can troubleshoot
the cause.
Describes how to compare any two runs of a job using the Job Comparison tool.
The following procedure uses examples from a Spark engine to explain how to
further investigate and troubleshoot.
Verify that you are logged in to the Workload XM web UI.
In the URL field of a supported web browser, enter the Workload XM URL that
you were given by your system administrator and press
Enter.
When the Workload XM Log in page opens, enter your Workload XM user name and password access credentials.
Click Log in.
In the Search field of the Clusters page, enter the name
of the cluster whose workloads you want to analyze.
From the time-range list in the Cluster Summary page, select a time period that
meets your requirements.
In the Trend widget, select the tab of an engine whose
jobs you want to analyze and then click its Total Jobs
value.
The engine's Jobs page opens.
Examine the list of jobs that have executed during the selected time period and
manually compare runs of the same job.
For example, as shown in the following image, when manually comparing the last
two runs of the Log Processor job we can see that there are
duration differences. In this example, the older run had a Task duration skew
health issue, which appears to be fixed:
List and display details of all the runs of a specific job of interest by
selecting one of the job runs and then in its jobs details page, click the
Trends tab.
In the following example, notice how the amount of Input and Output data
changes between runs. The Job Comparison tool enables you to examine more
details about two runs to determine why the amount of data changed. In this case
we will compare a run with no health issues with the last run of the job:
To compare two job runs, select the check boxes adjacent to the job runs you
require and then click Compare.
The Job Comparison page opens displaying more details about each job.
For this example's comparison, the tabs that contain more information about
the job runs are the Structure, SQL
Executions, and the Metrics tabs:
Display and compare the sub-jobs executed for both of your selected job runs by
selecting the Structure tab.
For example, as shown in the following image, the last run of the job (with
health issues) completed in 1minute and 38 seconds and executed 9 sub-jobs and
the run that had no health issues took 1 minute and 20 seconds but only executed
5 sub-jobs. Clicking any of the listed sub-jobs displays more details.
Display and compare what Spark SQL was run and how long they ran for both of
your selected job runs by selecting the SQL Executions
tab.
For example, as shown in the following image, more Spark SQL queries were
performed on the data in the last job run.
Display and compare what metrics were performed on both of your selected job
runs by selecting the Metrics tab.
For example, as shown in the following image, more input records were digested
in the last job run.