Cluster OptimizationPDF version

Troubleshooting with the Job Comparison Feature

Steps for comparing two different runs of the same job, which is especially useful when you notice unexpected changes. For example, when you have a job that consistently completes within a specific amount of time and then it starts taking longer, comparing two runs of the same job enables you to analyze the differences so that you can troubleshoot the cause.

Describes how to compare any two runs of a job using the Job Comparison tool.
  1. Verify that you are logged in to the Cloudera Data Platform.
  2. In a supported browser, log in to Workload XM.
  3. From the Your Enterprise Data Cloud landing page, select the Workload Manager tile.
  4. In the Search field of the Clusters page, enter the name of the cluster whose workloads you want to analyze.
  5. From the navigation panel under Data Engineering, select Jobs and then from the time-range list in the Cluster Summary page, select a time period that meets your requirements.


  6. View the list of jobs that have executed during the selected time period:
    The following example reveals that though the spark-etl job runs often, the last three runs have taken significantly longer. Where, on August 2, the duration was 27 minutes, but on August 3, the duration almost doubled to 51 minutes. The Job Comparison tool will enable you to examine both runs to determine why the duration changed:


  7. Select one of the runs of the spark-etl job, and then in the Jobs detail page, click the Trends tab:


    Up to 30 runs prior to the selected job are displayed. Notice that in the Input and Output columns there are different amounts of data processed by the job. For example, on August 2, the job processed 2.4 GB of data and outputted 1.8 GB. However, on August 3, the job processed 4.2 GB, almost twice as much data, and it outputted 4.6 GB. The Job Comparison tool will enable you to examine both runs to determine why the amount of data changed:


  8. To compare two job runs, select the check boxes adjacent to the job runs you require, in this case the runs for August 2 and August 3 are selected, and then click Compare.
    The Job Comparison page opens displaying more details about each job. For this example's comparison, the tabs that will contain more information are the Structure, Configurations, and the SQL Executions tabs:


  9. The Structure tab page, displays the sub-jobs executed for both runs of the spark-etl job:
    In this example, the job that took 27 minutes only executed 9 sub-jobs and the job that took 51 minutes, almost twice as much time, executed 16 sub-jobs, almost twice as many. Selecting any of the listed sub-jobs displays more details.
    When the Configurations tab was examined, it revealed that the configurations between the two runs of this job were identical, so a configuration change probably did not cause this anomaly.


  10. When the SQL Executions tab was selected, it revealed that twice as many Spark queries executed for the job that took the longest duration:


The analysis from the Job Comparison tells us that either the Spark SQL code was changed by the Job Developer or that the data which the code ran against triggered more of the Spark queries in the job. The Workload XM Job Comparison tool helped narrow the number of causes that produced this anomaly. In this example, the change in job duration appears to be expected so no further troubleshooting is required.