Determining the Cause of Slow and Failed Queries

Identify the cause of slow query run times and of queries that fail to complete.

The following steps, with examples, explain how to investigate and troubleshoot the cause of a slow or failed query.

  1. In a supported browser, log in to Workload XM.
  2. On the Clusters page, do one of the following:
    • In the Search field, enter the name of the cluster whose workloads you want to analyze.
    • In the Cluster Name column, locate and click the name of the cluster whose workloads you want to analyze.
  3. From the navigation panel under Data Engineering, select Jobs.
  4. From the Health Check list on the Jobs page, select Task Wait Time. This filters the list to display only jobs with longer-than-average wait times to execute a process.


  5. To view more details, in the Job column, click a job's name, and then click the Health Checks tab.
    The Baseline health checks are displayed.
  6. From the Health Checks panel, select the Task Wait Time health check.
    In this example, the health check shows that the long wait time occurred in the Map Stage of the job process because of insufficient resources.


  7. To display more information about the Map Stage tasks that are experiencing longer-than-average wait times to execute, click one of the tasks listed under Outlier Tasks.
    In this outlier task example, the Wait Duration is above average. This is confirmed by comparing it with the time the task took when it completed successfully, which is displayed in the Successful Attempt Duration field and is significantly better than the average time. This indicates that insufficient resources are allocated for this job.
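
The comparison described in the last step can be illustrated with a small script. This is only a sketch of the reasoning, not how Workload XM computes its health checks: the `TaskAttempt` record, the `find_wait_outliers` function, the sample values, and the threshold of twice the mean wait are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    """Hypothetical record of one task; not a Workload XM data structure."""
    task_id: str
    wait_seconds: float     # time spent waiting before the task started
    success_seconds: float  # duration of the attempt that completed successfully


def find_wait_outliers(tasks, factor=2.0):
    """Return tasks whose wait time exceeds `factor` times the mean wait.

    The factor is an assumed threshold used only for this illustration.
    """
    avg_wait = mean(t.wait_seconds for t in tasks)
    return [t for t in tasks if t.wait_seconds > factor * avg_wait]


# Made-up sample data: four map tasks with similar run times, one long wait.
tasks = [
    TaskAttempt("map_0", wait_seconds=2.0, success_seconds=30.0),
    TaskAttempt("map_1", wait_seconds=3.0, success_seconds=28.0),
    TaskAttempt("map_2", wait_seconds=40.0, success_seconds=29.0),
    TaskAttempt("map_3", wait_seconds=3.0, success_seconds=31.0),
]

outliers = find_wait_outliers(tasks)
for t in outliers:
    print(f"{t.task_id}: waited {t.wait_seconds}s, ran {t.success_seconds}s")
```

In this made-up data, the flagged task runs about as fast as the others once it starts, so the delay comes from waiting rather than from the work itself. That is the same pattern the step above reads as a sign of insufficient resources.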