Determining the Cause of Slow and Failed Queries

Identifying the cause of slow query run times and queries that fail to complete.

This topic describes how to determine the cause of slow and failed queries.

The following steps, with examples, explain how to investigate and troubleshoot the cause of a slow or failed query.

  1. In a supported browser, log in to the Workload XM web UI by doing the following:
    1. In the web browser URL field, enter the Workload XM URL that you were given by your system administrator and press Enter.
    2. When the Workload XM Log in page opens, enter your Workload XM user name and password.
    3. Click Log in.
  2. In the Clusters page, do one of the following:
    • In the Search field, enter the name of the cluster whose workloads you want to analyze.
    • From the Cluster Name column, locate and click the name of the cluster whose workloads you want to analyze.
  3. From the time-range list in the Cluster Summary page, select a time period that meets your requirements.
  4. From the Trend widget, select the tab of an engine whose jobs you wish to analyze and then click its Total Jobs value.
    The engine's Jobs page opens.
  5. From the Health Check list in the Jobs page, select Task Wait Time.
    The list is filtered to display only the jobs with longer-than-average wait times before the process is executed.

  6. To view more details, from the Job column, select a job's name and then click the Health Checks tab.
    The Baseline Health checks are displayed.
  7. From the Health Checks panel, select the Task Wait Time health check.
    For example, as shown in the following image, the long wait time occurred in the Map Stage of the job process due to insufficient resources:

  8. To display more information about the Map Stage tasks that wait longer than average before they can execute, click one of the tasks listed under Outlier Tasks.
    In the following example, the Task Details show that the task's wait time is above average. However, comparing the Wait Duration value with the Successful Attempt Duration value shows that once the task does run, it finishes in significantly better than average time. This indicates that insufficient resources are allocated for this job.
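
The comparison described in step 8 can also be expressed as a simple rule of thumb. The following is a minimal Python sketch, not part of Workload XM: the Task record, its field names, the sample values, and the wait_factor threshold are all assumptions made for illustration. It flags tasks whose wait time is well above the stage average while their run time is at or below the stage average, which is the pattern that points to insufficient resources rather than slow task logic.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative only."""
    name: str
    wait_seconds: float  # time spent waiting before the attempt started
    run_seconds: float   # duration of the successful attempt


def wait_bound_outliers(tasks, wait_factor=2.0):
    """Return names of tasks that waited far longer than the stage average
    but ran no slower than the stage average once started."""
    if not tasks:
        return []
    avg_wait = mean(t.wait_seconds for t in tasks)
    avg_run = mean(t.run_seconds for t in tasks)
    return [
        t.name
        for t in tasks
        if t.wait_seconds > wait_factor * avg_wait and t.run_seconds <= avg_run
    ]


# Made-up sample values for a single Map Stage.
tasks = [
    Task("map_0001", wait_seconds=5, run_seconds=40),
    Task("map_0002", wait_seconds=6, run_seconds=42),
    Task("map_0003", wait_seconds=4, run_seconds=38),
    Task("map_0004", wait_seconds=90, run_seconds=30),
]
print(wait_bound_outliers(tasks))  # ['map_0004']
```

A task that both waits and runs longer than average would not be flagged by this rule, because in that case the slowdown may lie in the task itself rather than in resource allocation.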