Troubleshooting Failed Data Engineering Jobs (Hadoop Administrators)

Use Workload XM to quickly troubleshoot failed data engineering jobs.

  1. Log in to the Workload XM console at wxm.cloudera.com. In Search, type the name of the cluster you want to analyze.
  2. On the Data Engineering Jobs page, click the Health Checks drop-down list and select Failed to Finish. This filters the list to show only jobs that did not complete.



  3. In the list of jobs, click the job name to view more detailed information.



  4. On the Jobs details page, click Health Checks to view details for the Failed to Finish health check. In this example, the health check indicates that the failure occurred in the Map stage of job execution.



    Click Map Stage, and then click Execution Details.

  5. In the Summary section of the page, click the number of failures to see all failed tasks.



  6. Click a failed task to see the error message from each failed attempt. In this example, the error message "Task KILL is received. Killing attempt!" is not descriptive enough to identify the cause. To gather more information about the task failure, open the associated log file and analyze the root cause there.
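
Once the log file from step 6 is saved locally, a short script can help surface the lines most likely to point at the root cause. The following is a minimal sketch, not part of Workload XM: the function names, the marker list in ERROR_PATTERN, and the way the log path is passed in are all assumptions made for illustration, and the markers may need adjusting for the logs your cluster produces.

```python
import re
import sys
from pathlib import Path

# Assumption: markers that often accompany a root cause in Java-based task logs.
# Adjust this pattern to match what your logs actually contain.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b|Exception|Caused by:")


def find_error_lines(log_text, context=2):
    """Return (line_number, snippet) pairs for lines matching ERROR_PATTERN.

    Each snippet holds the matching line plus up to `context` following
    lines, which is usually enough to show the start of a stack trace.
    """
    lines = log_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            snippet = "\n".join(lines[index : index + 1 + context])
            hits.append((index + 1, snippet))
    return hits


def scan_log_file(path, context=2):
    """Read a saved task log (hypothetical local path) and scan it."""
    text = Path(path).read_text(errors="replace")
    return find_error_lines(text, context)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is supplied by the caller on the command line.
    for line_number, snippet in scan_log_file(sys.argv[1]):
        print(f"--- line {line_number} ---")
        print(snippet)
```

Run the script with the path to the saved log as its only argument. Matches are printed in file order, so the earliest error is shown first; that is often, though not always, the one closest to the root cause.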