Job cleanup is designed to address a number of issues which may surface in cloud storage:
- Slow performance for deletion of directories.
- Timeout when deleting very deep and wide directory trees.
- Preventing transient cleanup failures from escalating into job failures.
| Option | Meaning |
|--------|---------|
| `mapreduce.fileoutputcommitter.cleanup.skipped` | Skip cleanup of the `_temporary` directory |
| `mapreduce.fileoutputcommitter.cleanup-failures.ignored` | Ignore errors during cleanup |
| `mapreduce.manifest.committer.cleanup.parallel.delete` | Delete task attempt directories in parallel |
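For example, to stop cleanup failures from failing an otherwise-successful job, the relevant option can be set in the job configuration (shown here as a standard Hadoop XML property; whether it belongs in a site file or per-job configuration depends on your deployment):

```xml
<!-- Ignore errors during job cleanup rather than failing the job. -->
<property>
  <name>mapreduce.fileoutputcommitter.cleanup-failures.ignored</name>
  <value>true</value>
</property>
```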
The algorithm is:
    if `mapreduce.fileoutputcommitter.cleanup.skipped`:
      return
    if `mapreduce.manifest.committer.cleanup.parallel.delete`:
      attempt parallel delete of task attempt directories; catch any exception
    if not `mapreduce.fileoutputcommitter.cleanup.skipped`:
      delete(`_temporary`); catch any exception
    if caught-exception and not `mapreduce.fileoutputcommitter.cleanup-failures.ignored`:
      throw caught-exception
The goal is to perform a fast/scalable delete and throw a meaningful exception if that didn't work.
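As a non-normative sketch of the control flow above (the real committer is implemented in Java inside Hadoop; the `fs` object and its methods here are hypothetical stand-ins, not the Hadoop `FileSystem` API):

```python
def cleanup(options, fs, job_dir):
    """Best-effort cleanup of a job's _temporary directory.

    Follows the pseudocode literally: a caught exception is rethrown
    at the end unless cleanup failures are configured to be ignored.
    """
    if options.get("mapreduce.fileoutputcommitter.cleanup.skipped", False):
        return  # cleanup disabled entirely; _temporary is left in place

    caught = None
    if options.get("mapreduce.manifest.committer.cleanup.parallel.delete", True):
        try:
            # Delete the task attempt directories in parallel first;
            # this is the fast path for wide/deep trees on cloud storage.
            fs.parallel_delete_task_attempt_dirs(job_dir)
        except Exception as e:
            caught = e  # remember the failure, still attempt the base delete

    try:
        # Delete whatever remains of the _temporary directory.
        fs.delete(job_dir + "/_temporary", recursive=True)
    except Exception as e:
        caught = e

    if caught is not None and not options.get(
            "mapreduce.fileoutputcommitter.cleanup-failures.ignored", False):
        raise caught
```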
When working with ABFS and GCS, these settings should normally be left alone. If
errors do surface during cleanup, enabling the option to ignore failures will ensure the
job still completes. Disabling cleanup avoids the overhead of cleanup entirely, but requires a
workflow or manual operation to clean up all
`_temporary` directories on a regular basis.