Cleaning up after failed jobs
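The S3A committers upload data in the tasks, completing the uploads when the job is committed. If a job fails, its uploads may never be completed or aborted, and Amazon S3 charges for the storage of incomplete multipart uploads. Configure the destination bucket to abort outstanding uploads after a time limit.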
Go to the AWS S3 console:
- Find the bucket you are using as a destination of work.
- Select the Management tab.
- Select Add a new lifecycle rule.
- Create a rule “cleanup uploads” with no filter, and without any transitions.
- Configure an “Expiration” action of Clean up incomplete multipart uploads.
- Select a time limit for outstanding uploads, such as 1 Day.
- Review and confirm the lifecycle rule.
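The console steps above can also be scripted. The following is a minimal sketch, assuming the boto3 library is installed and AWS credentials are already configured; the function names `cleanup_uploads_rule` and `apply_rule` are hypothetical helpers, not part of Hadoop or the S3A committers. Note that putting a lifecycle configuration replaces any configuration already on the bucket, so merge in existing rules first if there are any.

```python
import json


def cleanup_uploads_rule(days=1):
    """Build a lifecycle configuration that aborts incomplete multipart uploads.

    Hypothetical helper: mirrors the "cleanup uploads" rule created in the console.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "cleanup uploads",
                "Status": "Enabled",
                # An empty prefix means "no filter": the rule covers the whole bucket.
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


def apply_rule(bucket, days=1):
    """Apply the rule to a bucket (assumes boto3 and valid credentials).

    This call REPLACES the bucket's existing lifecycle configuration.
    """
    import boto3  # imported lazily so the rule can be built without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=cleanup_uploads_rule(days),
    )


if __name__ == "__main__":
    # Print the rule for review before applying it to a real bucket.
    print(json.dumps(cleanup_uploads_rule(days=1), indent=2))
```

Building the rule separately from applying it makes it possible to review the generated configuration before touching the bucket.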
You need to select a limit on how long uploads can be outstanding. For Hadoop applications, this limit must cover both the maximum time an application can spend writing to the same file and the maximum time a job may take. If the timeout is shorter than either of these, programs are likely to fail.

Once the rule is set, the cleanup is automatic.
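One way to pick the time limit is to take the longer of the two durations and round up to whole days, since the lifecycle rule is expressed in days. The sketch below is illustrative only: `upload_time_limit_days` is a hypothetical helper, and the `safety_factor` margin is an assumption rather than a recommendation from the committers.

```python
import math
from datetime import timedelta


def upload_time_limit_days(longest_file_write, longest_job, safety_factor=2.0):
    """Return a lifecycle time limit in whole days.

    The limit must exceed both the longest single-file write and the
    longest job; safety_factor (an assumed margin) adds headroom.
    """
    longest = max(longest_file_write, longest_job)
    days = (longest * safety_factor) / timedelta(days=1)
    # Lifecycle rules use whole days, with a minimum of one day.
    return max(1, math.ceil(days))


if __name__ == "__main__":
    limit = upload_time_limit_days(
        longest_file_write=timedelta(hours=6),
        longest_job=timedelta(hours=30),
    )
    print(limit)  # 30 hours doubled is 60 hours, rounded up to 3 days
```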