Pruning Old Data from S3Guard Tables
S3Guard keeps tombstone markers for deleted files. Pruning these entries regularly helps
keep costs down. This can be done with the hadoop s3guard prune command, which deletes
entries older than a specified age, given in days, hours and minutes:
hadoop s3guard prune -days 3 -hours 6 -minutes 15 s3a://guarded-table/
2018-05-31 15:39:27,981 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(270)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=guarded-table} is initialized.
2018-05-31 15:39:33,770 [main] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:prune(851)) - Finished pruning 366 items in batches of 25
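
To make the age arithmetic concrete, the sketch below works out which cutoff time the example options would imply. It assumes the -days, -hours and -minutes values are added together into a single age; that reading is an assumption, not something the output above confirms. The helper name prune_cutoff is hypothetical and is not part of Hadoop, and the start time is taken from the first log line, treated as UTC purely for illustration.

```python
from datetime import datetime, timedelta, timezone


def prune_cutoff(now, days=0, hours=0, minutes=0):
    """Return the time before which entries would count as old.

    Hypothetical helper: assumes the three options are summed into one age.
    """
    age = timedelta(days=days, hours=hours, minutes=minutes)
    return now - age


# Start time borrowed from the example log output (time zone assumed to be UTC).
now = datetime(2018, 5, 31, 15, 39, 27, tzinfo=timezone.utc)
cutoff = prune_cutoff(now, days=3, hours=6, minutes=15)
print(cutoff.isoformat())  # 2018-05-28T09:24:27+00:00
```

Under that assumption, the command in the example would remove entries last updated more than 3 days, 6 hours and 15 minutes before the moment it ran.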