Pruning Old Data from S3Guard Tables
S3Guard keeps tombstone markers of deleted files. It is good to clean these regularly, just to keep costs down. This can be done with the hadoop s3guard prune command.
The command can be used to delete entries older than a certain number of days, minutes or hours.
> hadoop s3guard prune -days 3 s3a://guarded-table/
2019-07-31 21:27:37,264 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(321)) - Metadata store DynamoDBMetadataStore{region=eu-west-2, tableName=guarded-table, tableArn=arn:aws:dynamodb:eu-west-2:980678866538:table/guarded-table} is initialized.
2019-07-31 21:27:37,285 [main] INFO s3guard.DynamoDBMetadataStore (DurationInfo.java:<init>(72)) - Starting: Pruning DynamoDB Store
2019-07-31 21:27:37,332 [main] INFO s3guard.DynamoDBMetadataStore (DurationInfo.java:close(87)) - Pruning DynamoDB Store: duration 0:00.047s
2019-07-31 21:27:37,332 [main] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:innerPrune(1576)) - Finished pruning 366 items in batches of 25