Cloud Data Access

Deleting Objects on S3

The rm command deletes objects and directories full of objects. If the object store is eventually consistent, hadoop fs -ls commands and other accessors may briefly return the details of the now-deleted objects; this is an unavoidable artifact of eventually consistent object stores.

If the filesystem client is configured to move deleted files to a trash directory, that trash directory is located in the same bucket. Because S3 has no native rename, moving files to the trash means copying them, so the rm operation takes time proportional to the size of the data. Furthermore, the files held in the trash continue to incur storage costs.
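
To check whether trash is enabled and how much data it currently holds, something like the following sketch may help. It assumes the usual default layout, in which the trash directory sits under the user's home directory in the bucket (s3a://<bucket>/user/<username>/.Trash); the helper names trash_root and show_trash_usage are hypothetical, and the hdfs command is assumed to be on the path.

```shell
# Hypothetical helper: build the expected trash location for a bucket and user.
# Assumes the default layout s3a://<bucket>/user/<username>/.Trash.
trash_root() {
  printf 's3a://%s/user/%s/.Trash\n' "$1" "$2"
}

# Hypothetical helper: report the trash setting and the space used by the trash.
show_trash_usage() {
  # fs.trash.interval is the retention time in minutes; 0 means trash is disabled.
  hdfs getconf -confKey fs.trash.interval
  # Summarize the total size of everything under the trash directory.
  hadoop fs -du -s -h "$(trash_root "$1" "$2")"
}
```

For example, show_trash_usage bucket1 "$USER" would print the configured interval followed by the size of that user's trash in bucket1.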

To make sure that your deleted files are no longer incurring costs, you can do two things:

  • Use the -skipTrash option when removing files:

    hadoop fs -rm -skipTrash s3a://bucket1/dataset

  • Use the expunge command to purge any data that has been previously moved to the .Trash directory:

    hadoop fs -D fs.defaultFS=s3a://bucket1/ -expunge

    As the expunge command only works with the default filesystem, you need to use the -D option to make the target object store the default filesystem. Because -D is a generic option, it must be placed before -expunge. The override applies only to this invocation; it does not modify the stored configuration.
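
The two options above can be combined into one small script, sketched here under a few assumptions. The function names run and purge_s3a_path and the DRY_RUN variable are hypothetical; the -r flag is added on the assumption that the target is a directory; and -D is placed before -expunge because generic options are parsed ahead of the subcommand. With DRY_RUN=1 the commands are printed rather than executed, which is a cautious way to review them first.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: delete a path without using the trash,
# then purge the trash of the same bucket.
purge_s3a_path() {
  bucket="$1"
  target="$2"
  # Delete directly, bypassing the trash directory (-r assumes a directory).
  run hadoop fs -rm -r -skipTrash "s3a://${bucket}/${target}"
  # Make the bucket the default filesystem for this invocation only,
  # so that expunge operates on the bucket's trash.
  run hadoop fs -D "fs.defaultFS=s3a://${bucket}/" -expunge
}
```

For example, DRY_RUN=1 purge_s3a_path bucket1 dataset prints the two hadoop commands without touching the bucket; dropping DRY_RUN runs them.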