Backing up versions and objects

You can use S3 versioning and AWS Backup for S3 to back up data stored in S3.

S3 versioning

S3 versioning keeps multiple versions of the objects stored in an S3 bucket. With versioning, it is easier to recover data that was deleted by accident or lost due to an application failure. For more information, see the Using versioning in S3 buckets documentation.

Versioning is not enabled by default. Cloudera recommends enabling S3 versioning so that lost data can be recovered. For instructions, see the Enabling versioning on buckets and Working with versioned S3 buckets documentation.
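As an illustration of enabling versioning programmatically, the following is a minimal sketch that assumes the boto3 S3 client and its put_bucket_versioning and get_bucket_versioning calls. The function name enable_versioning is hypothetical, and the client is passed in as a parameter so that the sketch does not depend on any particular credential setup.

```python
def enable_versioning(s3_client, bucket_name):
    """Turn on versioning for one bucket and return the status S3 reports.

    s3_client is assumed to be a boto3 S3 client, for example
    boto3.client("s3"); it is a parameter so the caller controls credentials.
    """
    s3_client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # The response has no "Status" key for a bucket that was never versioned.
    response = s3_client.get_bucket_versioning(Bucket=bucket_name)
    return response.get("Status", "Unversioned")
```

With boto3 installed and credentials configured, a call such as enable_versioning(boto3.client("s3"), "example-bucket") would be expected to return "Enabled", where example-bucket is a placeholder bucket name.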

The Amazon S3 documentation also describes how to retrieve object versions from a bucket when versioning is enabled. Lifecycle rules can be used with versioning to manage the lifecycle of the objects in an S3 bucket. For more information, see the Managing your storage lifecycle documentation.
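A lifecycle rule that expires old versions could look like the following sketch. It builds the configuration in the shape accepted by the boto3 put_bucket_lifecycle_configuration call. The helper name noncurrent_expiration_rule, the rule ID, and the 30-day period are illustrative assumptions, not recommended values.

```python
def noncurrent_expiration_rule(days, prefix=""):
    """Build one lifecycle rule that expires noncurrent versions after `days` days."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "ID": f"expire-noncurrent-after-{days}-days",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},  # an empty prefix applies to the whole bucket
        "NoncurrentVersionExpiration": {"NoncurrentDays": days},
    }


# Illustrative value only: keep noncurrent versions for 30 days.
lifecycle_configuration = {"Rules": [noncurrent_expiration_rule(30)]}
```

The resulting dictionary can be passed as the LifecycleConfiguration argument of put_bucket_lifecycle_configuration, together with the bucket name.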

Ease of recovery
Recovering an object to a previous version requires the version ID of that version. To recover the whole bucket or more than one object, you need a list of all the versions of each object, and you select the versions to recover based on their timestamps.
Recovery scenarios
When a job requests the removal of a file or folder, S3 does not actually remove the object. Instead, it places a ‘delete marker’ on top of the current version, and the ‘delete marker’ becomes the new current version. Removing the ‘delete marker’ makes the previous version current again. After a job failure or an accidental deletion, it is hard to tell whether a ‘delete marker’ comes from an intended or an accidental deletion. There is no built-in way to reset all of the objects to their state at a particular time before a job was run. If someone permanently deletes an older version, or a retention policy removes it, that version cannot be recovered.
Security and encryption
The same IAM roles apply to the versions as to the S3 bucket. Versions are available to a user based on that user's access to the bucket data. Versioned data also follows the same encryption mechanism as the data stored in S3, such as SSE-S3, SSE-KMS, or SSE-C.
Object level recovery
Versioning happens at the object level: enabling versioning on a bucket applies it to all the objects in the bucket, and recovery is performed per object.
Backup scheduling and location
Lifecycle policies can be configured to manage how long versions are retained. Only one version is the current version, and modifying the object creates a new current version. The versions are stored in the same bucket as the objects, and they can be moved to other buckets. Versioning can be enabled in all regions.

AWS Backup for S3

AWS Backup offers backup management, policy-based and tag-based backup, lifecycle management policies, and cross-region and cross-account backup features, and it can be used with S3 versioning. When using AWS Backup, Cloudera recommends setting a lifecycle expiration period for the S3 versions, because AWS Backup backs up and stores all unexpired versions of the S3 data, which can increase cloud cost. For more information about the AWS Backup features and availability, see the What is AWS Backup documentation.

Ease of recovery
Using AWS Backup, you can choose from a list of recovery points that indicate the state of the S3 data at that point in time. You can restore the whole bucket or up to 5 prefixes from the recovery point's data. Data can be restored to the source bucket, to another existing bucket, or to a newly created bucket, but the bucket must be in the same region as the backup vault.
Recovery scenarios
If a certain file or folder is removed due to a job request, the bucket can be recovered to a point in time before the job started. Alternatively, if the prefix (or folder path) is identified, only that file or folder can be restored from the previous recovery point.
If someone permanently deletes some versions and objects from the bucket, the previous recovery point can be used to recover the data. Using separate IAM roles for the backup and for S3 keeps access to the backup data separate from access to the bucket data.
If someone deletes a recovery point, there is no way to recover it, because the backup is incremental. Vault lock can be used to prevent the deletion of recovery points.
Security and encryption
Defining different IAM roles for the backup vault and the S3 bucket ensures that each has its own access permissions. A KMS key is required to encrypt all backed up data. This can be a KMS key that you created or the default AWS managed key.
Object level recovery
With AWS Backup, you can recover objects under up to 5 prefixes, or you can recover the data of the whole bucket.
AWS Backup creates a backup of all your S3 versions, but restores only the latest version from the version stack at any point in time. This AWS Backup limitation can be harmful when recovering from corrupted data. In this case, either add a ‘delete marker’ on top of the corrupted version and restore from the previous version, or permanently delete the corrupted version so that the previous version becomes the current version.
Backup scheduling and location
Creating a backup plan allows you to schedule backups at a defined interval. On-demand backups can also be created in addition to the scheduled backups. A bucket can only be backed up in the regions supported by AWS Backup, and data can only be restored to a bucket in the same region. AWS Backup is supported for Amazon S3 in all regions except China (Beijing), China (Ningxia), Europe (Spain), Europe (Zurich), Asia Pacific (Hyderabad), and Asia Pacific (Melbourne).
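The restore constraints listed in this section (at most 5 prefixes, and a destination bucket in the same region as the backup vault) can be checked before a restore job is submitted. The following is a minimal sketch of such a pre-check. The function name validate_restore_request and its parameters are hypothetical and are not part of the AWS Backup API.

```python
MAX_RESTORE_PREFIXES = 5  # limit stated in this section


def validate_restore_request(prefixes, vault_region, destination_region):
    """Return a list of problems; an empty list means the request fits the constraints.

    An empty `prefixes` list stands for restoring the whole bucket.
    """
    problems = []
    if len(prefixes) > MAX_RESTORE_PREFIXES:
        problems.append(
            f"{len(prefixes)} prefixes requested, at most {MAX_RESTORE_PREFIXES} allowed"
        )
    if destination_region != vault_region:
        problems.append(
            f"destination bucket region {destination_region} differs from "
            f"backup vault region {vault_region}"
        )
    return problems


# Hypothetical request: two prefixes, vault and bucket in different regions.
issues = validate_restore_request(["logs/", "warehouse/"], "us-east-1", "eu-west-1")
```

Returning a list of problems instead of raising on the first one lets a caller report every violated constraint at once.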

Comparison of S3 versioning and AWS Backup

Both S3 versioning and AWS Backup are easy to implement and use for backing up an S3 bucket, and IAM roles can be used to configure an appropriate security level for both tools. However, S3 versioning offers no direct way to return the whole bucket to its state at a particular time, and an external script is required to restore prefixes or directories to a specific time. With AWS Backup, if a current version that is not a delete marker is present and a previous version needs to be restored, the restore job prefers the current version. As a result, in cases of data corruption you might need to fall back to versioning to restore a particular version. You also need to enable S3 versioning to use AWS Backup.
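For the data corruption case, falling back to versioning means finding the version directly below the corrupted current version. The following sketch does this for one key, again assuming a listing in the shape of a boto3 list_object_versions response. The function name previous_version_id and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def previous_version_id(listing, key):
    """Return the VersionId directly below the newest version of `key`, or None."""
    versions = [v for v in listing.get("Versions", []) if v["Key"] == key]
    # Newest first: index 0 is the (corrupted) current version.
    versions.sort(key=lambda v: v["LastModified"], reverse=True)
    if len(versions) < 2:
        return None  # nothing older to fall back to
    return versions[1]["VersionId"]


# Hypothetical listing: report.parquet has a good version r1 and a corrupted version r2.
corrupted_listing = {
    "Versions": [
        {"Key": "report.parquet", "VersionId": "r2",
         "LastModified": datetime(2024, 2, 2, tzinfo=timezone.utc)},
        {"Key": "report.parquet", "VersionId": "r1",
         "LastModified": datetime(2024, 2, 1, tzinfo=timezone.utc)},
        {"Key": "single.txt", "VersionId": "s1",
         "LastModified": datetime(2024, 2, 1, tzinfo=timezone.utc)},
    ]
}

fallback = previous_version_id(corrupted_listing, "report.parquet")
```

Once the fallback version is known, one option with boto3 is to copy it on top of the stack using copy_object with a CopySource that includes the VersionId. Another is to permanently delete the corrupted version using delete_object with its VersionId, which makes the previous version current again.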

S3 versioning and AWS Backup have different pricing based on usage. For more information, see the AWS S3 Pricing page.