Securing the S3A Committers
There are a few security considerations when using the S3A committers.
S3 Bucket Permissions
To use an S3A committer, the account/role must have the same AWS permissions as needed to write to the destination bucket.
The multipart upload list/abort operations may be a new addition to the permissions for the active role.
When using S3Guard, the account/role must also have full read/write/delete permissions for the DynamoDB table used.
Local Filesystem Security
All the committers use the directories listed in
fs.s3a.buffer.dir to store
staged/buffered output before uploading it to S3.
To ensure that other processes running on the same host do not have access to this data, unique paths should be used per-account.
This requirement holds for all applications working with S3 through the S3A connector.
The directory and partitioned committers use HDFS to propagate commit information between workers and the job committer.
These intermediate files are saved into a job-specific directory under the path
the name of the user running the job.
The default value of
tmp/staging, Which will be converted at run time to a path under the current
user's home directory, essentially
Provided the user's home directory has access restricted to trusted accounts, this intermediate data will be secure.
fs.s3a.committer.staging.tmp.pathis changed to a different location, then this path will need to be secured to protect pending jobs from tampering.