Securing the S3A Committers
There are a few security considerations when using the S3A committers.
S3 Bucket Permissions
To use an S3A committer, the account or role must hold the AWS permissions needed to write to the destination bucket.
The committers also list and abort multipart uploads, so these permissions may need to be added to the active role.
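As a rough illustration of the permissions described above, the following sketch builds an IAM policy document that grants object writes together with the multipart upload list/abort actions. The bucket name is hypothetical, and the exact set of actions is an assumption based on the standard S3 IAM action names; a real deployment may need more or fewer.

```python
import json

# Hypothetical bucket name, used only for illustration.
BUCKET = "example-destination-bucket"


def committer_policy(bucket: str) -> dict:
    """Build an IAM policy document granting object writes plus
    multipart upload list/abort on a single bucket (assumed action set)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions, scoped to keys inside the bucket.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Bucket-level actions, scoped to the bucket itself.
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                ],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(committer_policy(BUCKET), indent=2))
```

The printed JSON can be compared against the policy already attached to the role to see whether the multipart actions are missing.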
Local Filesystem Security
All the committers use the directories listed in fs.s3a.buffer.dir to store staged/buffered output before uploading it to S3.
To ensure that other processes running on the same host do not have access to this data, unique paths should be used per-account.
This requirement holds for all applications working with S3 through the S3A connector.
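One possible way to prepare such a per-account path on a POSIX host is sketched below. The function name, the base directory and the <base>/<user>/s3a layout are all hypothetical choices for illustration; the only point is that each account gets its own directory, accessible to its owner alone, which can then be listed in fs.s3a.buffer.dir.

```python
import os
import stat
import tempfile


def make_private_buffer_dir(base: str, user: str) -> str:
    """Create <base>/<user>/s3a with owner-only access (mode 0700).

    The directory layout is a hypothetical example, not a Hadoop default.
    """
    user_dir = os.path.join(base, user)
    buffer_dir = os.path.join(user_dir, "s3a")
    for directory in (user_dir, buffer_dir):
        os.makedirs(directory, exist_ok=True)
        # makedirs is subject to the umask and leaves existing directories
        # untouched, so set the owner-only mode explicitly.
        os.chmod(directory, stat.S_IRWXU)
    return buffer_dir


if __name__ == "__main__":
    # A throwaway base directory and a hypothetical user name.
    base = tempfile.mkdtemp()
    path = make_private_buffer_dir(base, "alice")
    print(path, oct(stat.S_IMODE(os.stat(path).st_mode)))
```

Setting the mode on both the per-user directory and the buffer directory keeps other local accounts from listing or reading buffered output.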
HDFS Security
The directory and partitioned committers use HDFS to propagate commit information between workers and the job committer.
These intermediate files are saved into a job-specific directory under the path ${fs.s3a.committer.staging.tmp.path}/${user}, where ${user} is the name of the user running the job.
The default value of fs.s3a.committer.staging.tmp.path is tmp/staging, which will be converted at run time to a path under the current user's home directory, essentially /user/${user}/tmp/staging/${user}/.
Provided the user's home directory has access restricted to trusted accounts, this intermediate data will be secure.
If fs.s3a.committer.staging.tmp.path is changed to a different location, then this path will need to be secured to protect pending jobs from tampering.
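To make it easier to see which HDFS directory needs protecting, the path resolution described in this section can be approximated as follows. This is only a sketch of the documented behaviour, not the connector's code: the function name is hypothetical, and it assumes home directories live under /user and that an absolute setting is used as given.

```python
import posixpath


def staging_dir(tmp_path: str, user: str, home_root: str = "/user") -> str:
    """Approximate the directory used for a user's intermediate commit data.

    Assumes home directories are <home_root>/<user>; a relative tmp_path is
    resolved under the home directory, an absolute one is kept as-is.
    """
    home = posixpath.join(home_root, user)
    # posixpath.join discards `home` when tmp_path is absolute.
    base = posixpath.join(home, tmp_path)
    return posixpath.join(base, user)


if __name__ == "__main__":
    # Default value: resolved under the user's home directory.
    print(staging_dir("tmp/staging", "alice"))      # /user/alice/tmp/staging/alice
    # Hypothetical custom location: this directory must be secured separately.
    print(staging_dir("/secure/staging", "alice"))  # /secure/staging/alice
```

With the default value the result sits inside the home directory and inherits its protection; with a custom absolute value it does not, which is why that location needs its own access restrictions.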