S3Guard: Known Issues
The following known issues have been identified while testing S3Guard.
Credentials in URLs are unsupported
S3Guard cannot be used when the AWS login credentials are in the S3 URL (HADOOP-15422).
Putting AWS credentials in a URL such as s3a://AWSID:SECRETKEY@bucket/path is very insecure: paths are widely logged, so it is very hard to keep the secrets private. Losing the keys can be expensive and can expose all information to which the account has access. S3Guard does not support this authentication mechanism. Place secrets in Hadoop configuration files or, better, in JCEKS credential files.
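As a minimal sketch of the JCEKS option, the helper below stores the two S3A secrets with the hadoop credential command. The function name store_s3a_secrets and the provider URI passed to it are assumptions chosen for illustration; use whatever credential file location suits your cluster.

```shell
# Sketch: store the S3A access and secret keys in a JCEKS credential file.
# store_s3a_secrets is a hypothetical helper name, not part of Hadoop.
store_s3a_secrets() {
  local provider="$1"   # a provider URI, e.g. jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks
  # Without -value, "hadoop credential create" prompts for the secret,
  # which keeps it off the command line.
  hadoop credential create fs.s3a.access.key -provider "$provider" &&
  hadoop credential create fs.s3a.secret.key -provider "$provider"
}
```

Once the secrets are stored, a command can pick them up by setting hadoop.security.credential.provider.path to the same URI, for example hadoop fs -D hadoop.security.credential.provider.path=jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks -ls s3a://guarded-table/ (the host name and path here are placeholders).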
Error when using trailing / in some hadoop fs commands
Some hadoop fs operations fail when there is a trailing / in the path, including the fs -mkdir command:
$ hadoop fs -mkdir -p s3a://guarded-table/dir/child/
mkdir: get on s3a://guarded-table/dir/child/: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: One or more parameter values were invalid: An AttributeValue may not contain an empty string (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException
There is a straightforward workaround: if a command fails, remove the trailing / and run it again.
$ hadoop fs -mkdir -p s3a://guarded-table/dir/child
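For scripts that build paths dynamically, the trailing / can be removed before the command runs. The following is a small sketch, not part of Hadoop: strip_trailing_slash and guarded_mkdir are hypothetical helper names.

```shell
# Hypothetical helper: remove trailing "/" characters from a path,
# leaving a bare "/" or a bare scheme root such as "s3a://" untouched.
strip_trailing_slash() {
  local path="$1"
  while [ "${#path}" -gt 1 ] && [ "${path%/}" != "$path" ]; do
    case "$path" in
      *://) break ;;   # do not damage the "scheme://" prefix
    esac
    path="${path%/}"
  done
  printf '%s\n' "$path"
}

# Hypothetical wrapper: run fs -mkdir -p on the cleaned-up path.
guarded_mkdir() {
  hadoop fs -mkdir -p "$(strip_trailing_slash "$1")"
}
```

With these definitions, guarded_mkdir s3a://guarded-table/dir/child/ issues the command against s3a://guarded-table/dir/child.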
The hadoop s3guard command output contains the error message “hadoop-aws.sh was not found”
This is a warning message about a file which is not found in HDP and which is not actually needed by the s3guard command. It is safe to ignore.
Failure handling of rename() operations
If a rename() operation fails partway through, including due to permissions, the S3Guard database is not reliably updated.
If the rename failed because of a network problem, the point is moot: if an application cannot connect to S3, then DynamoDB will inevitably be unreachable too, and updates will be impossible. The problem can also surface if the bucket has been set up with complex permissions where not all callers have full write (including delete) access to the bucket. S3Guard (and the S3A connector) prefers unrestricted write access to an entire R/W bucket.