Encrypting Data on S3
Amazon S3 supports a number of encryption mechanisms to better secure the data in S3:
In Server-Side Encryption (SSE), the data is encrypted before it is saved to disk in S3, and decrypted when it is read. This encryption and decryption takes place in the S3 infrastructure, and is transparent to (authenticated) clients.
In Client-Side Encryption (CSE), the data is encrypted and decrypted on the client, that is, inside the AWS S3 SDK. This mechanism isn't supported in Hadoop due to incompatibilities with most applications. Specifically, the amount of decrypted data is often less than the file length reported by S3, which breaks any code that assumes the content of a file is the same size as that stated in directory listings.
Note: HDP only supports Server-Side Encryption ("SSE") and does not support Client-Side Encryption ("CSE").
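For server-side encryption to work, the S3 servers require secret keys to encrypt data, and the same secret keys to decrypt it. These keys can be managed in three ways: by S3 itself (SSE-S3), by the AWS Key Management Service (SSE-KMS), or by the client, which supplies the key with each request (SSE-C).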
In general, the specific configuration mechanism can be set via the property fs.s3a.server-side-encryption-algorithm in core-site.xml. However, some encryption options require extra settings. Server-side encryption slightly slows down performance when reading data from S3.
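As a minimal sketch, assuming SSE-S3 is the mechanism you want, the property might be set in core-site.xml as shown below. The value AES256 selects S3-managed keys. The S3A connector also accepts SSE-KMS and SSE-C, which are the options that typically need a key supplied through the additional property fs.s3a.server-side-encryption.key.

```xml
<!-- Sketch only: enable SSE-S3 for all S3A buckets. -->
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <!-- AES256 = SSE-S3; other accepted values are SSE-KMS and SSE-C -->
  <value>AES256</value>
</property>
```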
It is possible to configure encryption for specific buckets and to mandate encryption for a specific S3 bucket.
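For illustration, the S3A per-bucket configuration pattern fs.s3a.bucket.BUCKETNAME.OPTION can scope an encryption setting to a single bucket. The bucket name example-bucket below is hypothetical, and the sketch assumes SSE-S3 is wanted for that bucket only. Mandating encryption, as opposed to configuring it on the client, is normally done on the AWS side through a bucket policy and is not shown here.

```xml
<!-- Sketch only: "example-bucket" is a placeholder bucket name. -->
<property>
  <name>fs.s3a.bucket.example-bucket.server-side-encryption-algorithm</name>
  <!-- Applies to s3a://example-bucket/ only; other buckets keep the global setting -->
  <value>AES256</value>
</property>
```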