Encrypting Data on S3

Amazon S3 supports several encryption mechanisms to protect the data stored in S3.

  • In Server-Side Encryption (SSE), the data is encrypted before it is saved to disk in S3, and decrypted when it is read. This encryption and decryption takes place in the S3 infrastructure, and is transparent to (authenticated) clients.

  • In Client-Side Encryption (CSE), the data is encrypted and decrypted on the client, that is, inside the AWS S3 SDK. This mechanism is not supported in Hadoop due to incompatibilities with most applications. Specifically, the amount of decrypted data is often less than the file length, which breaks all code that assumes that the content of a file is the same size as that stated in directory listings.

For server-side encryption to work, the S3 servers require secret keys to encrypt data, and the same secret keys to decrypt it. These keys can be managed in three ways:

  • SSE-S3: By using Amazon S3-Managed Keys

  • SSE-KMS: By using AWS Key Management Service

  • SSE-C: By using customer-supplied keys

In general, the encryption mechanism is selected by setting the property fs.s3a.server-side-encryption-algorithm in core-site.xml. However, some encryption options require extra settings. Server-side encryption slightly slows down performance when reading data from S3.
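
As an illustrative sketch, a core-site.xml fragment that selects SSE-S3 might look like the following. It assumes the S3A connector's value AES256 for SSE-S3. The other two mechanisms are selected with the values SSE-KMS and SSE-C, and they are the options that typically need a key supplied through additional settings, which are not shown here.

```xml
<!-- Sketch only: enables SSE-S3 for all buckets accessed through S3A. -->
<!-- AES256 selects Amazon S3-managed keys; no key needs to be supplied. -->
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>
```

Because encryption and decryption happen inside S3, clients reading the data need no matching setting for SSE-S3.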

Encryption can be configured for specific buckets, and a specific S3 bucket can be set to mandate encryption.
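
A per-bucket setting can be sketched with S3A's per-bucket configuration pattern, in which the bucket name is inserted after fs.s3a.bucket. in the property name. The bucket name example-bucket below is hypothetical, and the fragment covers only the client-side configuration; mandating encryption is enforced on the S3 side and is not shown.

```xml
<!-- Sketch only: "example-bucket" is a hypothetical bucket name. -->
<!-- This overrides the encryption algorithm for that one bucket. -->
<property>
  <name>fs.s3a.bucket.example-bucket.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>
```

Buckets without such an override continue to use the value of fs.s3a.server-side-encryption-algorithm, if one is set.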