Transparent Encryption Recommendations for Spark

Consider the following recommendations when configuring HDFS Transparent Encryption for Spark.

Recommendations

  • By default, application event logs are stored at /user/spark/applicationHistory, which can be made into an encryption zone.
  • Spark also optionally caches its JAR file at /user/spark/share/lib (by default), but encrypting this directory is not required.
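  • Spark does not encrypt shuffle data. To protect it, configure the Spark local directory, spark.local.dir (in Standalone mode), to reside on an encrypted disk. In YARN mode, Spark uses the NodeManager local directories (yarn.nodemanager.local-dirs) rather than spark.local.dir, so place those directories on an encrypted disk instead.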
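
As a rough sketch of the YARN-side change for shuffle data, the NodeManager local directories, which Spark uses for scratch space in YARN mode, can be pointed at a mount backed by an encrypted disk in yarn-site.xml. The path shown is a hypothetical placeholder, not a default; substitute a directory on your own encrypted volume.

<property>
  <!-- Hypothetical path: assumes /data/encrypted is a mount on an encrypted disk -->
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/encrypted/yarn/nm</value>
</property>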

KMS ACL Configuration for Spark

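In the KMS ACL configuration (kms-acls.xml), grant DECRYPT_EEK permission for the Spark key (named spark-key in this example) to the spark user and to any groups that can submit Spark jobs (spark-users in this example). The value lists users first, then a space, then groups: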

<property>
  <name>key.acl.spark-key.DECRYPT_EEK</name>
  <value>spark spark-users</value>
</property>