Transparent Encryption Recommendations for Spark
Consider the following recommendations when configuring HDFS Transparent Encryption for Spark:
- By default, application event logs are stored at /user/spark/applicationHistory, which can be made into an encryption zone.
- Spark also optionally caches its JAR file at /user/spark/share/lib (by default), but encrypting this directory is not required.
- Spark does not encrypt shuffle data. To protect it in Standalone mode, set spark.local.dir to a directory that resides on an encrypted disk. For YARN mode, make the corresponding YARN configuration changes, typically to the NodeManager local directories (yarn.nodemanager.local-dirs).
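As a rough sketch of the YARN-mode change mentioned in the last item: on YARN, Spark executors write shuffle and spill files to the NodeManager local directories rather than to spark.local.dir, so those directories are the ones to place on an encrypted disk. The fragment below assumes yarn-site.xml is edited directly; the mount point /data/encrypted is a hypothetical path for an encrypted volume, not a value from this guide.

```xml
<!-- yarn-site.xml: a minimal sketch, not a complete configuration. -->
<!-- /data/encrypted is a hypothetical mount point for an encrypted disk. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/encrypted/yarn/nm-local-dir</value>
</property>
```

A change to this property normally requires restarting the NodeManagers before it takes effect. If the cluster is managed through an administration console, set the equivalent option there instead of editing the file by hand.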
KMS ACL Configuration for Spark
In the KMS ACL, grant DECRYPT_EEK permission for the Spark key to the spark user and any groups that can submit Spark jobs:
<property>
  <name>key.acl.spark-key.DECRYPT_EEK</name>
  <value>spark spark-users</value>
</property>