Transparent Encryption Recommendations for Spark
There are various recommendations to consider when configuring HDFS Transparent Encryption for Spark.
Recommendations
- By default, application event logs are stored at
/user/spark/applicationHistory
, which can be made into an encryption zone. - Spark also optionally caches its JAR file at
/user/spark/share/lib
(by default), but encrypting this directory is not required. - Spark does not encrypt shuffle data. To do so, configure the Spark
local directory,
spark.local.dir
(in Standalone mode), to reside on an encrypted disk. For YARN mode, make the corresponding YARN configuration changes.
KMS ACL Configuration for Spark
In the KMS ACL, grant DECRYPT_EEK
permission for the Spark key to the
spark
user and any groups that can submit Spark jobs:
<property>
<name>key.acl.spark-key.DECRYPT_EEK</name>
<value>spark spark-users</value>
</property>