Spark supports the following means of encrypting Spark data at rest and data in transit.
Enabling Spark Encryption Using Cloudera Manager
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
- Open the Cloudera Manager Admin Console and go to the Spark service.
- Click the Configuration tab.
- (Prerequisite) Search for the Spark Authentication property and make sure it has been enabled. If this property is not set, the following settings to enable encryption will not work.
- Search for the Enable Network Encryption property. Use the checkbox to enable encrypted communication between Spark processes belonging to the same application.
- Search for the Enable I/O Encryption property. Use the checkbox to enable encryption for temporary shuffle and cache files stored by Spark on local disks.
- Click Save Changes to commit the changes.
- Redeploy client configuration.
- Restart stale services (if indicated by Cloudera Manager).
Enabling Spark Encryption on an Unmanaged Cluster
Prerequisite: Before enabling encryption, make sure spark.authenticate is set to true. If authentication is not enabled, the following encryption settings will not work.
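The prerequisite above can be checked before any encryption property is touched. The following is a minimal sketch, not an official tool: it assumes the cluster's Spark settings live in a spark-defaults.conf style file with one "property value" pair per line, and the function names and sample content are hypothetical.

```python
import re


def parse_spark_defaults(text):
    """Parse spark-defaults.conf style text into a dict of property -> value."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Key and value are separated by whitespace; "key=value" is tolerated too.
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        props[key] = value
    return props


def authentication_enabled(props):
    """Return True only if spark.authenticate is explicitly set to true."""
    return props.get("spark.authenticate", "false").lower() == "true"


# Hypothetical file content, used only for illustration.
sample = """
# spark-defaults.conf
spark.master        yarn
spark.authenticate  true
"""

if not authentication_enabled(parse_spark_defaults(sample)):
    raise SystemExit("spark.authenticate must be true before enabling encryption")
print("Authentication is enabled; encryption settings can be applied.")
```

In practice the text would be read from the actual configuration file rather than from an inline string.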
Enabling Encryption for Shuffle and Cache Files
Configure the following properties to enable encrypted shuffle for Spark on YARN.
|Enable encryption of temporary shuffle and cache files.|
|Shuffle file encryption key size in bits. Valid values are 128, 192, and 256.|
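The property names for the two rows above did not survive in this copy of the page. The sketch below assumes they are the Apache Spark properties spark.io.encryption.enabled and spark.io.encryption.keySizeBits; verify the names against the documentation for your Spark version before relying on them. The helper functions are hypothetical and only render lines suitable for spark-defaults.conf.

```python
# Key sizes listed as valid in the table above.
VALID_KEY_SIZES = (128, 192, 256)


def io_encryption_properties(key_size_bits=128):
    """Build the properties for shuffle and cache file encryption.

    The property names are assumptions (see the paragraph above).
    """
    if key_size_bits not in VALID_KEY_SIZES:
        raise ValueError(
            f"key size must be one of {VALID_KEY_SIZES}, got {key_size_bits}"
        )
    return {
        # Prerequisite: authentication must be on for encryption to work.
        "spark.authenticate": "true",
        "spark.io.encryption.enabled": "true",
        "spark.io.encryption.keySizeBits": str(key_size_bits),
    }


def render_spark_defaults(props):
    """Render properties as 'name value' lines, one per property."""
    return "\n".join(f"{name} {value}" for name, value in props.items())


print(render_spark_defaults(io_encryption_properties(256)))
```

Rejecting unsupported key sizes up front surfaces a typo at configuration time rather than at job start.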
Enabling Encryption for Spark RPCs
Configure the following property to enable encryption for Spark RPCs.
|Enable encryption for Spark RPCs.|
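The property name for the row above is also missing from this copy. As a sketch only, the example below assumes it is spark.authenticate.enableSaslEncryption, the SASL-based RPC encryption switch in Apache Spark. Later Apache Spark releases also document spark.network.crypto.enabled for AES-based RPC encryption, so which property applies depends on the Spark version in use. The function and the application file name are hypothetical.

```python
import shlex


def rpc_encryption_submit_args(application):
    """Build a spark-submit command line that turns on RPC encryption.

    The encryption property name is an assumption (see the paragraph above).
    """
    props = {
        # Prerequisite: authentication must be on for encryption to work.
        "spark.authenticate": "true",
        "spark.authenticate.enableSaslEncryption": "true",
    }
    args = ["spark-submit"]
    for name, value in props.items():
        args += ["--conf", f"{name}={value}"]
    args.append(application)
    return args


# Print a shell-safe command line for a hypothetical application.
command = rpc_encryption_submit_args("my_app.py")
print(" ".join(shlex.quote(arg) for arg in command))
```

Passing the settings with --conf affects a single application; placing them in the cluster-wide configuration applies them to every job.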
If you are using an external shuffle service, configure the following property in the shuffle service configuration to disable unencrypted connections. Note that the external shuffle service is enabled by default in CDH 5.5 and higher.
|false||Disable unencrypted connections for the external shuffle service.|
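The row above appears to give false as the property's default value, and the property name itself is missing. The sketch below assumes the property is spark.network.sasl.serverAlwaysEncrypt from the Apache Spark security settings, and checks a set of shuffle service properties for the two values that matter here. The function name and the sample dictionaries are hypothetical.

```python
def shuffle_service_issues(service_props):
    """Return a list of problems found in external shuffle service settings.

    The serverAlwaysEncrypt property name is an assumption (see above).
    """
    issues = []
    if service_props.get("spark.authenticate", "false").lower() != "true":
        issues.append("spark.authenticate is not true")
    # Treat a missing value as false, matching the default shown in the table.
    always_encrypt = service_props.get(
        "spark.network.sasl.serverAlwaysEncrypt", "false"
    )
    if always_encrypt.lower() != "true":
        issues.append("unencrypted connections are still accepted")
    return issues


# Hypothetical settings: authentication on, property left at its default.
print(shuffle_service_issues({"spark.authenticate": "true"}))

# Hypothetical settings after setting the property to true.
print(shuffle_service_issues({
    "spark.authenticate": "true",
    "spark.network.sasl.serverAlwaysEncrypt": "true",
}))
```

An empty list means the shuffle service is configured to reject unencrypted connections under these assumptions.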