Spark Encryption

Spark supports the following means of encrypting data at rest and in transit.

Enabling Encrypted Shuffle for Spark Running on YARN

The following properties must be configured to enable encrypted shuffle for Spark on YARN. Spark does not support encryption for cached data or intermediate files that spill to the local disk.

To use Cloudera Manager to configure these properties, see Enabling Spark Encryption Using Cloudera Manager. To use the command line instead, add the properties listed here to /etc/spark/conf/spark-defaults.conf on the host that launches Spark jobs.

spark.shuffle.encryption.enabled

    Enable encrypted communication when authentication is enabled. This option is currently supported only by the block transfer service.

spark.shuffle.encryption.keySizeBits

    Shuffle file encryption key size in bits. Valid values are 128, 192, and 256.

spark.shuffle.encryption.keygen.algorithm

    The algorithm used to generate the shuffle file encryption key.

spark.shuffle.crypto.cipher.transformation

    Cipher transformation for shuffle file encryption. Currently, only AES/CTR/NoPadding is supported.

spark.shuffle.crypto.cipher.classes

    Comma-separated list of crypto cipher classes that implement AES/CTR/NoPadding. A crypto cipher implementation encapsulates the encryption and decryption details. The first available implementation in the list is used.

spark.shuffle.crypto.secure.random.classes

    Comma-separated list of secure random classes that implement a secure random algorithm, used when generating the initialization vector (IV) for crypto input/output streams. The first available implementation in the list is used.
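Taken together, these properties can be set in spark-defaults.conf. The following is a minimal sketch; the key size and key-generation algorithm shown are illustrative choices, not mandated defaults:

```
# Illustrative spark-defaults.conf entries for encrypted shuffle on YARN.
# Assumes Spark authentication is already enabled for the cluster.
spark.shuffle.encryption.enabled           true
spark.shuffle.encryption.keySizeBits       128
# HmacSHA1 is shown as an example key-generation algorithm.
spark.shuffle.encryption.keygen.algorithm  HmacSHA1
# Only AES/CTR/NoPadding is currently supported.
spark.shuffle.crypto.cipher.transformation AES/CTR/NoPadding
```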

Enabling SASL Encryption for Spark RPCs

If you are using an external shuffle service, configure the following property in the shuffle service configuration to disable unencrypted connections. This setting affects only connections from services that use SASL for authentication. Note that the external shuffle service is enabled by default in CDH 5.5 and higher.
spark.network.sasl.serverAlwaysEncrypt

    Default: false

    Disable unencrypted connections for the external shuffle service.
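As a sketch, the corresponding entry in the shuffle service configuration would look like this:

```
# Reject unencrypted connections to the external shuffle service.
# Applies only to clients that authenticate with SASL.
spark.network.sasl.serverAlwaysEncrypt true
```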

If you are using the block transfer service, configure the following property to enable SASL encryption for Spark RPCs. This setting is supported only when authentication using a secret key is already enabled.

spark.authenticate.enableSaslEncryption

    Default: false

    Enable encrypted communication for the block transfer service.
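A minimal spark-defaults.conf sketch for this case, assuming authentication with a secret key is configured for the cluster:

```
# Authentication must already be enabled for SASL encryption to take effect.
spark.authenticate                      true
spark.authenticate.enableSaslEncryption true
```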

To use Cloudera Manager to configure these properties, see Enabling Spark Encryption Using Cloudera Manager. To use the command line instead, add the properties listed here to /etc/spark/conf/spark-defaults.conf on the host that launches Spark jobs.

Enabling Spark Encryption Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

  1. Open the Cloudera Manager Admin Console and go to the Spark service.
  2. Click the Configuration tab.
  3. Select Scope > Gateway.
  4. Select Category > Advanced.
  5. Edit the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf property and add configuration properties for the feature you want to enable.
  6. Click Save Changes to commit the changes.
  7. Restart the Spark service.
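For example, to enable SASL encryption for RPCs through the safety valve in step 5, you would paste property lines such as the following into the snippet field (an illustration; substitute the properties for the feature you are enabling):

```
spark.authenticate true
spark.authenticate.enableSaslEncryption true
```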