Spark Authentication
Spark currently supports two methods of authentication: Kerberos and a shared secret. When using Spark on YARN, Cloudera recommends Kerberos authentication because it is the stronger security measure.
Configuring Kerberos Authentication for Spark Using the Command Line
Create the Spark Principal and Keytab File
- Create the spark principal and spark.keytab file:
kadmin: addprinc -randkey spark/fully.qualified.domain.name@YOUR-REALM.COM
kadmin: xst -k spark.keytab spark/fully.qualified.domain.name
- Move the file into the Spark configuration directory and restrict its access exclusively to the spark user:
$ mv spark.keytab /etc/spark/conf/
$ chown spark /etc/spark/conf/spark.keytab
$ chmod 400 /etc/spark/conf/spark.keytab
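The two steps above can be scripted roughly as follows, with a check that the installed keytab has the expected owner and mode. This is a minimal bash sketch, assuming the MIT Kerberos `kadmin` client with an administrative principal you supply, GNU coreutils `stat`, and an existing local `spark` user; `spark_principal`, `create_spark_keytab`, and `check_keytab` are hypothetical helper names, not part of Spark or CDH.

```shell
# Minimal sketch of the keytab steps above (bash). Assumptions: MIT Kerberos
# kadmin client, GNU coreutils stat, and an existing local "spark" user.
# All function names are hypothetical helpers, not part of Spark or CDH.

# Build the host-specific principal, e.g. spark/host1.example.com@EXAMPLE.COM.
# The host defaults to this machine's fully qualified domain name.
spark_principal() {
  local realm="$1" host="${2:-$(hostname -f)}"
  printf 'spark/%s@%s\n' "$host" "$realm"
}

# Create the principal, export its keys to spark.keytab in the current
# directory, and install the keytab readable only by the spark user.
# Run as root on the Spark host; kadmin authenticates as the admin in $1.
create_spark_keytab() {
  local admin="$1" realm="$2" host="${3:-$(hostname -f)}"
  local principal
  principal=$(spark_principal "$realm" "$host")
  kadmin -p "$admin" -q "addprinc -randkey ${principal}" || return 1
  kadmin -p "$admin" -q "xst -k spark.keytab ${principal}" || return 1
  mv spark.keytab /etc/spark/conf/ || return 1
  chown spark /etc/spark/conf/spark.keytab || return 1
  chmod 400 /etc/spark/conf/spark.keytab
}

# Check that a keytab exists, is owned by the expected user, and has mode 400.
# Prints "ok: <path>" and returns 0 on success; prints a reason to stderr
# and returns 1 otherwise.
check_keytab() {
  local keytab="${1:-/etc/spark/conf/spark.keytab}"
  local expected_owner="${2:-spark}"
  local owner mode

  if [ ! -f "$keytab" ]; then
    echo "missing: $keytab" >&2
    return 1
  fi

  # GNU stat: %U is the owner's user name, %a the octal permission bits
  owner=$(stat -c '%U' "$keytab")
  mode=$(stat -c '%a' "$keytab")

  if [ "$owner" != "$expected_owner" ]; then
    echo "wrong owner: $owner (expected $expected_owner)" >&2
    return 1
  fi
  if [ "$mode" != "400" ]; then
    echo "wrong mode: $mode (expected 400)" >&2
    return 1
  fi
  echo "ok: $keytab"
}
```

For example, after `create_spark_keytab admin/admin YOUR-REALM.COM` (each `kadmin` call typically prompts for the administrator's password), `check_keytab` with no arguments should print `ok: /etc/spark/conf/spark.keytab`, and `sudo -u spark klist -kt /etc/spark/conf/spark.keytab` lists the entries stored in the keytab.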
For more details on creating Kerberos principals and keytabs, see Step 4: Create and Deploy the Kerberos Principals and Keytab Files.
Configure the Spark History Server to Use Kerberos
Open the Spark environment file /etc/spark/conf/spark-env.sh and add the following properties:
SPARK_HISTORY_OPTS="-Dspark.history.kerberos.enabled=true \
-Dspark.history.kerberos.principal=spark/fully.qualified.domain.name@YOUR-REALM.COM \
-Dspark.history.kerberos.keytab=/etc/spark/conf/spark.keytab"
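To confirm that spark-env.sh sets the variable as intended, you can source the file in a subshell and list the options it contains. This is a minimal bash sketch; `show_history_opts` is a hypothetical helper, and its default path is the file named in this section. Because the value spans several lines, it needs to be one quoted string: unquoted, the shell would try to run the second -D option as a command and SPARK_HISTORY_OPTS would not be set, which this helper reports as a failure.

```shell
# Minimal sketch (bash): list the options in SPARK_HISTORY_OPTS, one per line.
# show_history_opts is a hypothetical helper; the default path is an assumption.
show_history_opts() {
  local env_file="${1:-/etc/spark/conf/spark-env.sh}"
  # Source the file in a subshell so the caller's environment is untouched.
  (
    # shellcheck source=/dev/null
    . "$env_file" || exit 1
    # Split the value on whitespace into an array of -D options.
    read -r -a opts <<< "${SPARK_HISTORY_OPTS:-}"
    if [ "${#opts[@]}" -eq 0 ]; then
      echo "SPARK_HISTORY_OPTS is empty or unset in $env_file" >&2
      exit 1
    fi
    printf '%s\n' "${opts[@]}"
  )
}
```

With a correctly quoted setting, the helper prints the three `-Dspark.history.kerberos.*` options on separate lines.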
Running Spark Applications on a Secure Cluster
You submit compiled Spark applications with the spark-submit script. When running spark-submit on a secure cluster, specify the following additional command-line options, each in the form --option value:
Option | Description |
---|---|
--keytab | The full path to the file that contains the keytab for the principal. The keytab is copied to the node running the ApplicationMaster through the Secure Distributed Cache so that the login tickets and delegation tokens can be renewed periodically. For information on setting up the principal and keytab, see Configuring a Cluster with Custom Kerberos Principals and Spark Authentication. |
--principal | Principal used to log in to the KDC when running on secure HDFS. |
--proxy-user | User to impersonate when submitting the application. This option lets you use the spark-submit script to submit jobs on behalf of client users. |
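The --principal and --keytab options might be combined as follows when submitting to YARN. This is a minimal bash sketch: the principal `alice@YOUR-REALM.COM`, the keytab path, the class `com.example.MyApp`, and the helper `submit_secure` are hypothetical placeholders, and the `SPARK_SUBMIT` variable exists only so the command can be swapped for a stub in a dry run.

```shell
# Minimal sketch (bash): submit to YARN with Kerberos credentials.
# PRINCIPAL, KEYTAB, and submit_secure are hypothetical placeholders.
PRINCIPAL="alice@YOUR-REALM.COM"
KEYTAB="/home/alice/alice.keytab"

submit_secure() {
  if [ "$#" -lt 2 ]; then
    echo "usage: submit_secure MAIN_CLASS APP_JAR [APP_ARGS...]" >&2
    return 2
  fi
  local app_class="$1" app_jar="$2"
  shift 2
  # SPARK_SUBMIT defaults to the real script; set it to a stub for a dry run.
  # Remaining arguments are passed to the application unchanged.
  "${SPARK_SUBMIT:-spark-submit}" \
    --master yarn \
    --deploy-mode cluster \
    --principal "$PRINCIPAL" \
    --keytab "$KEYTAB" \
    --class "$app_class" \
    "$app_jar" "$@"
}
```

For example, `submit_secure com.example.MyApp /path/to/my-app.jar input.txt` runs the hypothetical application with `input.txt` as its only argument.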
Configuring Spark Authentication With a Shared Secret Using Cloudera Manager
Minimum Required Role: Security Administrator (also provided by Full Administrator)
Shared-secret authentication is configured with the spark.authenticate configuration parameter. Authentication is a handshake between Spark and the other party that verifies both sides hold the same shared secret before they are allowed to communicate. If the secrets do not match, communication is refused.
- Go to the tab.
- In the Search field, type spark authenticate to find the Spark Authentication settings.
- Check the checkbox for the Spark Authentication property.
- Click Save Changes.
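After the change is saved and the client configuration is deployed, you might confirm on a host that the property reached its Spark configuration. This is a minimal bash and awk sketch, assuming the client configuration lives in /etc/spark/conf/spark-defaults.conf; `auth_enabled` is a hypothetical helper. It uses a simplified reading of the Java properties format (whitespace, `=`, or `:` between key and value) and takes the last `spark.authenticate` entry.

```shell
# Minimal sketch (bash + POSIX awk): succeed if spark.authenticate is "true"
# in a spark-defaults.conf file. auth_enabled is a hypothetical helper.
# Simplified parsing: "key value", "key=value", or "key: value"; "#" comments
# are skipped and the last matching entry wins. Continuation lines and escapes
# from the full Java properties format are not handled.
auth_enabled() {
  local conf="${1:-/etc/spark/conf/spark-defaults.conf}"
  awk '
    /^[ \t]*#/ { next }
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      # Match the exact key, not e.g. spark.authenticate.secret
      if (line ~ /^spark\.authenticate[ \t=:]/) {
        sub(/^spark\.authenticate[ \t]*[=:]?[ \t]*/, "", line)
        sub(/[ \t]+$/, "", line)
        value = line
      }
    }
    END { if (tolower(value) == "true") exit 0; exit 1 }
  ' "$conf"
}
```

For example, `auth_enabled && echo "spark.authenticate is on"` prints the message only when the deployed file enables the property.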