Spark Guide

(Optional) Configuring Spark for a Kerberos-Enabled Cluster

Spark jobs are submitted to a Hadoop cluster as YARN jobs. The developer creates a Spark application in a local environment, and tests it in a single-node Spark Standalone cluster on their developer workstation.

When a job is ready to run in a production environment, there are a few additional steps if the cluster is Kerberized:

  • The Spark History Server daemon needs a Kerberos account and keytab to run in a Kerberized cluster.

  • To submit Spark jobs in a Kerberized cluster, the account (or person) submitting jobs needs a Kerberos account and keytab.

    • When access is authenticated without human interaction, as happens for processes that submit job requests, the process uses a headless keytab. To mitigate the security risk, ensure that only the service account that is meant to use the headless keytab has permission to read it.

    • An end user should use their own keytab when submitting a Spark job.
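Whichever keytab is used, a quick way to confirm that authentication succeeded before submitting a job is to list the cached tickets. The principal and realm below are placeholders for this illustration:

```shell
# Authenticate as an end user (prompts for the user's password).
# The principal and realm are placeholders.
kinit alice@EXAMPLE.COM

# List cached credentials; an unexpired
# krbtgt/EXAMPLE.COM@EXAMPLE.COM entry confirms a ticket was granted.
klist
```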

Setting Up Principals and Keytabs for End User Access to Spark

In the following example, user $USERNAME runs the Spark Pi job in a Kerberos-enabled environment:

su $USERNAME
kinit $USERNAME@YOUR-LOCAL-REALM.COM
cd /usr/hdp/current/spark-client/
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
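Note that in yarn-cluster mode the driver runs inside the YARN application master, so the "Pi is roughly ..." result appears in the YARN application logs rather than on the submitting console. It can be retrieved after the job completes; the application ID below is a placeholder for the one that spark-submit prints:

```shell
# Fetch the aggregated logs for the finished application and search
# for the SparkPi result line. The application ID shown is a
# placeholder; use the ID reported by spark-submit.
yarn logs -applicationId application_1459542433815_0002 | grep -i "pi is roughly"
```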

Setting Up Service Principals and Keytabs for Processes Submitting Spark Jobs

The following example shows the creation and use of a headless keytab for a spark service user account that will submit Spark jobs on node blue1.example.com:

  1. Create a Kerberos service principal for user spark:

    kadmin.local -q "addprinc -randkey spark/blue1@EXAMPLE.COM"

  2. Create the keytab:

    kadmin.local -q "xst -k /etc/security/keytabs/spark.keytab spark/blue1@EXAMPLE.COM"
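The exported keytab can be inspected without authenticating; klist -kt prints the principals and key version numbers stored in the file:

```shell
# List the entries in the newly created keytab; the output should
# include the spark/blue1@EXAMPLE.COM principal from step 1.
klist -kt /etc/security/keytabs/spark.keytab
```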

  3. Create a spark user and add it to the hadoop group. (Do this on every node of your cluster.)

    useradd -g hadoop spark

  4. Make spark the owner of the newly-created keytab:

    chown spark:hadoop /etc/security/keytabs/spark.keytab

  5. Limit access: make sure that user spark is the only user who can read the keytab:

    chmod 400 /etc/security/keytabs/spark.keytab
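The effect of steps 4 and 5 can be verified with stat: mode 400 leaves the owner (spark) as the only user able to read the keytab, while group and other users have no access at all:

```shell
# Show mode, owner, and group of the keytab in one line.
# A correctly locked-down keytab reports: 400 spark:hadoop
stat -c '%a %U:%G' /etc/security/keytabs/spark.keytab
```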

In the following steps, user spark runs the Spark Pi example in a Kerberos-enabled environment:

su spark  
kinit -kt /etc/security/keytabs/spark.keytab spark/blue1@EXAMPLE.COM
cd /usr/hdp/current/spark-client/
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10