Configuring Apache Spark

Set up access for submitting jobs

Use the following steps to set up access for submitting Spark jobs on a Kerberos-enabled cluster.

Accounts that submit jobs on behalf of other processes must have a Kerberos account and keytab. End users should use their own keytabs (instead of using a headless keytab) when submitting a Spark job. The following sections describe both scenarios.

Set Up Access for an Account

When access is authenticated without human interaction (as happens for processes that submit job requests), the process uses a headless keytab. Security risk is mitigated by ensuring that only the service that should be using the headless keytab has permission to read it.

The following example creates a headless keytab for a spark service user account that will submit Spark jobs from host blue1 in realm EXAMPLE.COM:

  1. Create a Kerberos service principal for user spark:

    kadmin.local -q "addprinc -randkey spark/blue1@EXAMPLE.COM"

  2. Create the keytab:

    kadmin.local -q "xst -k /etc/security/keytabs/spark.keytab spark/blue1@EXAMPLE.COM"
  3. For every node of your cluster, create a spark user and add it to the hadoop group:

    useradd -g hadoop spark

  4. Make spark the owner of the newly created keytab:

    chown spark:hadoop /etc/security/keytabs/spark.keytab

  5. Limit access by ensuring that user spark is the only user with access to the keytab:

    chmod 400 /etc/security/keytabs/spark.keytab
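Step 5 is the key safeguard: a readable headless keytab lets any local user impersonate the service. A minimal sketch of a permission check (the function name and its "ok" output are our own, not part of HDP; GNU stat is assumed):

```shell
# check_keytab_mode: print "ok" if the file is mode 400 (owner read-only),
# otherwise report the actual octal mode. Uses GNU stat's %a format.
check_keytab_mode() {
  mode=$(stat -c '%a' "$1")
  if [ "$mode" = "400" ]; then
    echo "ok"
  else
    echo "too permissive: $mode"
  fi
}

# Example: check_keytab_mode /etc/security/keytabs/spark.keytab
```

A check like this can run from a configuration-audit script after keytab distribution.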

In the following example, user spark runs the Spark Pi example in a Kerberos-enabled environment:

su spark  
kinit -kt /etc/security/keytabs/spark.keytab spark/blue1@EXAMPLE.COM
cd /usr/hdp/current/spark-client/
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --num-executors 1 \
    --driver-memory 512m \
    --executor-memory 512m \
    --executor-cores 1 \
    lib/spark-examples*.jar 10
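Because the job runs in yarn-cluster mode, the driver executes inside the YARN Application Master, so the Pi result appears in the YARN logs rather than on the console. One way to retrieve it (the application ID below is illustrative; substitute the ID reported by spark-submit):

```shell
# List recently finished YARN applications to find the application ID.
yarn application -list -appStates FINISHED

# Fetch the aggregated logs for the run and look for the Pi result line.
yarn logs -applicationId application_1234567890123_0001 | grep "Pi is roughly"
```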

Set Up Access for an End User

Each person who submits jobs must have a Kerberos account and their own keytab. As a best practice, end users should submit Spark jobs under their own keytab rather than a headless keytab: jobs submitted under an end-user identity deliver a higher degree of audit capability.

In the following example, end user $USERNAME has their own keytab and runs the Spark Pi job in a Kerberos-enabled environment:

su $USERNAME
kinit $USERNAME@YOUR-LOCAL-REALM.COM
cd /usr/hdp/current/spark-client/
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --num-executors 3 \
    --driver-memory 512m \
    --executor-memory 512m \
    --executor-cores 1 \
    lib/spark-examples*.jar 10
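The kinit step above prompts for the user's password. If the end user keeps a personal keytab, as recommended, the ticket can be obtained non-interactively instead (the keytab path below is illustrative, not an HDP convention):

```shell
# Obtain a ticket from the user's own keytab instead of a password prompt.
kinit -kt ~/USERNAME.keytab $USERNAME@YOUR-LOCAL-REALM.COM

# Confirm the ticket is valid before submitting the job.
klist
```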