Developing Apache Spark Applications
Also available as:
PDF

Running Spark Streaming Jobs on a Kerberos-Enabled Cluster

Use the following steps to run a Spark Streaming job on a Kerberos-enabled cluster.

  1. Select or create a user account to be used as principal.

    This should not be the kafka or spark service account.

  2. Generate a keytab for the user.
  3. Create a Java Authentication and Authorization Service (JAAS) login configuration file: for example, key.conf.
  4. Add configuration settings that specify the user keytab.

    The keytab and configuration files are distributed using YARN local resources. Because they reside in the current directory of the Spark YARN container, you should specify the location as ./v.keytab.

    The following example specifies keytab location ./v.keytab for principal vagrant@example.com:

    KafkaClient {
       com.sun.security.auth.module.Krb5LoginModule required
       useKeyTab=true
       keyTab="./v.keytab"
       storeKey=true
       useTicketCache=false
       serviceName="kafka"
       principal="vagrant@EXAMPLE.COM";
    };
  5. In your spark-submit command, pass the JAAS configuration file and keytab as local resource files, using the --files option, and specify the JAAS configuration file options to the JVM options specified for the driver and executor:
    spark-submit \
        --files key.conf#key.conf,v.keytab#v.keytab \
        --driver-java-options "-Djava.security.auth.login.config=./key.conf" \
        --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key.conf" \
    ...
  6. Pass any relevant Kafka security options to your streaming application.

    For example, the KafkaWordCount example accepts PLAINTEXTSASL as the last option in the command line:

    KafkaWordCount /vagrant/spark-examples.jar c6402:2181 abc ts 1 PLAINTEXTSASL