Running Spark Streaming Jobs on a Kerberos-Enabled Cluster
Use the following steps to run a Spark Streaming job on a Kerberos-enabled cluster.
- Select or create a user account to be used as principal. This should not be the kafka or spark service account.
- Generate a keytab for the user.
- Create a Java Authentication and Authorization Service (JAAS) login configuration file: for example, key.conf.
- Add configuration settings that specify the user keytab.
  The keytab and configuration files are distributed using YARN local resources. Because they reside in the current directory of the Spark YARN container, specify the keytab location as a path relative to that directory, such as ./v.keytab.
  The following example specifies keytab location ./v.keytab for principal vagrant@EXAMPLE.COM:

      KafkaClient {
        com.sun.security.auth.module.Krb5LoginModule required
        useKeyTab=true
        keyTab="./v.keytab"
        storeKey=true
        useTicketCache=false
        serviceName="kafka"
        principal="vagrant@EXAMPLE.COM";
      };
- In your spark-submit command, pass the JAAS configuration file and keytab as local resource files using the --files option, and add the JAAS configuration file setting to the JVM options specified for the driver and executor:

      spark-submit \
        --files key.conf#key.conf,v.keytab#v.keytab \
        --driver-java-options "-Djava.security.auth.login.config=./key.conf" \
        --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key.conf" \
        ...
- Pass any relevant Kafka security options to your streaming application.
  For example, the KafkaWordCount example accepts PLAINTEXTSASL as the last option on the command line:

      KafkaWordCount /vagrant/spark-examples.jar c6402:2181 abc ts 1 PLAINTEXTSASL
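
The JAAS file and the security-related spark-submit arguments from the steps above can also be generated from a script, which may help keep the keytab name, principal, and JAAS file name consistent. The following is a minimal sketch, not part of any Spark or Kafka tooling: the function names render_jaas and spark_submit_args are hypothetical, and the default values simply mirror the examples on this page. It produces only the arguments shown above; the application JAR, main class, and application options still have to be appended.

```python
import shlex


def render_jaas(keytab="./v.keytab", principal="vagrant@EXAMPLE.COM",
                service_name="kafka"):
    """Return the text of a KafkaClient JAAS login section (hypothetical helper)."""
    options = [
        ("useKeyTab", "true"),
        ("keyTab", f'"{keytab}"'),
        ("storeKey", "true"),
        ("useTicketCache", "false"),
        ("serviceName", f'"{service_name}"'),
        ("principal", f'"{principal}"'),
    ]
    # One option per line; the last option is followed by the closing semicolon.
    body = "\n".join(f"  {key}={value}" for key, value in options)
    return (
        "KafkaClient {\n"
        "  com.sun.security.auth.module.Krb5LoginModule required\n"
        f"{body};\n"
        "};\n"
    )


def spark_submit_args(jaas_file="key.conf", keytab_file="v.keytab"):
    """Return the spark-submit arguments that ship and reference the JAAS file."""
    # Both files land in the container's working directory, hence the ./ prefix.
    jvm_opt = f"-Djava.security.auth.login.config=./{jaas_file}"
    return [
        "spark-submit",
        "--files", f"{jaas_file}#{jaas_file},{keytab_file}#{keytab_file}",
        "--driver-java-options", jvm_opt,
        "--conf", f"spark.executor.extraJavaOptions={jvm_opt}",
    ]


if __name__ == "__main__":
    # Print the JAAS section and the shell-quoted command prefix for review.
    print(render_jaas())
    print(shlex.join(spark_submit_args()))
```

Writing the output of render_jaas() to key.conf and appending the application arguments to the list from spark_submit_args() reproduces the configuration and command shown in the steps above.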