Configuring authentication for long-running Spark Streaming jobs
Long-running applications such as Spark Streaming
jobs must be able to write data continuously, which means that the user
may need to delegate tokens possibly beyond the default lifetime. This
workload type requires passing Kerberos principal and keytab to the
spark-submit
script using the
--principal
and --keytab
parameters.
The keytab is copied to the host running the ApplicationMaster, and the
Kerberos login is renewed periodically by using the principal and keytab
to generate the required delegation tokens needed for HDFS.