Configuring Apache Spark

Configure the Spark Thrift server

Use the following steps to configure the Apache Spark Thrift server on a Kerberos-enabled cluster.

If you are installing the Spark Thrift server on a Kerberos-enabled cluster, note the following requirements:

  • The Spark Thrift server must run on the same host as HiveServer2, so that it can access the hiveserver2 keytab.

  • Permissions on /var/run/spark and /var/log/spark must grant read/write access to the Hive service account.

  • You must use the Hive service account to start the thriftserver process.
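The directory requirement above can be checked before the server is started. The following is a minimal sketch, not part of the product: the helper name dirs_missing_rw is hypothetical, and it assumes the script is run as the Hive service account, because os.access tests the permissions of the calling user.

```python
import os
from typing import Iterable, List

# Directories named in the requirements above.
SPARK_DIRS = ("/var/run/spark", "/var/log/spark")


def dirs_missing_rw(paths: Iterable[str] = SPARK_DIRS) -> List[str]:
    """Return the paths that are not directories the current user can read and write.

    Hypothetical helper for illustration only.
    """
    failing = []
    for path in paths:
        # os.access evaluates permissions for the user running this process,
        # so run the check as the Hive service account.
        if not (os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK)):
            failing.append(path)
    return failing


if __name__ == "__main__":
    for path in dirs_missing_rw():
        print(f"missing read/write access: {path}")
```

An empty result suggests that the account running the check can read and write both directories; any path that is printed needs its ownership or mode corrected before the Thrift server is started.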

If you access Hive warehouse files through HiveServer2 on a deployment with fine-grained access control, run the Spark Thrift server as user hive. This ensures that the Spark Thrift server can access Hive keytabs, the Hive metastore, and HDFS data stored under user hive.

Important

If you read files from HDFS directly through an interface such as the Spark CLI (as opposed to HiveServer2 with fine-grained access control), you should use a different service account for the Spark Thrift server. Configure the account so that it can access Hive keytabs and the Hive metastore. Use of an alternate account provides a more secure configuration: when the Spark Thrift server runs queries as user hive, all data accessible to user hive is accessible to the user submitting the query.

For Spark jobs that are not submitted through the Thrift server, the user submitting the job must be able to access the Hive metastore in secure mode. To do so, the user authenticates with the kinit command before submitting the job.
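As an illustration of the kinit step, the sketch below checks for a valid Kerberos ticket and builds the kinit invocation that a job-submission wrapper might run. It assumes the MIT Kerberos client tools, where klist -s exits with status 0 only if a readable, unexpired credential cache exists. The function names, the principal, and the keytab path are hypothetical.

```python
import subprocess
from typing import List


def kinit_command(principal: str, keytab: str = "") -> List[str]:
    """Build a kinit invocation (hypothetical helper).

    With a keytab the command runs non-interactively; without one,
    kinit prompts for the principal's password.
    """
    if keytab:
        return ["kinit", "-k", "-t", keytab, principal]
    return ["kinit", principal]


def has_valid_ticket() -> bool:
    """Return True if `klist -s` reports a readable, unexpired credential cache."""
    try:
        return subprocess.run(["klist", "-s"]).returncode == 0
    except FileNotFoundError:
        # Kerberos client tools are not installed on this host.
        return False
```

A wrapper could call has_valid_ticket() first and, if it returns False, run subprocess.run(kinit_command("alice@EXAMPLE.COM"), check=True) before submitting the Spark job. The principal shown is a placeholder.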