Configure the Spark Thrift server
Use the following steps to configure the Apache Spark Thrift server on a Kerberos-enabled cluster.
If you are installing the Spark Thrift server on a Kerberos-enabled cluster, note the following requirements:
- The Spark Thrift server must run on the same host as HiveServer2, so that it can access the hiveserver2 keytab.
- Permissions on /var/run/spark and /var/log/spark must grant read/write access to the Hive service account.
- You must use the Hive service account to start the thriftserver process.
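The requirements above can be sketched as host-side commands. The sketch below only prints the commands rather than executing them; the HDP-style install prefix (/usr/hdp/current/spark-client) is an assumption, so adjust the paths for your cluster layout.

```shell
# Sketch of the setup implied by the requirements above (run as root on
# the HiveServer2 host). Paths are assumptions; adjust for your cluster.
HIVE_USER=hive
SPARK_RUN_DIR=/var/run/spark
SPARK_LOG_DIR=/var/log/spark

# Build the commands as strings and print them, so the sketch is
# side-effect free when copied and run outside a real cluster.
PERM_CMD="chown -R ${HIVE_USER} ${SPARK_RUN_DIR} ${SPARK_LOG_DIR} && chmod -R u+rwX ${SPARK_RUN_DIR} ${SPARK_LOG_DIR}"
# start-thriftserver.sh ships with Spark; the install prefix is assumed.
START_CMD="su ${HIVE_USER} -c '/usr/hdp/current/spark-client/sbin/start-thriftserver.sh'"

echo "$PERM_CMD"
echo "$START_CMD"
```

Running the printed commands grants the Hive service account read/write access to the Spark run and log directories, then starts the Thrift server as that account.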
If you access Hive warehouse files through HiveServer2 on a deployment with fine-grained access control, run the Spark Thrift server as user hive. This ensures that the Spark Thrift server can access the Hive keytabs, the Hive metastore, and HDFS data stored under user hive.
Important: If you read files from HDFS directly through an interface such as the Spark CLI (as opposed to HiveServer2 with fine-grained access control), you should use a different service account for the Spark Thrift server. Configure the account so that it can access the Hive keytabs and the Hive metastore. Use of an alternate account provides a more secure configuration: when the Spark Thrift server runs queries as user hive, all data accessible to user hive is exposed to the users submitting those queries.
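One way to provision such an alternate service account might look like the following sketch. The account name sparkthrift, the keytab path, and the use of a group or an ACL to share keytab read access are all assumptions for illustration, not names prescribed by this guide.

```shell
# Sketch (all names and paths here are assumptions, not prescribed):
ALT_USER=sparkthrift                                    # hypothetical account
HIVE_KEYTAB=/etc/security/keytabs/hive.service.keytab   # assumed keytab path

# Print rather than run, so the sketch has no side effects.
echo "useradd -r ${ALT_USER}"
# one possible way to grant keytab read access: a POSIX ACL entry
echo "setfacl -m u:${ALT_USER}:r ${HIVE_KEYTAB}"
```

After granting keytab and metastore access, start the Thrift server under this account instead of user hive.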
For Spark jobs that are not submitted through the Thrift server, the user submitting the job must have access to the Hive metastore in secure mode; obtain Kerberos credentials with the kinit command before submitting the job.
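For example, a user might authenticate before submitting a job directly. The principal, realm, and keytab path below are placeholders, not values from this guide.

```shell
# Sketch: obtain a Kerberos ticket before submitting a Spark job outside
# the Thrift server. Principal, realm, and keytab path are placeholders.
PRINCIPAL="alice@EXAMPLE.COM"
KEYTAB=/home/alice/alice.keytab

# Printed rather than executed; run the printed command on a real cluster.
KINIT_CMD="kinit -kt ${KEYTAB} ${PRINCIPAL}"
echo "$KINIT_CMD"
# Without a keytab, an interactive "kinit ${PRINCIPAL}" prompts for a password.
```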