Non-Ambari Cluster Installation Guide

(Optional) Starting the Spark Thrift Server

Note

The Spark Thrift Server automatically uses dynamic resource allocation. If you use this Spark application, you do not need to set up dynamic resource allocation.

To enable and start the Spark Thrift Server:

  1. From SPARK_HOME, start the Spark SQL Thrift Server. Specify the port value of the Thrift Server (the default is 10015). For example:

    su spark

    ./sbin/start-thriftserver.sh --master yarn-client --executor-memory 512m --hiveconf hive.server2.thrift.port=10015

  2. Use this port when you connect via Beeline, as shown in the sketch below.
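
A minimal Beeline connection sketch follows. The host name sparkhost.example.com is a placeholder for the node where the Thrift Server runs, and the port matches the value set when the server was started:

    ./bin/beeline
    beeline> !connect jdbc:hive2://sparkhost.example.com:10015

When prompted, supply credentials that are valid for your cluster.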

Kerberos Considerations

If you are installing the Spark Thrift Server on a Kerberos-secured cluster, the following instructions apply:

  • The Spark Thrift Server must run on the same host as HiveServer2, so that it can access the hiveserver2 keytab.

  • Edit permissions on /var/run/spark and /var/log/spark so that the Hive service account has read/write access to both directories.

  • Use the Hive service account to start the thriftserver process (see the sketch after this list).
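
The following is a minimal sketch of these steps, assuming the Hive service account is hive in group hadoop; adjust the account, group, and permission bits to match your cluster:

    chown hive:hadoop /var/run/spark /var/log/spark
    chmod 775 /var/run/spark /var/log/spark

    su hive

    ./sbin/start-thriftserver.sh --master yarn-client --executor-memory 512m --hiveconf hive.server2.thrift.port=10015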

Note

We recommend that you run the Spark Thrift Server as user hive instead of user spark (this supersedes recommendations in previous releases). This ensures that the Spark Thrift Server can access Hive keytabs, the Hive metastore, and data in HDFS that is stored under user hive.

Important

When the Spark Thrift Server runs queries as user hive, all data accessible to user hive will be accessible to the user submitting the query. For a more secure configuration, use a different service account for the Spark Thrift Server. Provide appropriate access to the Hive keytabs and the Hive metastore.

For Spark jobs that are not submitted through the Thrift Server, the user submitting the job must have access to the Hive metastore in secure mode, obtained by running kinit before submitting the job (see the example below).
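
For example, a user might authenticate with a Kerberos principal before submitting a job. The principal and the example jar path shown here are illustrative and may differ in your installation:

    kinit user@EXAMPLE.COM
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client lib/spark-examples*.jar 10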