Running Apache Spark Applications

Submitting Spark Applications Through Livy

Livy is a Spark service that allows local and remote applications to interact with Apache Spark over an open source REST interface.

You can use Livy to submit and manage Spark jobs on a cluster. Livy extends Spark capabilities, offering additional multi-tenancy and security features. Applications can run code inside Spark without needing to maintain a local Spark context.

Features include the following:

  • Jobs can be submitted from anywhere, using the REST API (see the batch sketch after this list).

  • Livy supports user impersonation: the Livy server submits jobs on behalf of the user who makes the request, so multiple users can share the same server. This is important in multi-tenant environments, and it avoids unnecessary permission escalation.

  • Livy supports security features such as Kerberos authentication and wire encryption.

    • REST APIs are protected by SPNEGO authentication, which requires the requesting user to be authenticated by Kerberos first.

    • RPCs between Livy Server and Remote SparkContext are encrypted with SASL.

    • The Livy server uses keytabs to authenticate itself to Kerberos.
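
As a minimal illustration of REST-based submission (referenced in the feature list above), the following Scala sketch posts a batch application to Livy's /batches endpoint using the JDK 11+ HTTP client. The host, port, jar path, and application class are placeholders to adjust for your cluster; 8998 is Livy's default port, and the X-Requested-By header is required when CSRF protection is enabled on the Livy server.

    import java.net.URI
    import java.net.http.{HttpClient, HttpRequest, HttpResponse}

    object SubmitBatchViaLivy {
      def main(args: Array[String]): Unit = {
        // Livy's default REST port is 8998; adjust the host for your cluster (assumption).
        val livyUrl = "http://localhost:8998/batches"

        // Minimal batch request: the application jar must be reachable by the cluster
        // (for example, on HDFS). The path and class name below are placeholders.
        val payload =
          """{
            |  "file": "hdfs:///apps/example/spark-examples.jar",
            |  "className": "org.apache.spark.examples.SparkPi",
            |  "args": ["10"]
            |}""".stripMargin

        val request = HttpRequest.newBuilder()
          .uri(URI.create(livyUrl))
          .header("Content-Type", "application/json")
          .header("X-Requested-By", "livy-example") // required when CSRF protection is enabled
          .POST(HttpRequest.BodyPublishers.ofString(payload))
          .build()

        val response = HttpClient.newHttpClient()
          .send(request, HttpResponse.BodyHandlers.ofString())

        // The response body contains the batch id and state, e.g. {"id":0,"state":"starting",...}
        println(s"HTTP ${response.statusCode()}: ${response.body()}")
      }
    }

Livy tracks the batch afterward; you can check its progress with GET /batches/{batchId} and retrieve driver log lines with GET /batches/{batchId}/log.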

Livy supports programmatic and interactive access to Spark:

  • Use an interactive notebook to access Spark through Livy.

  • Develop a Scala, Java, or Python client that uses the Livy API. The Livy REST API supports full Spark functionality, including SparkSession and SparkSession with Hive enabled. (See the programmatic-API sketch below.)

  • Run an interactive session, similar to the spark-shell, PySpark, or SparkR REPLs (see the session sketch after this list).

  • Submit batch applications to Spark.
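
Interactive access follows the same REST pattern: create a session, then post statements that execute in the remote Spark context. The sketch below assumes a Livy server at localhost:8998 and, for brevity, hard-codes session and statement ids that a real client would parse from the JSON responses.

    import java.net.URI
    import java.net.http.{HttpClient, HttpRequest, HttpResponse}

    object LivyInteractiveSession {
      private val http = HttpClient.newHttpClient()
      // The Livy server location is an assumption; 8998 is Livy's default port.
      private val livyUrl = "http://localhost:8998"

      private def post(path: String, json: String): String = {
        val request = HttpRequest.newBuilder()
          .uri(URI.create(livyUrl + path))
          .header("Content-Type", "application/json")
          .header("X-Requested-By", "livy-example") // required when CSRF protection is enabled
          .POST(HttpRequest.BodyPublishers.ofString(json))
          .build()
        http.send(request, HttpResponse.BodyHandlers.ofString()).body()
      }

      private def get(path: String): String = {
        val request = HttpRequest.newBuilder().uri(URI.create(livyUrl + path)).GET().build()
        http.send(request, HttpResponse.BodyHandlers.ofString()).body()
      }

      def main(args: Array[String]): Unit = {
        // Create an interactive Scala session. The response includes the session id,
        // e.g. {"id":0,"state":"starting",...}; a real client would parse that id and
        // poll GET /sessions/{id} until the state becomes "idle".
        println(post("/sessions", """{"kind": "spark"}"""))

        // Run Scala code in the remote Spark context (session id 0 is assumed here).
        println(post("/sessions/0/statements", """{"code": "sc.parallelize(1 to 100).sum()"}"""))

        // Fetch the statement; once its state is "available", the result appears under "output".
        println(get("/sessions/0/statements/0"))
      }
    }

Notebook integrations work over the same session endpoints: the notebook kernel creates a Livy session and posts each cell as a statement.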

Code runs in a Spark context, either locally or in YARN; YARN cluster mode is recommended.
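
For the programmatic API mentioned above, Livy also ships a Java/Scala client (org.apache.livy.LivyClientBuilder and the Job interface) that serializes a job, ships it to the Livy server, and runs it inside the remote Spark context, typically on YARN. The sketch below is a minimal example; the server URL and jar path are assumptions, and the jar must contain the job class so the remote context can deserialize it.

    import java.io.File
    import java.net.URI

    import org.apache.livy.{Job, JobContext, LivyClient, LivyClientBuilder}

    // A small job that executes remotely inside the Spark context managed by Livy.
    class CountJob extends Job[Long] {
      override def call(jc: JobContext): Long =
        jc.sc().sc.parallelize(1 to 1000, 10).count()
    }

    object LivyClientExample {
      def main(args: Array[String]): Unit = {
        // The Livy server URL is an assumption; 8998 is Livy's default port.
        val client: LivyClient = new LivyClientBuilder()
          .setURI(new URI("http://localhost:8998"))
          .build()

        try {
          // Ship the jar that contains CountJob so the remote context can load it
          // (the path is a placeholder for your build output).
          client.uploadJar(new File("target/livy-jobs-example.jar")).get()

          // submit() returns a JobHandle, which is a java.util.concurrent.Future.
          val count = client.submit(new CountJob).get()
          println(s"Counted $count elements on the cluster")
        } finally {
          client.stop(true) // also shuts down the remote Spark context
        }
      }
    }

All jobs submitted through the same client run in one long-lived remote context, so repeated submissions avoid the startup cost of launching a new Spark application each time.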

To install Livy on an Ambari-managed cluster, see "Installing Spark Using Ambari" in this guide. For additional configuration steps, see "Configuring the Livy Server" in this guide.