Submitting Spark Applications Through Livy
Livy is a Spark service that allows local and remote applications to interact with Apache Spark over an open source REST interface.
You can use Livy to submit and manage Spark jobs on a cluster. Livy extends Spark capabilities, offering additional multi-tenancy and security features. Applications can run code inside Spark without needing to maintain a local Spark context.
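As a rough illustration of running code without a local Spark context, the sketch below prepares the two REST calls an interactive client would make: one to create a session and one to post a code statement to it. The host name and the session id `0` are placeholders, and the helpers `livy_request` and `send` are hypothetical names introduced here; only the `/sessions` and `/sessions/{id}/statements` endpoints and Livy's default port 8998 come from the Livy REST API itself.

```python
import json
import urllib.request

# Assumed endpoint: 8998 is Livy's default port, the host name is a placeholder.
LIVY_URL = "http://livy-host.example.com:8998"


def livy_request(path, payload):
    """Build (but do not send) a JSON POST request for the Livy REST API."""
    return urllib.request.Request(
        url=LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Step 1: ask Livy to start a PySpark session on the cluster.
create_session = livy_request("/sessions", {"kind": "pyspark"})

# Step 2: once the session reports the state "idle", post code to it.
# The session id (0 here) would come from the response to step 1.
run_statement = livy_request(
    "/sessions/0/statements",
    {"code": "sc.parallelize(range(100)).sum()"},
)

# Against a live server:
#   session = send(create_session)
#   ... poll GET /sessions/<id> until "state" is "idle" ...
#   statement = send(run_statement)
```

The client only needs an HTTP library; the Spark context lives in the session that Livy starts on the cluster.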
Features include the following:
- Jobs can be submitted from anywhere, using the REST API.
- Livy supports user impersonation: the Livy server submits jobs on behalf of the user who submits the requests, so multiple users can share the same server. This is important for multi-tenant environments, and it avoids unnecessary permission escalation.
- Livy supports security features such as Kerberos authentication and wire encryption:
  - REST APIs are protected by SPNEGO authentication, which requires the requesting user to authenticate with Kerberos first.
  - RPCs between the Livy server and the remote SparkContext are encrypted with SASL.
  - The Livy server uses keytabs to authenticate itself to Kerberos.
- Livy supports programmatic and interactive access to Spark:
  - Use an interactive notebook to access Spark through Livy.
  - Develop a Scala, Java, or Python client that uses the Livy API. The Livy REST API supports full Spark functionality, including SparkSession and SparkSession with Hive enabled.
  - Run an interactive session, provided by spark-shell, PySpark, or SparkR REPLs.
  - Submit batch applications to Spark.
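A batch submission with impersonation might look like the following minimal sketch. It assumes the Livy REST API's `POST /batches` endpoint with its `file`, `className`, `args`, and `proxyUser` fields; the helper `build_batch_request`, the host, the JAR path, the class name, and the user name are all hypothetical placeholders.

```python
import json
import urllib.request


def build_batch_request(livy_url, app_file, class_name=None, args=(), proxy_user=None):
    """Prepare a POST /batches request; nothing is sent until urlopen is called."""
    payload = {"file": app_file, "args": list(args)}
    if class_name:
        payload["className"] = class_name
    if proxy_user:
        # Livy impersonates this user when it submits the application.
        payload["proxyUser"] = proxy_user
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: host, JAR path, class, and user are illustrative only.
request = build_batch_request(
    "http://livy-host.example.com:8998",
    app_file="hdfs:///apps/example/spark-job.jar",
    class_name="com.example.SparkJob",
    args=["2024-01-01"],
    proxy_user="alice",
)

# To submit for real:
#   with urllib.request.urlopen(request) as response:
#       batch = json.load(response)  # includes the batch "id" and "state"
```

On a Kerberos-enabled cluster the request would also need to carry SPNEGO credentials, which this sketch leaves out.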
Code runs in a Spark context, either locally or in YARN; YARN cluster mode is recommended.
To install Livy on an Ambari-managed cluster, see "Installing Spark Using Ambari" in this guide. For additional configuration steps, see "Configuring the Livy Server" in this guide.