Using the %livy Interpreter to Access Spark

The Livy interpreter offers several advantages over the default Spark interpreter (%spark):

Sharing of Spark context across multiple Zeppelin instances.
Reduced resource use, by recycling resources after 60 minutes of inactivity (by default). The default Spark interpreter runs jobs, and retains job resources, indefinitely.
User impersonation. When the Zeppelin server runs with authentication enabled, the Livy interpreter propagates user identity to the Spark job so that the job runs as the originating user. This is especially useful when multiple users are expected to connect to the same set of data repositories within an enterprise. (The default Spark interpreter runs jobs as the default Zeppelin user.)
The ability to run Spark in yarn-cluster mode.

Prerequisite: Before using Livy in a note, check the Interpreter page to ensure that the Livy interpreter is configured properly for your cluster.

Note: The Interpreter page is subject to access control settings. If the Interpreter page does not list settings, check with your system administrator for more information.

To access Spark using Livy, specify the corresponding interpreter directive in a paragraph, before the code that accesses Spark; for example:

PySpark:
```
%livy.pyspark
print("1")
```

SparkR:
```
%livy.sparkr
hello <- function( name ) {
    sprintf( "Hello, %s", name );
}

hello("livy")
```

Important

To use SQLContext with Livy, do not create SQLContext explicitly. Zeppelin creates SQLContext by default.

If necessary, remove the following lines from the SparkSQL declaration area of your note:

```
//val sqlContext = new org.apache.spark.sql.SQLContext(sc)
//import sqlContext.implicits._
```

Livy sessions are recycled after a specified period of session inactivity. The default is one hour. If your session times out, you must restart the interpreter: on the Interpreter page, locate the Livy interpreter and click "restart".

Importing External Packages

To import an external package for use in a note that runs with Livy:

1. Navigate to the interpreter settings.
2. If you are running the Livy interpreter in local mode (as specified by livy.spark.master), add jar files to the /usr/hdp/<version>/livy/repl-jars directory.
3. If you are running the Livy interpreter in yarn-cluster mode, either complete step 2 or edit the Livy configuration on the Interpreter page as follows:
   1. Add a new key, livy.spark.jars.packages.
   2. Set its value to <group>:<id>:<version>.

   Here is an example for the spray-json library, which implements JSON in Scala:
   io.spray:spray-json_2.10:1.3.1
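
A typo in the `<group>:<id>:<version>` value usually surfaces only when the Livy session starts, so it can help to check the string before pasting it into the interpreter settings. The following is a minimal Python sketch, not part of Zeppelin or Livy. The helper names `parse_coordinate` and `packages_value` are hypothetical, and the set of characters allowed in each part is an assumption. Joining several coordinates with commas follows the convention of Spark's `spark.jars.packages` property, which `livy.spark.jars.packages` is assumed to mirror.

```python
import re

# Assumption: each of the three parts uses only letters, digits, '.', '_' and '-'.
_COORDINATE = re.compile(r"^[A-Za-z0-9_.\-]+:[A-Za-z0-9_.\-]+:[A-Za-z0-9_.\-]+$")


def parse_coordinate(coordinate):
    """Split '<group>:<id>:<version>' into its parts, or raise ValueError."""
    coordinate = coordinate.strip()
    if not _COORDINATE.match(coordinate):
        raise ValueError("expected <group>:<id>:<version>, got %r" % coordinate)
    group, artifact_id, version = coordinate.split(":")
    return {"group": group, "id": artifact_id, "version": version}


def packages_value(coordinates):
    """Validate every coordinate and join them with commas.

    Assumption: multiple packages are given as a comma-separated list,
    as with Spark's spark.jars.packages property.
    """
    cleaned = []
    for coordinate in coordinates:
        parse_coordinate(coordinate)  # raises ValueError on a malformed entry
        cleaned.append(coordinate.strip())
    return ",".join(cleaned)


if __name__ == "__main__":
    # The spray-json example from the steps above.
    print(parse_coordinate("io.spray:spray-json_2.10:1.3.1"))
    print(packages_value(["io.spray:spray-json_2.10:1.3.1"]))
```

The string returned by `packages_value` is what you would enter as the value of the `livy.spark.jars.packages` key.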
