Running Apache Spark Applications
Also available as:
PDF

Submitting Batch Applications Using the Livy API

Spark provides a spark-submit command for submitting batch applications. Livy provides equivalent functionality through REST APIs, using job specifications specified in a JSON document.

The following example shows a spark-submit command that submits a SparkPi job, followed by an example that uses Livy POST requests to submit the job. The remainder of this subsection describes Livy objects and REST API syntax. For additional examples and information, see the readme.rst file at https://github.com/hortonworks/livy-release/releases/tag/HDP-2.6.0.3-8-tag.

The following command uses spark-submit to submit a SparkPi job:

./bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --executor-memory 20G \
    /path/to/examples.jar 1000

To submit the SparkPi job using Livy, complete the following steps. Note: the POST request does not upload local jars to the cluster. You should upload required jar files to HDFS before running the job. This is the main difference between the Livy API and spark-submit.

  1. Form a JSON structure with the required job parameters:
    { "className": "org.apache.spark.examples.SparkPi", 
            "executorMemory": "20g", 
            "args": [2000], 
            "file": "/path/to/examples.jar"
            }
  2. Specify master and deploy mode in the livy.conf file.
  3. To submit the SparkPi application to the Livy server, use the a POST /batches request.
  4. The Livy server helps launch the application in the cluster.