Developing Apache Spark Applications

Using custom libraries with Spark

Spark comes equipped with a selection of libraries, including Spark SQL, Spark Streaming, and MLlib.

If you want to use a custom library, such as a compression library or Magellan, you can use one of the following two spark-submit script options:

  • The --jars option, which transfers associated .jar files to the cluster. Specify a list of comma-separated .jar files.

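  • The --packages option, which takes a comma-separated list of Maven coordinates (groupId:artifactId:version) and downloads the matching packages, along with their dependencies, from Spark Packages or a Maven repository. This approach requires an internet connection.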
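Because --jars expects a single comma-separated argument, it can help to build that argument from a list of paths rather than typing it by hand. The following is a minimal sketch; the join_jars helper and the .jar paths are hypothetical and used only for illustration.

```shell
# Join the given paths with commas, the form that --jars expects.
# The function body runs in a subshell, so changing IFS here does not
# affect the calling shell.
join_jars() (
    IFS=,
    printf '%s\n' "$*"
)

# Hypothetical .jar paths, used only for illustration.
JARS=$(join_jars /opt/libs/codec-a.jar /opt/libs/codec-b.jar)
echo "$JARS"
```

The resulting value can then be passed to spark-submit as --jars "$JARS".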

For example, you can use the --jars option to add codec files. The following example adds the LZO compression library:

spark-submit --driver-memory 1G \
    --executor-memory 1G \
    --master yarn \
    --deploy-mode client \
    --jars /usr/hdp/2.6.0.3-8/hadoop/lib/hadoop-lzo-0.6.0.2.6.0.3-8.jar \
    test_read_write.py
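The --packages option follows the same pattern. The sketch below assembles an equivalent command for Magellan and, by default, only prints it so that it can be reviewed before it is run. The Magellan coordinates shown are an assumption and should be checked against the Spark Packages listing for your Spark and Scala versions; the submit_with_packages function and the DRY_RUN variable are hypothetical names introduced for this example.

```shell
# ASSUMPTION: example Maven coordinates for Magellan; verify the version
# that matches your Spark and Scala build before using it.
PACKAGES="harsha2010:magellan:1.0.5-s_2.11"

# $1 = comma-separated Maven coordinates, $2 = application file.
# Prints the command unless DRY_RUN is set to 0, in which case it runs it.
submit_with_packages() {
    # "--master yarn --deploy-mode client" is equivalent to the older
    # "yarn-client" master URL.
    set -- spark-submit \
        --driver-memory 1G \
        --executor-memory 1G \
        --master yarn \
        --deploy-mode client \
        --packages "$1" \
        "$2"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

submit_with_packages "$PACKAGES" test_read_write.py
```

Set DRY_RUN=0 in the environment to submit the application instead of printing the command.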

For more information about the two options, see Advanced Dependency Management on the Apache Spark "Submitting Applications" web page.
