Spark Guide

Chapter 7. Adding Libraries to Spark

To use a custom library with a Spark application (a library that is not bundled with Spark by default, such as a compression library or Magellan), use one of the following two spark-submit script options:

  • The --jars option transfers the specified jar files to the cluster.

  • The --packages option pulls dependencies directly from Spark Packages (or a Maven repository). This approach requires an internet connection.

For example, to add the LZO compression library to Spark using the --jars option:

spark-submit --driver-memory 1G \
  --executor-memory 1G \
  --master yarn-client \
  --jars /usr/hdp/2.3.0.0-2557/hadoop/lib/hadoop-lzo-0.6.0.2.3.0.0-2557.jar \
  test_read_write.py
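For comparison, a sketch of the --packages approach, which resolves a dependency by its Maven coordinates (groupId:artifactId:version) at submit time. The coordinates below use the well-known spark-csv package purely as an illustration; substitute the coordinates of the package you actually need:

```shell
# Pull a package from a Maven repository / Spark Packages at submit time.
# Coordinates shown (com.databricks:spark-csv_2.10:1.4.0) are illustrative;
# replace them with your own package's groupId:artifactId:version.
spark-submit --driver-memory 1G \
  --executor-memory 1G \
  --master yarn-client \
  --packages com.databricks:spark-csv_2.10:1.4.0 \
  test_read_write.py
```

Multiple packages can be supplied as a comma-separated list to a single --packages flag. Note that this requires the cluster nodes to have internet access so the dependency (and its transitive dependencies) can be downloaded.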

For more information about these two options, see "Advanced Dependency Management" in the Apache Spark "Submitting Applications" documentation.