Managing dependencies for Spark 2 and Scala

This topic demonstrates how to manage dependencies on local and external files or packages.

Example: Read files from the cluster local filesystem

Use the following command in the terminal to read text from the local filesystem. The file must exist on all hosts, and the same path for the driver and executors. In this example you are reading the file ebay-xbox.csv.

sc.textFile(“file:///tmp/ebay-xbox.csv”) 

Adding remote packages

External libraries are handled through line magics. Line magics in the Toree kernel are prefixed with %. You can use Apache Toree's AddDeps magic to add dependencies from Maven central. You must specify the company name, artifact ID, and version. To resolve any transitive dependencies, you must explicitly specify the --transitive flag.

%AddDeps org.scalaj scalaj-http_2.11 2.3.0
import scalaj.http._ 
val response: HttpResponse[String] = Http("http://www.omdbapi.com/").param("t","crimson tide").asString
response.body
response.code
response.headers
response.cookies

Adding remote or local Jars

You can use the AddJars magic to distribute local or remote JARs to the kernel and the cluster. Using the -f option ignores cached JARs and reloads.

%AddJar http://example.com/some_lib.jar -f 
%AddJar file:/path/to/some/lib.jar