Managing Dependencies for Spark 2 and Scala
This topic demonstrates how to manage dependencies on local and external files or packages.
Example: Read Files from the Cluster Local Filesystem
Use the following command in the terminal to read text from the local filesystem. The file must exist on all hosts, and the same path for the driver and executors. In this example you are reading the file ebay-xbox.csv.
sc.textFile(“file:///tmp/ebay-xbox.csv”)
Adding Remote Packages
External
libraries are handled through line magics. Line magics in the Toree
kernel are prefixed with %. You can use Apache Toree's
AddDeps
magic to add dependencies from Maven central.
You must specify the company name, artifact ID, and version. To resolve
any transitive dependencies, you must explicitly specify the
--transitive
flag.
%AddDeps org.scalaj scalaj-http_2.11 2.3.0
import scalaj.http._
val response: HttpResponse[String] = Http("http://www.omdbapi.com/").param("t","crimson tide").asString
response.body
response.code
response.headers
response.cookies
Adding Remote or Local Jars
You can use the AddJars
magic to distribute
local or remote JARs to the kernel and the cluster. Using the -f option ignores
cached JARs and reloads.
%AddJar http://example.com/some_lib.jar -f
%AddJar file:/path/to/some/lib.jar