Compiling and running a Scala-based job

This task shows by example how to use sbt to compile and run a Scala-based Spark job.

In this task, you use the following build.sbt file, which specifies the build configuration:
$ cat build.sbt
name := "Simple Project"
version := "1.0"
scalaVersion := "2.12.15"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"             
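
In the dependency line, the %% operator tells sbt to append the project's Scala binary version to the artifact name, so the dependency above resolves to spark-sql_2.12. A sketch of the equivalent explicit form (assuming Scala 2.12, as configured above):

libraryDependencies += "org.apache.spark" % "spark-sql_2.12" % "2.4.0"

The Scala version of the Spark artifact must match scalaVersion; a mismatch typically fails at runtime with binary-incompatibility errors.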

You also need to create and compile the following example Spark program, written in Scala:

/* SimpleApp.scala */
import org.apache.spark.sql.SparkSession
 
object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
    val logData = spark.read.textFile(logFile).cache() // cached because it is scanned twice
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    spark.stop()
  }
}
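
Before compiling, replace YOUR_SPARK_HOME with the path of an actual Spark installation; any readable text file works as input. For example (the path /opt/spark is illustrative only, not a required location):

    val logFile = "/opt/spark/README.md" // hypothetical path; substitute your own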
Before you begin, make sure that you have completed the following prerequisites (you can verify the installed versions with the commands shown after this list):
  • Install Apache Spark 2.4.x.
  • Install JDK 8.x.
  • Install Scala 2.12.
  • Install sbt 0.13.17.
  • Write a build.sbt file for configuration specifications, similar in role to a makefile.
  • Write a Scala-based Spark program (a .scala file).
  • If the cluster is Kerberized, ensure the required security token is authorized to compile and execute the workload.
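  As a quick sanity check of the prerequisites, you can verify the installed versions from a shell (these commands assume the tools are on your PATH; exact output varies by environment):

    $ java -version           # expect a 1.8.x JDK
    $ scala -version          # expect Scala 2.12.x
    $ sbt sbtVersion          # expect 0.13.17
    $ spark-submit --version  # expect Spark 2.4.x

  On a Kerberized cluster, obtain a Kerberos ticket before compiling and submitting the workload, for example with kinit (the principal shown is a placeholder):

    $ kinit your_user@EXAMPLE.COM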
  1. Compile the code by running the sbt package command from the directory that contains the build.sbt file.
    For example:
    # Your directory layout should look like this
    $ find .
    .
    ./build.sbt
    ./src
    ./src/main
    ./src/main/scala
    ./src/main/scala/SimpleApp.scala
     
    # Package a jar containing your application
    $ sbt package
    ...
    [info] Packaging {..}/{..}/target/scala-2.12/simple-project_2.12-1.0.jar
    Several new files are created under new directories named project and target, including the JAR file. The JAR is named simple-project_2.12-1.0.jar after the project name, the Scala version, and the project version.
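    To confirm that the JAR was packaged correctly, you can optionally list its contents with the standard jar tool; SimpleApp.class should appear in the listing:

    # Inspect the packaged jar (optional check)
    $ jar tf target/scala-2.12/simple-project_2.12-1.0.jar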
  2. Execute and test the workload JAR using spark-submit.
    For example:
    # Use spark-submit to run your application
    spark-submit \
      --class "SimpleApp" \
      --master yarn \
      target/scala-2.12/simple-project_2.12-1.0.jar
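
    For a quick functional test without a cluster, you can also run the same JAR in Spark's local mode; local[4] below is only an example setting, meaning four local worker threads:

    # Run the application locally on 4 cores instead of on YARN
    spark-submit \
      --class "SimpleApp" \
      --master "local[4]" \
      target/scala-2.12/simple-project_2.12-1.0.jar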