Developing Apache Spark Applications

Building and Running a Secure Spark Streaming Job

Use the following steps to build and run a secure Spark Streaming job.

Depending on your compilation and build processes, one or more of the following tasks might be required before running a Spark Streaming job:

  • If you are using Maven as your build tool:

    1. Add the Hortonworks repository to your pom.xml file:
      <repository>
          <id>hortonworks</id>
          <name>hortonworks repo</name>
          <url>http://repo.hortonworks.com/content/repositories/releases/</url>
      </repository>
    2. Specify the Hortonworks version number for the Spark Streaming and Spark Streaming Kafka dependencies in your pom.xml file:
      <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-streaming-kafka_2.10</artifactId>
          <version>2.0.0.2.4.2.0-90</version>
      </dependency>
      
      <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-streaming_2.10</artifactId>
          <version>2.0.0.2.4.2.0-90</version>
          <scope>provided</scope>
      </dependency>

      Note that the correct version number combines the Spark version and the HDP version: in the example above, 2.0.0.2.4.2.0-90 corresponds to Spark 2.0.0 on HDP 2.4.2.0-90.

    3. (Optional) If you prefer to package an uber .jar rather than rely on the default "provided" scope, add the maven-shade-plugin to your pom.xml file (a build command sketch follows this list):
      <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-shade-plugin</artifactId>
          <version>2.3</version>
          <executions>
              <execution>
                  <phase>package</phase>
                  <goals>
                      <goal>shade</goal>
                  </goals>
              </execution>
          </executions>
          <configuration>
              <filters>
                  <filter>
                      <artifact>*:*</artifact>
                      <excludes>
                          <exclude>META-INF/*.SF</exclude>
                          <exclude>META-INF/*.DSA</exclude>
                          <exclude>META-INF/*.RSA</exclude>
                      </excludes>
                  </filter>
              </filters>
              <finalName>uber-${project.artifactId}-${project.version}</finalName>
          </configuration>
      </plugin>
  • Instructions for submitting your job depend on whether or not you packaged it as an uber .jar (for Kerberos-enabled clusters, see the note after this list):

    • If you kept the default .jar scope and your cluster can access an external network, use --packages to download dependencies at run time:

      spark-submit --master yarn-client \
          --num-executors 1 \
          --packages org.apache.spark:spark-streaming-kafka_2.10:2.0.0.2.4.2.0-90 \
          --repositories http://repo.hortonworks.com/content/repositories/releases/ \
          --class <user-main-class> \
          <user-application.jar> \
          <user arg lists>

      The artifact and repository locations should be the same as specified in your pom.xml file.

    • If you packaged your application as an uber .jar, submit it in the same way as you would a regular Spark application:

      spark-submit --master yarn-client \
          --num-executors 1 \
          --class <user-main-class> \
          <user-uber-application.jar> \
          <user arg lists>
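
If you added the maven-shade-plugin configuration from step 3, the uber .jar is produced by a standard Maven packaging run. The following is a minimal sketch; the output name follows from the <finalName> setting above, so the exact path depends on your project's artifact ID and version:

    # Compile the project and build the shaded (uber) .jar
    mvn clean package

    # The shade plugin writes the result to the target directory, for example:
    #   target/uber-<project.artifactId>-<project.version>.jar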
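
On a Kerberos-enabled cluster, a long-running streaming job needs credentials that remain valid after the initial ticket expires. spark-submit can log in from a keytab via its --principal and --keytab options; the following is a minimal sketch, with a placeholder principal name and keytab path (substitute values from your environment):

    spark-submit --master yarn-client \
        --num-executors 1 \
        --principal user@EXAMPLE.COM \
        --keytab /etc/security/keytabs/user.keytab \
        --class <user-main-class> \
        <user-uber-application.jar> \
        <user arg lists>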

For a sample pom.xml file, see "Sample pom.xml file for Spark Streaming with Kafka" in this guide.
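
For reference, the class you pass as <user-main-class> is an ordinary Spark Streaming application. The sketch below is a hypothetical word-count job written in Scala against the spark-streaming-kafka_2.10 (Kafka 0.8 direct stream) API that matches the dependencies above; the broker address and topic name are placeholders:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object StreamingWordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("StreamingWordCount")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Placeholder broker list and topic; substitute values from your cluster
        val kafkaParams = Map("metadata.broker.list" -> "broker1:6667")
        val topics = Set("test-topic")

        // Receiver-less direct stream of (key, value) pairs from Kafka
        val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        // Count words in each batch and print a sample to the driver log
        messages.map(_._2)
          .flatMap(_.split(" "))
          .map((_, 1))
          .reduceByKey(_ + _)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }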