Building and Running a Secure Spark Streaming Job
Use the following steps to build and run a secure Spark Streaming job.
Depending on your compilation and build processes, one or more of the following tasks might be required before running a Spark Streaming job:
- If you are using Maven as a compile tool:

  - Add the Hortonworks repository to your pom.xml file:

        <repository>
            <id>hortonworks</id>
            <name>hortonworks repo</name>
            <url>http://repo.hortonworks.com/content/repositories/releases/</url>
        </repository>
  - Specify the Hortonworks version number for the Spark streaming Kafka and Spark streaming dependencies in your pom.xml file (a minimal application class built against these dependencies is sketched after these Maven steps):

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_2.10</artifactId>
            <version>2.0.0.2.4.2.0-90</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>2.0.0.2.4.2.0-90</version>
            <scope>provided</scope>
        </dependency>

    Note that the correct version number combines the Spark version and the HDP version; here, 2.0.0.2.4.2.0-90 is Spark 2.0.0 on HDP 2.4.2.0-90.
  - (Optional) If you prefer to pack an uber .jar rather than use the default ("provided") scope, add the maven-shade-plugin to your pom.xml file:

        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.3</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <finalName>uber-${project.artifactId}-${project.version}</finalName>
            </configuration>
        </plugin>
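The submission commands below reference a <user-main-class> and a <user-application.jar>. As an illustration only, the following is a minimal sketch of a Spark Streaming application class built against the dependencies above. The object name SecureKafkaWordCount, its two arguments (a broker list and a comma-separated topic list), and the 10-second batch interval are hypothetical, not part of this guide; a Kerberos-enabled cluster typically also requires Kafka security settings (see the sketch after the submission commands).

    import kafka.serializer.StringDecoder

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Hypothetical <user-main-class>: a word count over a Kafka direct stream.
    object SecureKafkaWordCount {
      def main(args: Array[String]): Unit = {
        // Example <user arg lists>: a broker list and a comma-separated topic list.
        val Array(brokers, topics) = args

        val sparkConf = new SparkConf().setAppName("SecureKafkaWordCount")
        val ssc = new StreamingContext(sparkConf, Seconds(10))

        // Kafka connection settings; a secure cluster generally needs more than this.
        val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
        val topicSet = topics.split(",").toSet

        // Create a direct stream of (key, message) pairs from Kafka.
        val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topicSet)

        // Count words in each batch and print a sample to the driver log.
        val wordCounts = messages.map(_._2)
          .flatMap(_.split(" "))
          .map(word => (word, 1L))
          .reduceByKey(_ + _)
        wordCounts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }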
- Instructions for submitting your job depend on whether you used an uber .jar file or not:
  - If you kept the default .jar scope and you can access an external network, use --packages to download the dependencies into the runtime library:

        spark-submit --master yarn-client \
            --num-executors 1 \
            --packages org.apache.spark:spark-streaming-kafka_2.10:2.0.0.2.4.2.0-90 \
            --repositories http://repo.hortonworks.com/content/repositories/releases/ \
            --class <user-main-class> \
            <user-application.jar> \
            <user arg lists>

    The artifact and repository locations should be the same as those specified in your pom.xml file.
  - If you packed the .jar file into an uber .jar, submit the .jar file in the same way as you would a regular Spark application:

        spark-submit --master yarn-client \
            --num-executors 1 \
            --class <user-main-class> \
            <user-uber-application.jar> \
            <user arg lists>
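Because this is a secure streaming job, the application itself usually has to tell the Kafka consumer to authenticate. The fragment below is a sketch only, assuming the HDP build of the spark-streaming-kafka_2.10 connector, where PLAINTEXTSASL is the HDP protocol name for Kerberos-secured Kafka; confirm the exact settings for your cluster in the Kafka security sections of this guide.

    // Assumption: Kerberos-secured Kafka on HDP. The extra entry below is a
    // sketch, not a verified setting for every cluster; "PLAINTEXTSASL" is the
    // HDP name for the SASL/Kerberos protocol.
    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> brokers,
      "security.protocol"    -> "PLAINTEXTSASL")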
- For a sample pom.xml file, see "Sample pom.xml file for Spark Streaming with Kafka" in this guide.