Spark Action Parameters
Use Spark actions to handle batch and streaming data in your workflow.
Table 7.38. Spark Action, General Parameters
Parameter Name | Description | Additional Information | Example |
---|---|---|---|
Application Name | Name you want to assign to the Spark application. | | |
Application | The JAR file or Python script that contains the Spark application. If a JAR is specified, you must also provide the fully qualified main class. | Path to a bundled JAR that includes your application and all of its dependencies. The URL must be globally visible inside your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes. | Application: ${nameNode}/user/centos/spark-action/lib/oozie-examples.jar Class: org.apache.spark.myapp.MySparkApp |
Runs On | YARN Cluster, YARN Client, Local, or Custom | Determines where the driver process runs. In cluster mode, the framework launches the driver inside the cluster. In client mode, the submitter launches the driver outside the cluster. In local mode, the application runs locally, which is useful for debugging. If you select YARN Cluster mode, the application file must be on HDFS; for YARN Client mode, the file can be local or on HDFS. Important: The yarn-client execution mode for the Oozie Spark action is deprecated as of HDP 2.6.0 and will no longer be supported in a future release. The yarn-cluster mode for the Spark action continues to be supported. | |
Spark Options | See the Apache Spark configuration documentation. | | |
Job XML | You can select one or more job.xml files to pass configuration elements to the action. | The configuration file specifies the variables used for the Spark action in the workflow. Entries can be overwritten or replaced by entries under the Configuration section. | |
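In the workflow definition that Workflow Manager generates, these general parameters become child elements of the Oozie `<spark>` action. The following is a minimal sketch only, reusing the example JAR, class, and variable names from the table above; the exact schema version, element order, and whether the cluster element is `<job-tracker>` or `<resource-manager>` depend on your Oozie version, and the spark-opts values shown are illustrative.

```xml
<!-- Minimal sketch of a generated Spark action; paths, class, and
     spark-opts values are illustrative examples, not required settings. -->
<action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${resourceManager}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>   <!-- Runs On: YARN Cluster -->
        <name>MySparkApp</name>         <!-- Application Name -->
        <class>org.apache.spark.myapp.MySparkApp</class>
        <jar>${nameNode}/user/centos/spark-action/lib/oozie-examples.jar</jar>
        <spark-opts>--executor-memory 2G --num-executors 2</spark-opts>   <!-- Spark Options -->
    </spark>
    <ok to="end"/>
    <error to="kill"/>
</action>
```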
Table 7.39. Spark Action, Transition Parameters
Parameter Name | Description | Additional Information | Default Setting |
---|---|---|---|
Error To | Indicates what node to transition to if the action errors out. | You can modify this setting in the dialog box or by modifying the workflow graph. | Defaults to the kill node, but can be changed.
OK To | Indicates what node to transition to if the action succeeds. | You can modify this setting in the dialog box or by modifying the workflow graph. | Defaults to the next node in the workflow. |
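In the generated XML, OK To and Error To correspond to the `<ok>` and `<error>` elements of the action node. A minimal sketch, assuming a following node named `end` and a kill node named `kill` (both node names are illustrative):

```xml
<action name="spark-node">
    <!-- Spark action body omitted -->
    <ok to="end"/>       <!-- OK To: next node in the workflow -->
    <error to="kill"/>   <!-- Error To: defaults to the kill node -->
</action>
<kill name="kill">
    <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
```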
Table 7.40. Spark Action, Advanced Properties Parameters
Parameter Name | Description | Additional Information | Example |
---|---|---|---|
Resource Manager | Master node that arbitrates all the available cluster resources among the competing applications. | The default setting is discovered from the cluster configuration. | ${resourceManager} |
Name Node | Manages the file system metadata. | Keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. Clients contact NameNode for file metadata or file modifications. | ${nameNode} |
File | Select any files that you want to make available to the Spark action when the workflow runs. | Specify the name of the main JAR if you did not put your JAR files in the /lib directory. | /path/file |
Archive | Select any archives that you want to make available to the Spark action when the workflow runs. | | /path/archived-data-files |
Prepare | Select mkdir or delete and identify any HDFS paths to create or delete before starting the job. | Use delete to do file cleanup prior to job execution. Enables Oozie to retry a job if there is a transient failure (the job output directory must not exist prior to job start). If the path is to a directory: delete deletes all content recursively and then deletes the directory. mkdir creates all missing directories in the path. | ${nameNode}/user/centos/output-data/spark |
Arg | Identify any arguments for the Spark action. | Arguments to be passed to the main method of your main class. The value of each arg element is treated as a single argument, and the arguments are passed to the main method in the order listed. See the Spark Applications documentation. | ${nameNode}/user/username/input-data/text/data.txt ${nameNode}/user/username/output-data/spark |
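Inside the generated `<spark>` element, the Prepare and Arg settings above appear as `<prepare>` and `<arg>` child elements. A sketch under those assumptions, reusing the example paths from the table (all paths and the application details are illustrative):

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${resourceManager}</job-tracker>   <!-- Resource Manager -->
    <name-node>${nameNode}</name-node>              <!-- Name Node -->
    <prepare>
        <!-- Delete the output directory so a retried job starts clean. -->
        <delete path="${nameNode}/user/centos/output-data/spark"/>
    </prepare>
    <master>yarn-cluster</master>
    <name>MySparkApp</name>
    <class>org.apache.spark.myapp.MySparkApp</class>
    <jar>${nameNode}/user/centos/spark-action/lib/oozie-examples.jar</jar>
    <!-- Each <arg> element is passed to the main method as one argument, in order. -->
    <arg>${nameNode}/user/username/input-data/text/data.txt</arg>
    <arg>${nameNode}/user/username/output-data/spark</arg>
</spark>
```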
Table 7.41. Spark Action, Configuration Parameters
Parameter Name | Description | Additional Information | Example |
---|---|---|---|
Name and Value | The name/value pair can be used instead of a job.xml file or can override parameters set in the job.xml file. | Used to specify formal parameters. If the name and value are specified, the user can override the values from the Submit dialog box. Can be parameterized (templatized) using EL expressions. | |
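These name/value pairs end up in the `<configuration>` block of the generated action. A minimal sketch; the property name `mapred.job.queue.name` and the `${queueName}` workflow parameter are illustrative, not required:

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
    ...
    <configuration>
        <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>   <!-- value parameterized with an EL expression -->
        </property>
    </configuration>
    ...
</spark>
```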