Configuring Spark Applications
You can specify Spark application configuration properties as follows:
- Pass properties using the --conf command-line switch; for example:
  spark-submit \
    --class com.cloudera.example.YarnExample \
    --master yarn \
    --deploy-mode cluster \
    --conf "spark.eventLog.dir=hdfs:///user/spark/eventlog" \
    lib/yarn-example.jar \
    10
- Specify properties in spark-defaults.conf. See Configuring Spark Application Properties in spark-defaults.conf.
- Pass properties directly to the SparkConf used to create the SparkContext in your Spark application; for example:
  - Scala:
      import org.apache.spark.{SparkConf, SparkContext}

      val conf = new SparkConf().set("spark.dynamicAllocation.initialExecutors", "5")
      val sc = new SparkContext(conf)
  - Python:
      from pyspark import SparkConf, SparkContext
      from pyspark.sql import SQLContext

      conf = SparkConf().setAppName('Application name')
      conf.set('spark.hadoop.avro.mapred.ignore.inputs.without.extension', 'false')
      sc = SparkContext(conf=conf)
      sqlContext = SQLContext(sc)
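Each --conf switch carries exactly one property=value pair, and you can repeat the switch once per property. If you launch jobs from a script, the following Python sketch assembles the same spark-submit argument list as the first example from a dictionary of properties. The helper build_submit_args is hypothetical (it is not part of Spark); the class name, JAR path, and property are copied from the spark-submit example.

```python
import shlex


def build_submit_args(main_class, app_jar, app_args, conf,
                      master="yarn", deploy_mode="cluster"):
    """Hypothetical helper: build a spark-submit argument list with one --conf per property."""
    args = ["spark-submit", "--class", main_class,
            "--master", master, "--deploy-mode", deploy_mode]
    for key, value in conf.items():
        # Each --conf switch carries exactly one key=value pair.
        args += ["--conf", f"{key}={value}"]
    # The application JAR follows all spark-submit options;
    # everything after it is passed to the application itself.
    args.append(app_jar)
    args.extend(str(a) for a in app_args)
    return args


args = build_submit_args(
    "com.cloudera.example.YarnExample",
    "lib/yarn-example.jar",
    [10],
    {"spark.eventLog.dir": "hdfs:///user/spark/eventlog"},
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(args))
```

To actually launch the job, pass the list to subprocess.run(args, check=True); passing a list rather than a single string sidesteps shell quoting of property values.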
When the same property is set in more than one place, Spark uses the following order of precedence, from highest to lowest:
- Properties passed to SparkConf.
- Arguments passed to spark-submit, spark-shell, or pyspark.
- Properties set in spark-defaults.conf.
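The precedence rules above behave like a layered dictionary merge in which higher-precedence sources overwrite lower ones. The following Python sketch models each source as a plain dictionary; effective_config is a hypothetical name, Spark exposes no such function, and the property values are illustrative only.

```python
def effective_config(defaults_conf, submit_conf, spark_conf):
    """Hypothetical sketch: merge property sources from lowest to highest precedence."""
    merged = {}
    # Later updates overwrite earlier ones, so apply the lowest-precedence source first:
    # spark-defaults.conf, then spark-submit arguments, then SparkConf.
    for source in (defaults_conf, submit_conf, spark_conf):
        merged.update(source)
    return merged


defaults_conf = {"spark.executor.memory": "2g", "spark.eventLog.enabled": "true"}
submit_conf = {"spark.executor.memory": "4g"}
spark_conf = {"spark.executor.memory": "8g",
              "spark.dynamicAllocation.initialExecutors": "5"}

conf = effective_config(defaults_conf, submit_conf, spark_conf)
print(conf["spark.executor.memory"])
```

A property set in only one source (such as spark.eventLog.enabled here) passes through unchanged; only conflicting keys are resolved by precedence.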
For more information, see Spark Configuration.
Configuring Spark Application Properties in spark-defaults.conf
Specify properties in the spark-defaults.conf file one per line, in the form property value: the property name, whitespace, and then the value.
To create a comment, add a hash mark (#) at the beginning of a line. You cannot add comments to the end or middle of a line.
This example shows a spark-defaults.conf file:
  spark.master            spark://mysparkmaster.acme.com:7077
  spark.eventLog.enabled  true
  spark.eventLog.dir      hdfs:///user/spark/eventlog
  # Set spark executor memory
  spark.executor.memory   2g
  spark.logConf           true
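The format rules above can be illustrated with a small Python parser. This is a simplified sketch: it splits each line at the first run of whitespace and skips blank lines and lines that begin with #. Spark itself reads the file with Java's java.util.Properties, which also accepts = or : as a separator, so treat parse_spark_defaults (a hypothetical name) as an illustration of the format rather than Spark's actual loader.

```python
def parse_spark_defaults(text):
    """Hypothetical sketch: parse spark-defaults.conf-style text into a dict."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and full-line comments.
        # A '#' later in a line is NOT a comment; it becomes part of the value.
        if not line or line.startswith("#"):
            continue
        # The property name ends at the first whitespace; the rest is the value.
        parts = line.split(None, 1)
        props[parts[0]] = parts[1].strip() if len(parts) == 2 else ""
    return props


SAMPLE = """\
spark.master            spark://mysparkmaster.acme.com:7077
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs:///user/spark/eventlog
# Set spark executor memory
spark.executor.memory   2g
spark.logConf           true
"""

props = parse_spark_defaults(SAMPLE)
print(props["spark.executor.memory"])
```

Parsing a line such as spark.executor.memory 2g # note yields the value "2g # note", which shows why end-of-line comments are not allowed.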
Cloudera recommends placing configuration properties that you want to use for every application in spark-defaults.conf. See Application Properties for more information.
Configuring Properties in spark-defaults.conf Using Cloudera Manager
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
Configure properties for all Spark applications in spark-defaults.conf as follows:
- Go to the Spark service.
- Click the Configuration tab.
- Select .
- Select .
- Locate the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf property.
- Specify the properties described in Application Properties. If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Enter a Reason for change, and then click Save Changes to commit the changes.
- Deploy the client configuration.
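The value of the spark-defaults.conf safety valve uses the same property value format as the file itself. If you keep your cluster-wide properties in a script, a helper like the hypothetical render_spark_defaults below can produce text to paste into the field. This is a sketch under that assumption; the property names are taken from the examples earlier in this section.

```python
def render_spark_defaults(props, comment=None):
    """Hypothetical sketch: render a dict as spark-defaults.conf lines, one 'name value' per line."""
    lines = []
    if comment:
        # Comments must occupy a whole line and start with '#'.
        lines.append(f"# {comment}")
    for name, value in props.items():
        # A property name cannot contain whitespace: whitespace separates name from value.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid property name: {name!r}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


snippet = render_spark_defaults(
    {"spark.eventLog.enabled": "true",
     "spark.eventLog.dir": "hdfs:///user/spark/eventlog"},
    comment="Event logging for all applications",
)
print(snippet, end="")
```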
Configuring Spark Application Logging Properties
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
To configure only the logging threshold level, follow the procedure in Configuring Logging Thresholds. To configure any other logging property, do the following:
- Go to the Spark service.
- Click the Configuration tab.
- Select .
- Select .
- Locate the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/log4j.properties property.
- Specify log4j properties. If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Enter a Reason for change, and then click Save Changes to commit the changes.
- Deploy the client configuration.
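The log4j.properties safety valve takes ordinary log4j property lines in name=value form. As a hedged sketch, assuming the log4j 1.x properties syntax that Spark releases of this era read, the hypothetical helper below renders per-logger level overrides such as log4j.logger.org.apache.spark.scheduler=WARN. The logger names are examples only; check which loggers matter for your application.

```python
# Level names recognized by log4j 1.x (assumption: log4j 1.x properties syntax).
LOG4J_LEVELS = {"ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"}


def render_logger_levels(levels):
    """Hypothetical sketch: render {logger_name: level} as 'log4j.logger.<name>=<LEVEL>' lines."""
    lines = []
    # Sort by logger name so the generated snippet is stable between runs.
    for logger, level in sorted(levels.items()):
        level = level.upper()
        if level not in LOG4J_LEVELS:
            raise ValueError(f"unknown log4j level: {level}")
        lines.append(f"log4j.logger.{logger}={level}")
    return "\n".join(lines)


snippet = render_logger_levels(
    {"org.apache.spark.scheduler": "warn", "org.apache.hadoop": "error"}
)
print(snippet)
```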