Configuring Spark Applications
You can specify Spark application configuration properties as follows:
- Pass properties using the --conf command-line switch; for example:

  ```
  spark-submit \
    --class com.cloudera.example.YarnExample \
    --master yarn \
    --deploy-mode cluster \
    --conf "spark.eventLog.dir=hdfs:///user/spark/eventlog" \
    lib/yarn-example.jar \
    10
  ```

- Specify properties in spark-defaults.conf (see the sketch following this list).
- Pass properties directly to the SparkConf used to create the SparkContext in your Spark application; for example:
  - Scala:

    ```
    val conf = new SparkConf().set("spark.dynamicAllocation.initialExecutors", "5")
    val sc = new SparkContext(conf)
    ```

  - Python:

    ```
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    conf = SparkConf().setAppName('Application name')
    conf.set('spark.hadoop.avro.mapred.ignore.inputs.without.extension', 'false')
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    ```
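For the spark-defaults.conf option, entries are property/value pairs separated by whitespace, one per line. A minimal sketch of such a file, using common Spark properties with illustrative values (not recommendations):

```
# spark-defaults.conf -- illustrative values only
spark.master                      yarn
spark.eventLog.enabled            true
spark.eventLog.dir                hdfs:///user/spark/eventlog
spark.executor.memory             2g
spark.dynamicAllocation.enabled   true
```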
The order of precedence for configuration properties is:
1. Properties passed to SparkConf.
2. Arguments passed to spark-submit, spark-shell, or pyspark.
3. Properties set in spark-defaults.conf.
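To illustrate this precedence, here is a minimal Scala sketch assuming a property and values chosen for the example (not taken from the list above): suppose spark-defaults.conf sets spark.executor.memory to 2g and the job is submitted with --conf "spark.executor.memory=4g"; the value set on SparkConf in the application still takes effect.

```
import org.apache.spark.{SparkConf, SparkContext}

// Assumed setup for this sketch:
//   spark-defaults.conf:   spark.executor.memory  2g   (lowest precedence)
//   spark-submit --conf:   spark.executor.memory=4g    (middle precedence)
//   SparkConf below:       8g                          (highest precedence)
val conf = new SparkConf().set("spark.executor.memory", "8g")
val sc = new SparkContext(conf)

// The value set on SparkConf wins over --conf and spark-defaults.conf.
println(sc.getConf.get("spark.executor.memory"))  // prints: 8g
```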
For more information, see Spark Configuration.
