Configuring Spark Applications

You can specify Spark application configuration properties as follows:

  • Pass properties using the --conf command-line switch; for example:
    spark-submit \
    --class com.cloudera.example.YarnExample \
    --master yarn \
    --deploy-mode cluster \
    --conf "spark.eventLog.dir=hdfs:///user/spark/eventlog" \
    lib/yarn-example.jar \
    10
    
  • Specify properties in spark-defaults.conf. See Configuring Spark Application Properties in spark-defaults.conf.
  • Pass properties directly to the SparkConf used to create the SparkContext in your Spark application; for example:

    • Scala:

      import org.apache.spark.{SparkConf, SparkContext}

      val conf = new SparkConf().set("spark.dynamicAllocation.initialExecutors", "5")
      val sc = new SparkContext(conf)
      
    • Python:

      from pyspark import SparkConf, SparkContext
      from pyspark.sql import SQLContext
      conf = SparkConf().setAppName('Application name')
      conf.set('spark.hadoop.avro.mapred.ignore.inputs.without.extension', 'false')
      sc = SparkContext(conf=conf)
      sqlContext = SQLContext(sc)
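
When a job is launched from a script, the --conf form shown in the first bullet can be assembled from a dictionary of properties. The following is a minimal sketch only: build_spark_submit is a hypothetical helper, not part of Spark or Cloudera Manager, and the JAR path, class name, and property are simply the values from the spark-submit example above. It builds the argument list without running it.

```python
import shlex


def build_spark_submit(app_jar, main_class, conf, app_args=(),
                       master="yarn", deploy_mode="cluster"):
    """Return a spark-submit argument list (hypothetical helper)."""
    cmd = [
        "spark-submit",
        "--class", main_class,
        "--master", master,
        "--deploy-mode", deploy_mode,
    ]
    # Each property becomes its own "--conf key=value" pair.
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    # The application JAR comes next, followed by the application's arguments.
    cmd.append(app_jar)
    cmd += [str(arg) for arg in app_args]
    return cmd


cmd = build_spark_submit(
    "lib/yarn-example.jar",
    "com.cloudera.example.YarnExample",
    {"spark.eventLog.dir": "hdfs:///user/spark/eventlog"},
    app_args=[10],
)

# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run as is, which avoids quoting problems when a property value contains spaces.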
      

Configuration properties take precedence in the following order, from highest to lowest:

  1. Properties passed to SparkConf.
  2. Arguments passed to spark-submit, spark-shell, or pyspark.
  3. Properties set in spark-defaults.conf.
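
The precedence list above can be modeled as a layered lookup. This is an illustrative sketch only, not how Spark resolves properties internally. The three dictionaries and the effective_value function are hypothetical, and the memory values are made up to show which layer wins.

```python
from collections import ChainMap

# Hypothetical settings from each source, for illustration only.
spark_conf_props = {"spark.executor.memory": "8g"}           # set on SparkConf in code
submit_args_props = {"spark.executor.memory": "4g",          # passed with --conf
                     "spark.eventLog.dir": "hdfs:///user/spark/eventlog"}
spark_defaults_props = {"spark.executor.memory": "2g",       # spark-defaults.conf
                        "spark.logConf": "true"}

# ChainMap searches its maps left to right, so the first map has the
# highest precedence, matching the order of the list above.
effective = ChainMap(spark_conf_props, submit_args_props, spark_defaults_props)


def effective_value(key):
    """Return the value from the highest-precedence source, or None if unset."""
    return effective.get(key)


for key in ("spark.executor.memory", "spark.eventLog.dir", "spark.logConf"):
    print(key, "=", effective_value(key))
```

In this model, spark.executor.memory resolves to the SparkConf value, while the other two properties fall through to the only source that defines them.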

For more information, see Spark Configuration.

Configuring Spark Application Properties in spark-defaults.conf

Specify each property in the spark-defaults.conf file on its own line, in the form property value: the property name, followed by whitespace, followed by the value.

To create a comment, add a hash mark ( # ) at the beginning of a line. You cannot add comments to the end or middle of a line.

This example shows a spark-defaults.conf file:

spark.master              spark://mysparkmaster.acme.com:7077
spark.eventLog.enabled    true
spark.eventLog.dir        hdfs:///user/spark/eventlog
# Set spark executor memory
spark.executor.memory     2g
spark.logConf             true
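
To make the format concrete, here is a minimal sketch of a reader for the property value form and the whole-line comment rule described above. It is a simplification for illustration: parse_spark_defaults is a hypothetical function, not Spark's loader, and Spark's own loader may accept more forms than this sketch handles.

```python
def parse_spark_defaults(text):
    """Parse 'property value' lines into a dict (simplified, hypothetical)."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and whole-line comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace: name first, value second.
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        props[key] = value
    return props


# The sample file from this section.
sample = """\
spark.master     spark://mysparkmaster.acme.com:7077
spark.eventLog.enabled    true
spark.eventLog.dir        hdfs:///user/spark/eventlog
# Set spark executor memory
spark.executor.memory     2g
spark.logConf             true
"""

props = parse_spark_defaults(sample)
for name, value in props.items():
    print(f"{name} -> {value}")
```

Because the sketch treats only a leading # as a comment marker, a # that appears later on a line stays part of the value, consistent with the rule that comments cannot follow a property on the same line.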

Cloudera recommends placing configuration properties that you want to use for every application in spark-defaults.conf. See Application Properties for more information.

Configuring Properties in spark-defaults.conf Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

Configure properties for all Spark applications in spark-defaults.conf as follows:

  1. Go to the Spark service.
  2. Click the Configuration tab.
  3. Select Scope > Gateway.
  4. Select Category > Advanced.
  5. Locate the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf property.
  6. Specify properties described in Application Properties.

    If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  7. Enter a Reason for change, and then click Save Changes to commit the changes.
  8. Deploy the client configuration.

Configuring Spark Application Logging Properties

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

To configure only the logging threshold level, follow the procedure in Configuring Logging Thresholds. To configure any other logging property, do the following:

  1. Go to the Spark service.
  2. Click the Configuration tab.
  3. Select Scope > Gateway.
  4. Select Category > Advanced.
  5. Locate the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/log4j.properties property.
  6. Specify log4j properties.

    If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  7. Enter a Reason for change, and then click Save Changes to commit the changes.
  8. Deploy the client configuration.