Spark Configuration Files

Cloudera Data Science Workbench supports configuring Spark 2 properties on a per project basis with the spark-defaults.conf file.

If there is a file called spark-defaults.conf in your project root, this will be automatically be added to the global Spark defaults. To specify an alternate file location, set the environmental variable, SPARK_CONFIG, to the path of the file relative to your project. If you’re accustomed to submitting a Spark job with key-values pairs following a --conf flag, these can also be set in a spark-defaults.conf file instead. For a list of valid key-value pairs, refer the Spark configuration reference documentation.

Administrators can set environment variable paths in the /etc/spark2/conf/spark-env.sh file.

You can also use Cloudera Manager to configure spark-defaults.conf and spark-env.sh globally for all Spark applications as follows.