Spark Configuration Files

Cloudera Machine Learning supports configuring Spark 2 and Spark 3 properties on a per project basis with the spark-defaults.conf file. If there is a file called spark-defaults.conf in your project root, this will be automatically be added to the global Spark defaults.

To specify an alternate file location, set the environmental variable, SPARK_CONFIG, to the path of the file relative to your project. If you’re accustomed to submitting a Spark job with key-values pairs following a --conf flag, these can also be set in a spark-defaults.conf file instead. For a list of valid key-value pairs, refer to Spark Configuration.

Administrators can set environment variable paths in the /etc/spark/conf/spark-env.sh file.