Spark Log4j Configuration

Cloudera Machine Learning allows you to update Spark’s internal logging configuration on a per-project basis. Spark logging properties can be customized for every session and job by placing a configuration file at a default path in the root of your project; you can also point to a custom location through an environment variable.

Spark 2 and Spark 3 up to Spark 3.2 (Log4j)

Spark 2 and Spark 3 (up to Spark 3.2) use Apache Log4j. By default, if a log4j.properties file is found in the root of your project, its contents are appended for every session and job to the default Spark logging properties located at /etc/spark/conf/log4j.properties. To specify a custom location, set the environment variable LOG4J_CONFIG to the file location relative to your project root.
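
For example, if your custom logging properties live in a file named conf/spark-log4j.properties in your project (a hypothetical path, used here only for illustration), you would set the project environment variable as follows:
LOG4J_CONFIG=conf/spark-log4j.properties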

Increasing the logging verbosity or writing logs to an alternate location can be very helpful when debugging troublesome jobs.

For example, a log4j.properties file in the root of a project can set the logging level to INFO for Spark jobs as follows:
shell.log.level=INFO
PySpark logging levels can be set as follows:
log4j.logger.org.apache.spark.api.python.PythonGatewayServer=[***LOG LEVEL***]
Scala logging levels can be set as follows:
log4j.logger.org.apache.spark.repl.Main=[***LOG LEVEL***]
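
Putting these together, a minimal project-level log4j.properties might look like the following sketch (the DEBUG levels are illustrative choices, not requirements):
shell.log.level=INFO
log4j.logger.org.apache.spark.api.python.PythonGatewayServer=DEBUG
log4j.logger.org.apache.spark.repl.Main=DEBUG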

Spark 3.3 and above (Log4j2)

Spark 3.3 and above use Apache Log4j2. By default, if a log4j2.properties file is found in the root of your project, its contents are appended for every session and job to the default Spark logging properties located at /etc/spark/conf/log4j2.properties. To specify a custom location, set the environment variable LOG4J2_CONFIG to the file location relative to your project root.
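
Note that Log4j2 uses a different properties syntax from Log4j 1. For example, a log4j2.properties file in the project root that sets the PySpark gateway logger to DEBUG might look like the following sketch (the logger prefix pysparkgw is an arbitrary label chosen for this example):
logger.pysparkgw.name = org.apache.spark.api.python.PythonGatewayServer
logger.pysparkgw.level = debug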

For more information on logging options, see the Log4j2 documentation.