Spark Log4j configuration
Cloudera Machine Learning allows you to update Spark’s internal logging configuration on a per-project basis. By default, logging properties are read from a file at the root of your project and applied to every session and job; you can also point Spark at a custom location through an environment variable.
Spark 2 and Spark 3 up to Spark 3.2 (Log4j)
Spark 2 and Spark 3 (up to Spark 3.2) use Apache Log4j. By default, if a log4j.properties file is found in the root of your project, its contents are appended, for every session and job, to the default Spark logging properties located at /etc/spark/conf/log4j.properties. To specify a custom location, set the environment variable LOG4J_CONFIG to the file location relative to your project root.
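For example, if a customized file were kept at a path such as conf/log4j.properties inside the project (a hypothetical location used here only for illustration), the override could be wired up like this:

```shell
# Hypothetical example: point Spark at a custom Log4j file kept under
# conf/ in the project; the path is interpreted relative to the project root.
export LOG4J_CONFIG=conf/log4j.properties
```

In Cloudera Machine Learning, the same variable can typically be set through the project's environment variable settings so that it applies to every new session and job automatically.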
Raising the logging verbosity or redirecting logs to an alternate location can be very helpful when debugging troublesome jobs.
A log4j.properties file in the root of a project that sets the logging level to INFO for Spark jobs could look like this:
shell.log.level=INFO
log4j.logger.org.apache.spark.api.python.PythonGatewayServer=INFO
log4j.logger.org.apache.spark.repl.Main=INFO
Spark 3.3 and above (Log4j2)
Spark 3.3 and above use Apache Log4j2. By default, if a log4j2.properties file is found in the root of your project, its contents are appended, for every session and job, to the default Spark logging properties located at /etc/spark/conf/log4j2.properties. To specify a custom location, set the environment variable LOG4J2_CONFIG to the file location relative to your project root.
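By analogy with the Log4j example earlier, a minimal log4j2.properties sketch that sets the same two Spark loggers to a chosen level might look like the following; the logger labels ("repl", "gateway") are arbitrary names chosen here, and INFO is just an illustrative level:

```properties
# Log4j2 properties files use a key/value naming scheme: each logger gets
# an arbitrary label plus matching .name and .level entries.
logger.repl.name = org.apache.spark.repl.Main
logger.repl.level = INFO
logger.gateway.name = org.apache.spark.api.python.PythonGatewayServer
logger.gateway.level = INFO
```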
For more information on logging options, see the Log4j2 documentation.