Spark Log4j Configuration
Cloudera Machine Learning allows you to update Spark’s internal logging configuration on a per-project basis.
Spark 2 uses
Apache Log4j, which can be configured through a properties file. By
default, a log4j.properties
file found in the root of
your project will be appended to the existing Spark logging properties
for every session and job. To specify a custom location, set the
environmental variable LOG4J_CONFIG
to the file
location relative to your project.
The Log4j documentation has more details on logging options.
Increasing the log level or pushing logs to an alternate location for
troublesome jobs can be very helpful for debugging. For example, this is
a
log4j.properties
file in the root of a project that
sets the logging level to INFO for Spark jobs.
shell.log.level=INFO
PySpark logging levels should be set as
follows:
log4j.logger.org.apache.spark.api.python.PythonGatewayServer=<LOG_LEVEL>
And Scala logging levels should be set
as:
log4j.logger.org.apache.spark.repl.Main=<LOG_LEVEL>