Tips and Best Practices for Jobs
This section describes changes you can make at the job level.
Use the Distributed Cache to Transfer the Job JAR
Use the distributed cache to transfer the job JAR rather than using the
JobConf(Class)
constructor and the JobConf.setJar()
and
JobConf.setJarByClass()
methods.
To add JARs to the classpath, use -libjars
jar1,jar2
. This copies the local JAR files
to HDFS and uses the distributed cache mechanism to ensure they are available on the task
nodes and added to the task classpath.
The advantage of this, over JobConf.setJar
, is that if the JAR is on a task
node, it does not need to be copied again if a second task from the same job runs on that
node, though it will still need to be copied from the launch machine to HDFS.
For more information, see item 1 in the blog post How to Include Third-Party Libraries in Your MapReduce Job.
Changing the Logging Level on a Job (MRv1)
You can change the logging level for an individual job. You do this by setting the following properties in the job configuration:
-
mapreduce.map.log.level
-
mapreduce.reduce.log.level
Valid values are NONE
, INFO
, WARN
,
DEBUG
, TRACE
, and ALL
.
Example:
JobConf conf = new JobConf(); ... conf.set("mapreduce.map.log.level", "DEBUG"); conf.set("mapreduce.reduce.log.level", "TRACE"); ...