Tips and Best Practices for Jobs
This section describes changes you can make at the job level.
Use the Distributed Cache to Transfer the Job JAR
Use the distributed cache to transfer the job JAR rather than using the
JobConf(Class) constructor and the JobConf.setJar() and
JobConf.setJarByClass() methods.
To add JARs to the classpath, use -libjars jar1,jar2 on the command line.
This copies the local JAR files to HDFS and uses the distributed cache
mechanism to ensure they are available on the task nodes and added to the
task classpath.
The advantage of this approach over JobConf.setJar is that a JAR already
present on a task node does not need to be copied again when a second task
from the same job runs on that node. The JAR must still be copied from the
launch machine to HDFS.
For more information, see item 1 in the blog post How to Include Third-Party Libraries in Your MapReduce Job.
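As a rough illustration of the command-line shape, the sketch below assembles a hadoop jar invocation that passes extra JARs with -libjars. The class LibJarsCommand, the JAR names, and the driver class com.example.MyDriver are hypothetical placeholders, not part of Hadoop. It assumes the driver parses generic options (typically by running through ToolRunner); a driver that does not will not act on -libjars.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Hadoop class): builds the argument list for
// a "hadoop jar" invocation that ships extra JARs with -libjars.
class LibJarsCommand {

    static List<String> build(String jobJar, String mainClass,
                              List<String> libJars, List<String> jobArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add("jar");
        cmd.add(jobJar);
        cmd.add(mainClass);
        if (!libJars.isEmpty()) {
            // Generic options such as -libjars go before the job's own arguments.
            cmd.add("-libjars");
            // The JAR list is a single comma-separated value with no spaces.
            cmd.add(String.join(",", libJars));
        }
        cmd.addAll(jobArgs);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build(
                "myjob.jar",                          // placeholder job JAR
                "com.example.MyDriver",               // placeholder driver class
                Arrays.asList("jar1.jar", "jar2.jar"),
                Arrays.asList("input", "output"));
        // Prints: hadoop jar myjob.jar com.example.MyDriver -libjars jar1.jar,jar2.jar input output
        System.out.println(String.join(" ", cmd));
    }
}
```

The point of the sketch is the ordering and the separator: the JAR list is one comma-separated argument, and it sits between the driver class and the job's own arguments.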
Changing the Logging Level on a Job (MRv1)
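You can change the logging level for an individual job. You do this by setting the following
properties in the job configuration (JobConf):
mapreduce.map.log.level
mapreduce.reduce.log.level
Valid values are OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE, and ALL.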
JobConf conf = new JobConf();
...
conf.set("mapreduce.map.log.level", "DEBUG");
conf.set("mapreduce.reduce.log.level", "TRACE");
...
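A mistyped level name is easy to miss until the job is already running, so it can help to check the values before copying them into the job configuration. The following is a minimal sketch of such a check. JobLogLevels is a hypothetical helper, not a Hadoop class, and its list of accepted names is an assumption based on the standard log4j level names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): validates level names and
// returns the two per-job logging properties as key/value pairs.
class JobLogLevels {

    // Assumption: the accepted names are the standard log4j levels.
    static final List<String> VALID = Arrays.asList(
            "OFF", "FATAL", "ERROR", "WARN", "INFO", "DEBUG", "TRACE", "ALL");

    // Trims and upper-cases a level name, rejecting anything not in VALID.
    static String normalize(String level) {
        String upper = level.trim().toUpperCase(Locale.ROOT);
        if (!VALID.contains(upper)) {
            throw new IllegalArgumentException("Unknown log level: " + level);
        }
        return upper;
    }

    // Builds the property map for the map-side and reduce-side levels.
    static Map<String, String> properties(String mapLevel, String reduceLevel) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.map.log.level", normalize(mapLevel));
        props.put("mapreduce.reduce.log.level", normalize(reduceLevel));
        return props;
    }

    public static void main(String[] args) {
        // Prints: {mapreduce.map.log.level=DEBUG, mapreduce.reduce.log.level=TRACE}
        System.out.println(properties("debug", "TRACE"));
    }
}
```

Each entry of the returned map can then be passed to conf.set(key, value) on the job's JobConf, as in the example above.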