Job Configuration

Job represents a MapReduce job configuration and is the primary interface for describing a MapReduce job to the Hadoop framework for execution. The framework tries to execute the job faithfully as described. However:

  • Some configuration parameters might be marked as final by administrators (see Configuration) and cannot be altered.
  • Although some job parameters are straightforward to set (for example, Job.setNumReduceTasks(int)), other parameters interact subtly with the rest of the framework or job configuration and are more complex to set (for example, Configuration.set(JobContext.NUM_MAPS, int); see the sketch below).
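
For example, a minimal sketch of the difference, using the org.apache.hadoop.mapreduce API (the job name and parameter values here are arbitrary):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobContext;

    public class TaskCountExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "task count example");

            // Straightforward: the framework runs exactly this many reduce tasks.
            job.setNumReduceTasks(2);

            // Subtle: JobContext.NUM_MAPS ("mapreduce.job.maps") is only a hint;
            // the actual number of map tasks is driven by the InputFormat's
            // input splits, so this value may be ignored.
            job.getConfiguration().setInt(JobContext.NUM_MAPS, 10);
        }
    }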

Job is typically used to specify the Mapper, Combiner (if any), Partitioner, Reducer, InputFormat, and OutputFormat implementations. FileInputFormat is used to specify the set of input files (for example, FileInputFormat.setInputPaths(Job, Path...)). FileOutputFormat is used to specify where the output files should be written (for example, FileOutputFormat.setOutputPath(Job, Path)).
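
Put together, a minimal word-count driver might look like the following sketch; WordCountMapper and WordCountReducer are illustrative implementations, and HashPartitioner (Hadoop's default) is set explicitly only for demonstration:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

    public class WordCountDriver {

        /** Emits (word, 1) for every whitespace-separated token. */
        public static class WordCountMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);
                    }
                }
            }
        }

        /** Sums the counts for each word; also reusable as a Combiner. */
        public static class WordCountReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);

            job.setMapperClass(WordCountMapper.class);
            job.setCombinerClass(WordCountReducer.class);   // combiner reuses the reducer logic
            job.setPartitionerClass(HashPartitioner.class); // the default, set here for illustration
            job.setReducerClass(WordCountReducer.class);

            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }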

Optionally, Job is used to specify other facets of the job, such as the following (see the sketch after this list):
  • The Comparator to use for sorting intermediate keys.
  • Files to be put in the DistributedCache.
  • Whether intermediate or job outputs are to be compressed (and how).
  • Whether job tasks can be executed in a speculative manner.
  • Maximum number of attempts per task.
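
A rough sketch of those knobs in one place (the cache URI, codec choice, and attempt counts are illustrative; Text.Comparator is simply the stock comparator for Text keys, set explicitly here):

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TunedJobExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "tuned job");

            // Comparator used to sort intermediate keys before reduction;
            // Text.Comparator is the stock comparator for Text keys.
            job.setSortComparatorClass(Text.Comparator.class);

            // Ship a read-only side file to every task via the DistributedCache.
            job.addCacheFile(new URI("hdfs:///shared/lookup.dat"));

            // Compress intermediate map outputs as well as the final job output.
            job.getConfiguration().setBoolean(JobContext.MAP_OUTPUT_COMPRESS, true);
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

            // Disable speculative execution and cap per-task attempts.
            job.setSpeculativeExecution(false);
            job.setMaxMapAttempts(4);
            job.setMaxReduceAttempts(4);
        }
    }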

You can use the Configuration methods get(String, String) and set(String, String) to read and write arbitrary application parameters. For large amounts of read-only data, however, use the DistributedCache instead.
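
As a small sketch, with "myapp.threshold" standing in for any application-defined key: the driver writes the value into the job's Configuration, and the task reads it back during setup, with get(String, String) supplying a default when the key is unset:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ParameterizedJob {

        /** Emits only lines longer than the configured threshold. */
        public static class ThresholdMapper
                extends Mapper<LongWritable, Text, Text, LongWritable> {
            private int threshold;

            @Override
            protected void setup(Context context) {
                // get(String, String) returns the default ("0") if the key is unset.
                threshold = Integer.parseInt(
                        context.getConfiguration().get("myapp.threshold", "0"));
            }

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                if (value.getLength() > threshold) {
                    context.write(value, key);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "parameterized job");
            // Driver side: stash a small, job-wide application parameter.
            job.getConfiguration().set("myapp.threshold", "42");
            job.setMapperClass(ThresholdMapper.class);
            // ... remaining setup (formats, paths) omitted for brevity.
        }
    }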