Walkthrough of WordCount v1.0

The main method invokes the ToolRunner to run the job based on the configuration information.

The map method processes one line at a time, splitting the line on regular expression word boundaries. It emits key/value pairs in the format <word, 1>.

For File0, the map method emits these key/value pairs:

<Hadoop, 1>
<is, 1>
<an, 1>
<elephant, 1>

For File1, the map method emits:

<Hadoop, 1>
<is, 1>
<as, 1>
<yellow, 1>
<as, 1>
<can, 1>
<be, 1>

For File2, the map method emits:

<Oh, 1>
<what, 1>
<a, 1>
<yellow, 1>
<fellow, 1>
<is, 1>
<Hadoop, 1>
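
The map step above can be approximated in plain Java, without the Hadoop API. This is only a sketch: the class name WordCountMapSketch, the map helper, and the exact word-boundary pattern are assumptions made for illustration, and the one-line file contents are reconstructed from the emitted pairs listed above rather than taken from the actual input files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the map method; it does not use the Hadoop API.
class WordCountMapSketch {
    // Assumed pattern: a word boundary with optional surrounding whitespace.
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    // Returns one "<word, 1>" string per word found in the line.
    static List<String> map(String line) {
        List<String> pairs = new ArrayList<>();
        for (String word : WORD_BOUNDARY.split(line)) {
            if (word.isEmpty()) {
                continue; // splitting on boundaries can yield empty tokens
            }
            pairs.add("<" + word + ", 1>");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Assumed contents of File0, File1, and File2 (one line each).
        String[] lines = {
            "Hadoop is an elephant",
            "Hadoop is as yellow as can be",
            "Oh what a yellow fellow is Hadoop"
        };
        for (int i = 0; i < lines.length; i++) {
            System.out.println("File" + i + ": " + map(lines[i]));
        }
    }
}
```

Each word is emitted with the value 1 every time it occurs, so File1 produces <as, 1> twice. Counting is left entirely to the reduce step.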

The reduce method sums the number of instances of each key and then emits the totals, sorted in UTF-8 alphabetical order (all the uppercase words, followed by all the lowercase words). Note that the WordCount code only specifies the key/value pairs. The Mapper and Reducer classes handle the rest of the processing for you.

<Hadoop, 3>
<Oh, 1>
<a, 1>
<an, 1>
<as, 2>
<be, 1>
<can, 1>
<elephant, 1>
<fellow, 1>
<is, 3>
<what, 1>
<yellow, 2>
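
The summing and ordering shown above can be imitated in plain Java. This is a rough sketch under stated assumptions, not the tutorial's Reducer: the class name WordCountReduceSketch and the reduce helper are hypothetical, and a TreeMap stands in for the framework's sorting. A TreeMap of String keys orders by UTF-16 code units, which agrees with UTF-8 byte order for the ASCII words used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the reduce step; it does not use the Hadoop API.
class WordCountReduceSketch {
    // Takes every word emitted by the map step (each carrying an implicit
    // value of 1) and returns "<word, count>" strings in sorted key order.
    static List<String> reduce(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>(); // keeps keys sorted
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add("<" + e.getKey() + ", " + e.getValue() + ">");
        }
        return out;
    }

    public static void main(String[] args) {
        // Keys emitted by the map step for File0, File1, and File2.
        List<String> words = Arrays.asList(
            "Hadoop", "is", "an", "elephant",
            "Hadoop", "is", "as", "yellow", "as", "can", "be",
            "Oh", "what", "a", "yellow", "fellow", "is", "Hadoop");
        for (String pair : reduce(words)) {
            System.out.println(pair);
        }
    }
}
```

Because uppercase ASCII letters have lower code points than lowercase ones, Hadoop and Oh sort ahead of every lowercase word, which matches the listing above.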
