Walkthrough of WordCount v1.0
The main method invokes the ToolRunner to run the job based on the configuration information.
The map method processes one line at a time, splitting the line on regular expression word boundaries. It emits key/value pairs in the format <word, 1>.
For File0, the map method emits these key/value pairs:
<Hadoop, 1> <is, 1> <an, 1> <elephant, 1>
For File1, the map method emits:
<Hadoop, 1> <is, 1> <as, 1> <yellow, 1> <as, 1> <can, 1> <be, 1>
For File2, the map method emits:
<Oh, 1> <what, 1> <a, 1> <yellow, 1> <fellow, 1> <is, 1> <Hadoop, 1>
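The map step described above can be approximated in plain Java, without any Hadoop classes. This is a minimal sketch under stated assumptions, not the tutorial's actual Mapper: the class name WordCountMapSketch, the exact regular expression, and the three input lines (inferred from the emitted pairs listed above) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java stand-in for the map step; no Hadoop classes are used.
final class WordCountMapSketch {
    // Assumed pattern: optional whitespace around a regex word boundary.
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    // Returns one "<word, 1>" string per word on the line, in order of appearance.
    static List<String> map(String line) {
        List<String> pairs = new ArrayList<>();
        for (String word : WORD_BOUNDARY.split(line)) {
            if (word.isEmpty()) {
                continue; // zero-width boundary matches can yield empty tokens
            }
            pairs.add("<" + word + ", 1>");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Input lines inferred from the emitted pairs listed above (an assumption).
        String[] lines = {
            "Hadoop is an elephant",
            "Hadoop is as yellow as can be",
            "Oh what a yellow fellow is Hadoop"
        };
        for (String line : lines) {
            System.out.println(String.join(" ", map(line)));
        }
    }
}
```

Each call to map handles a single line and knows nothing about the other lines or files, which is why every pair carries a count of 1, even for a word such as "as" that repeats within one line.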
The reduce method sums the number of instances for each key and then emits the totals. The output is sorted in UTF-8 order, so all words that begin with an uppercase letter come before all words that begin with a lowercase letter. Note that the WordCount code only specifies the key/value pairs; the Mapper and Reducer classes handle the rest of the processing for you. For the three files above, the output is:
<Hadoop, 3> <Oh, 1> <a, 1> <an, 1> <as, 2> <be, 1> <can, 1> <elephant, 1> <fellow, 1> <is, 3> <what, 1> <yellow, 2>
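The summing and ordering behavior can be illustrated with a small plain-Java sketch. It is not the tutorial's Reducer and uses no Hadoop classes: the class name WordCountReduceSketch is hypothetical, and a TreeMap stands in for the sorting that the framework performs. For ASCII words, Java's natural String ordering agrees with UTF-8 byte order, so uppercase-initial words sort first.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the sum-per-key step; no Hadoop classes are used.
final class WordCountReduceSketch {
    // Sums occurrences per word. TreeMap keeps keys in String natural order,
    // which for ASCII words places uppercase letters before lowercase ones.
    static Map<String, Integer> reduce(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // each occurrence contributes 1
        }
        return counts;
    }

    public static void main(String[] args) {
        // Keys of the pairs emitted by the map step for File0, File1, and File2.
        List<String> words = List.of(
            "Hadoop", "is", "an", "elephant",
            "Hadoop", "is", "as", "yellow", "as", "can", "be",
            "Oh", "what", "a", "yellow", "fellow", "is", "Hadoop");
        StringBuilder out = new StringBuilder();
        reduce(words).forEach((word, count) ->
            out.append('<').append(word).append(", ").append(count).append("> "));
        System.out.println(out.toString().trim());
    }
}
```

Run against the words from the three files, the sketch prints the same pairs in the same order as the output listed above, with Hadoop and Oh ahead of the lowercase words.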