This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

MapReduce

The Avro MapReduce API is an Avro module for running MapReduce programs that produce or consume Avro data files.

If you are using Maven, add the following dependency to your POM:

<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-mapred</artifactId>
    <version>1.7.3</version>
    <classifier>hadoop2</classifier>
</dependency>

Then write your program, using the Avro MapReduce Javadoc for guidance.

At runtime, include the avro and avro-mapred JARs in HADOOP_CLASSPATH, and pass the avro, avro-mapred, and paranamer JARs with -libjars.
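
As a rough sketch of how these two settings differ, the following shell fragment builds both lists. The JAR locations, my-job.jar, com.example.MyAvroJob, and the input and output paths are hypothetical placeholders, not names from this documentation. Note that HADOOP_CLASSPATH is colon-separated, while -libjars takes a comma-separated list. The sketch also assumes that the job driver parses generic options (for example, by running through ToolRunner), because -libjars is otherwise ignored.

```shell
# Hypothetical JAR locations; adjust these to where the JARs live on your system.
AVRO_JAR=/opt/jars/avro.jar
AVRO_MAPRED_JAR=/opt/jars/avro-mapred-hadoop2.jar
PARANAMER_JAR=/opt/jars/paranamer.jar

# HADOOP_CLASSPATH is a colon-separated list read by the client-side JVM.
export HADOOP_CLASSPATH="${AVRO_JAR}:${AVRO_MAPRED_JAR}"

# -libjars takes a comma-separated list of JARs to ship to the tasks.
LIBJARS="${AVRO_JAR},${AVRO_MAPRED_JAR},${PARANAMER_JAR}"

# Print the command instead of running it. The job JAR, driver class,
# and paths are placeholders. Generic options such as -libjars go
# before the application's own arguments.
echo hadoop jar my-job.jar com.example.MyAvroJob -libjars "${LIBJARS}" /input /output
```

Remove the leading echo to submit the job once the placeholders are replaced with real values.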

To enable Snappy compression on output files, call AvroJob.setOutputCodec(job, "snappy") when configuring the job. You must also include the snappy-java JAR in -libjars.

Page generated September 3, 2015.