This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

Streaming

To read Avro data files from a streaming program, specify org.apache.avro.mapred.AvroAsTextInputFormat as the input format. This input format converts each datum in the Avro data file to a string. For a "bytes" schema, the string is the raw bytes; in the general case, it is a single-line JSON representation of the datum.
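As an illustration of the general (non-"bytes") case, a streaming mapper might parse each input line as JSON. The following is a minimal sketch, not a reference implementation: the field names "name" and "count" are hypothetical and stand in for whatever the Avro schema actually defines. It also assumes that each line may carry trailing whitespace, which is stripped before parsing.

```python
import json
import sys


def map_records(lines):
    """Yield (key, value) pairs from single-line JSON records."""
    for line in lines:
        # Drop the newline and any trailing whitespace left by the framework.
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # "name" and "count" are hypothetical fields used for illustration.
        yield record["name"], record["count"]


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Emit tab-delimited key-value pairs, one per line.
    for key, value in map_records(stdin):
        stdout.write("{}\t{}\n".format(key, value))


if __name__ == "__main__":
    main()
```

A script like this would be supplied as the mapper of the streaming job; the parsing logic lives in map_records so that it can be exercised without a cluster.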

To write Avro data files from a streaming program, specify org.apache.avro.mapred.AvroTextOutputFormat as the output format. This output format creates Avro data files with a "bytes" schema, where each datum is a tab-delimited key-value pair.
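To make the tab-delimited key-value output concrete, here is a minimal sketch of a streaming reducer that sums integer values per key. It assumes the reducer input is already sorted by key, as is usual for streaming reducers, and that every value is an integer; both the summing logic and the function names are illustrative only.

```python
import sys
from itertools import groupby


def parse(line):
    """Split a 'key<TAB>value' line into (key, int(value))."""
    key, _, value = line.rstrip("\n").partition("\t")
    return key, int(value)


def reduce_counts(lines):
    """Yield (key, total) for each run of consecutive identical keys."""
    pairs = (parse(line) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Each line written here is one tab-delimited key-value pair.
    for key, total in reduce_counts(stdin):
        stdout.write("{}\t{}\n".format(key, total))


if __name__ == "__main__":
    main()
```

With AvroTextOutputFormat as the output format, each key-value pair written this way would be stored as one "bytes" datum in the resulting Avro data file.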

At runtime, pass the avro, avro-mapred, and paranamer JARs to the streaming command with the -libjars option.

To enable Snappy compression on output files, set the property avro.output.codec to snappy. You must also include the snappy-java JAR in -libjars.
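The options described above can be assembled into a single streaming command. The sketch below builds the argument list in Python rather than running anything. All file names and paths in the example call (the streaming JAR, the library JARs, the input and output directories, and the mapper and reducer scripts) are hypothetical placeholders and must be replaced with the locations used in your installation. Generic options such as -D and -libjars are placed before the streaming-specific options.

```python
INPUT_FORMAT = "org.apache.avro.mapred.AvroAsTextInputFormat"
OUTPUT_FORMAT = "org.apache.avro.mapred.AvroTextOutputFormat"


def build_streaming_command(streaming_jar, lib_jars, input_path, output_path,
                            mapper, reducer, snappy=False):
    """Return the argument list for a streaming job that reads and writes Avro."""
    cmd = ["hadoop", "jar", streaming_jar]
    # Generic options (-D, -libjars) come before streaming options.
    if snappy:
        cmd += ["-D", "avro.output.codec=snappy"]
    cmd += ["-libjars", ",".join(lib_jars)]
    cmd += [
        "-inputformat", INPUT_FORMAT,
        "-outputformat", OUTPUT_FORMAT,
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]
    return cmd


if __name__ == "__main__":
    # Every path below is a hypothetical placeholder.
    command = build_streaming_command(
        streaming_jar="hadoop-streaming.jar",
        lib_jars=["avro.jar", "avro-mapred.jar", "paranamer.jar",
                  "snappy-java.jar"],
        input_path="/user/example/input",
        output_path="/user/example/output",
        mapper="mapper.py",
        reducer="reducer.py",
        snappy=True,
    )
    print(" ".join(command))
```

Keeping the command as a list makes it straightforward to pass to subprocess.run without shell quoting concerns. The snappy-java JAR appears in lib_jars only because Snappy output is requested in this example.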

Page generated September 3, 2015.