This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

Flume

The HDFSEventSink, which serializes event data onto HDFS, supports plugin implementations of the EventSerializer interface. An implementation of this interface has full control over the serialization format, so it can be used when the default serialization format provided by the sink does not suffice.
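As a rough illustration, a custom serializer might be attached to an HDFS sink with properties along these lines. This is a sketch, not a complete agent definition. The agent name agent1, the sink name hdfs1, and the class com.example.flume.MyEventSerializer are hypothetical. The serializer property is assumed to take the fully qualified name of a class that implements EventSerializer.Builder. hdfs.fileType is set to DataStream because the serializer setting applies to the stream-based file types (DataStream, CompressedStream), not to the default SequenceFile type.

```properties
# Hypothetical agent "agent1" with an HDFS sink "hdfs1"
agent1.sinks.hdfs1.type = hdfs
# Stream-based file type, so the configured serializer writes the file contents
agent1.sinks.hdfs1.hdfs.fileType = DataStream
# Fully qualified name of a hypothetical EventSerializer.Builder implementation;
# "$" separates a nested Builder class from its enclosing serializer class
agent1.sinks.hdfs1.serializer = com.example.flume.MyEventSerializer$Builder
```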

Flume provides an abstract implementation of the EventSerializer interface called AbstractAvroEventSerializer. You can extend this class to support a custom schema for Avro serialization to HDFS. Flume also includes a simple concrete implementation, FlumeEventAvroEventSerializer, which maps each event to an Avro representation consisting of a String header map and a byte payload. To use it, set the serializer property of the sink as follows:

<agent-name>.sinks.<sink-name>.serializer = AVRO_EVENT
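A fuller sketch of an HDFS sink that writes Avro container files with AVRO_EVENT might look like the fragment below. The agent, channel, and sink names and the HDFS path are placeholders, and the source and channel definitions are omitted. The fragment assumes the property names documented in the Apache Flume User Guide. hdfs.fileType = DataStream writes the serializer's output unmodified. serializer.compressionCodec is an optional Avro codec setting (default null) that is passed through to the serializer.

```properties
# Placeholder names: agent "agent1", channel "ch1", sink "hdfs1"
agent1.sinks = hdfs1
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.channel = ch1
agent1.sinks.hdfs1.hdfs.path = /flume/events
# Write the serializer's output as-is instead of wrapping it in a SequenceFile
agent1.sinks.hdfs1.hdfs.fileType = DataStream
# Each event becomes an Avro record with a String header map and a bytes body
agent1.sinks.hdfs1.serializer = AVRO_EVENT
# Optional: compress Avro data blocks with the deflate codec
agent1.sinks.hdfs1.serializer.compressionCodec = deflate
```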

Page generated September 3, 2015.