Supported Sources, Sinks, and Channels

The following tables list the only currently-supported sources, sinks, and channels. For more information, including information on developing custom components, see the documents listed under Viewing the Flume Documentation.

Sources

Type

Description

Implementation Class

avro

Avro Netty RPC event source. Listens on Avro port and receives events from external Avro streams.

AvroSource

netcat

Netcat style TCP event source. Listens on a given port and turns each line of text into an event.

NetcatSource

seq

Monotonically incrementing sequence generator event source

SequenceGeneratorSource

exec

Execute a long-lived Unix process and read from stdout.

ExecSource

syslogtcp

Reads syslog data and generates Flume events. Creates a new event for each string of characters terminated by a newline ( \n ).

SyslogTcpSource

syslogudp

Reads syslog data and generates Flume events. Treats an entire message as a single event.

SyslogUDPSource

org.apache.flume.source.avroLegacy.AvroLegacySource

Allows the Flume 1.x agent to receive events from Flume 0.9.4 agents over Avro RPC.

AvroLegacySource

org.apache.flume.source.thriftLegacy.ThriftLegacySource

Allows the Flume 1.x agent to receive events from Flume 0.9.4 agents over Thrift RPC.

ThriftLegacySource

org.apache.flume.source.StressSource

Mainly for testing purposes. Not meant for production use. Serves as a continuous source of events where each event has the same payload.

StressSource

org.apache.flume.source.scribe.ScribeSource

Scribe event source. Listens on Scribe port and receives events from Scribe.

ScribeSource

multiport_syslogtcp

Multi-port capable version of the SyslogTcpSource.

MultiportSyslogTCPSource

spooldir

Used for ingesting data by placing files to be ingested into a "spooling" directory on disk.

SpoolDirectorySource

http

Accepts Flume events by HTTP POST and GET. GET should be used for experimentation only.

HTTPSource

org.apache.flume.source.jms.JMSSource

Reads messages from a JMS destination such as a queue or topic.

JMSSource

org.apache.flume.agent.embedded.EmbeddedSource

Used only by the Flume embedded agent. See Flume Developer Guide for more details.

EmbeddedSource

Other (custom)

You need to specify the fully-qualified name of the custom source, and provide that class (and its dependent code) in Flume's classpath. You can do this by creating a JAR file to hold the custom code, and placing the JAR in Flume's lib directory.
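
As a minimal sketch of how the Type column is used, the agent configuration below selects one source by its alias and another by its fully-qualified class name. The agent name agent1, the component names, the bind address and port, and the class com.example.flume.MyCustomSource are hypothetical placeholders. The custom class stands in for code that you would package as a JAR and place in Flume's lib directory.

```properties
# Hypothetical agent with two sources feeding one channel.
agent1.sources = netcatSrc customSrc
agent1.channels = memCh
agent1.sinks = logSink

# Built-in source selected by its alias from the Type column.
agent1.sources.netcatSrc.type = netcat
agent1.sources.netcatSrc.bind = localhost
agent1.sources.netcatSrc.port = 44444
agent1.sources.netcatSrc.channels = memCh

# Custom source selected by fully-qualified class name (placeholder class).
agent1.sources.customSrc.type = com.example.flume.MyCustomSource
agent1.sources.customSrc.channels = memCh

agent1.channels.memCh.type = memory

agent1.sinks.logSink.type = logger
agent1.sinks.logSink.channel = memCh
```

Note that a source lists its channels under the plural key channels, while a sink names its single channel under channel.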

Sinks

Type

Description

Implementation Class

logger

Log events at INFO level using configured logging subsystem (log4j by default)

LoggerSink

avro

Sink that invokes a pre-defined Avro protocol method for all events it receives (when paired with an avro source, forms tiered collection)

AvroSink

hdfs

Writes all events received to HDFS (with support for rolling, bucketing, HDFS-200 append, and more)

HDFSEventSink

file_roll

Writes all events received to one or more files.

RollingFileSink

org.apache.flume.sink.hbase.HBaseSink

A simple sink that reads events from a channel and writes them synchronously to HBase. The AsyncHBaseSink is recommended.

HBaseSink

org.apache.flume.sink.hbase.AsyncHBaseSink

A simple sink that reads events from a channel and writes them asynchronously to HBase. This is the recommended HBase sink, but note that it does not support Kerberos.

AsyncHBaseSink

org.apache.flume.sink.solr.morphline.MorphlineSolrSink

Extracts and transforms data from Flume events, and loads it into Apache Solr servers. See the section on MorphlineSolrSink in the Flume User Guide listed under Viewing the Flume Documentation.

MorphlineSolrSink

Other (custom)

You need to specify the fully-qualified name of the custom sink, and provide that class (and its dependent code) in Flume's classpath. You can do this by creating a JAR file to hold the custom code, and placing the JAR in Flume's lib directory.
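
The tiered collection mentioned for the avro sink can be sketched as a pair of configuration fragments: an avro sink on one agent pointing at an avro source on another. The agent names edge and collector, the component and channel names, the host name, and the port are all hypothetical. The component lists and channel definitions each agent also needs are omitted here.

```properties
# First tier (hypothetical agent "edge"): forwards events with an avro sink.
edge.sinks.toCollector.type = avro
edge.sinks.toCollector.hostname = collector.example.com
edge.sinks.toCollector.port = 4545
edge.sinks.toCollector.channel = ch1

# Second tier (hypothetical agent "collector"): receives them with an avro source.
collector.sources.fromEdge.type = avro
collector.sources.fromEdge.bind = 0.0.0.0
collector.sources.fromEdge.port = 4545
collector.sources.fromEdge.channels = ch1
```

The sink's hostname and port must match the address and port that the receiving source is bound to.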

Channels

Type

Description

Implementation Class

memory

In-memory, fast, non-durable event transport

MemoryChannel

jdbc

JDBC-based, durable event transport (Derby-based)

JDBCChannel

file

File-based, durable event transport

FileChannel

Other (custom)

You need to specify the fully-qualified name of the custom channel, and provide that class (and its dependent code) in Flume's classpath. You can do this by creating a JAR file to hold the custom code, and placing the JAR in Flume's lib directory.

Providing for Disk Space Usage

It's important to provide plenty of disk space for any Flume File Channel. The largest consumers of disk space in the File Channel are the data logs. You can configure the File Channel to write these logs to multiple data directories. The following space will be consumed by default in each data directory:

  • Current log file (up to 2 GB)
  • Last log file (up to 2 GB)
  • Pending delete log file (up to 2 GB)

Events in the queue could cause many more log files to be written, each of them up to 2 GB in size by default.

You can configure both the maximum log file size (the maxFileSize property) and the directories the logs will be written to (the dataDirs property) when you configure the File Channel; see the File Channel section of the Flume User Guide for details.
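
A File Channel definition that sets both options might look like the following sketch. The property names dataDirs, maxFileSize, and checkpointDir are spelled as in the Flume User Guide. The agent name, channel name, directory paths, and the 1 GiB size are illustrative assumptions, not recommendations.

```properties
# Hypothetical agent and channel names; directory paths are placeholders.
agent1.channels = fileCh
agent1.channels.fileCh.type = file
agent1.channels.fileCh.checkpointDir = /data1/flume/checkpoint
# Two data directories, ideally on separate disks.
agent1.channels.fileCh.dataDirs = /data1/flume/data,/data2/flume/data
# Cap each log file at 1 GiB (the value is given in bytes).
agent1.channels.fileCh.maxFileSize = 1073741824
```

Following the list above, each data directory holds at least three log files, so with the default limit two directories could consume up to 2 x 3 x 2 GB = 12 GB before any backlog of queued events is counted. With the 1 GiB cap in this sketch, the same baseline would be roughly 6 GiB.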