Supported Sources, Sinks, and Channels
The following tables list the only currently supported sources, sinks, and channels. For more information, including information on developing custom components, see the documents listed under Viewing the Flume Documentation.
Sources
Type | Description | Implementation Class |
---|---|---|
avro | Avro Netty RPC event source. Listens on Avro port and receives events from external Avro streams. | AvroSource |
netcat | Netcat-style TCP event source. Listens on a given port and turns each line of text into an event. | NetcatSource |
seq | Monotonically incrementing sequence generator event source. | SequenceGeneratorSource |
exec | Executes a long-lived Unix process and reads from stdout. | ExecSource |
syslogtcp | Reads syslog data and generates Flume events. Creates a new event for each string of characters terminated by a newline ( \n ). | SyslogTcpSource |
syslogudp | Reads syslog data and generates Flume events. Treats an entire message as a single event. | SyslogUDPSource |
org.apache.flume.source.avroLegacy.AvroLegacySource | Allows the Flume 1.x agent to receive events from Flume 0.9.4 agents over Avro RPC. | AvroLegacySource |
org.apache.flume.source.thriftLegacy.ThriftLegacySource | Allows the Flume 1.x agent to receive events from Flume 0.9.4 agents over Thrift RPC. | ThriftLegacySource |
org.apache.flume.source.StressSource | Mainly for testing purposes. Not meant for production use. Serves as a continuous source of events where each event has the same payload. | StressSource |
org.apache.flume.source.scribe.ScribeSource | Scribe event source. Listens on Scribe port and receives events from Scribe. | ScribeSource |
multiport_syslogtcp | Multi-port capable version of the SyslogTcpSource. | MultiportSyslogTCPSource |
spooldir | Ingests data from files placed in a "spooling" directory on disk. | SpoolDirectorySource |
http | Accepts Flume events by HTTP POST and GET. GET should be used for experimentation only. | HTTPSource |
org.apache.flume.source.jms.JMSSource | Reads messages from a JMS destination such as a queue or topic. | JMSSource |
org.apache.flume.agent.embedded.EmbeddedSource | Used only by the Flume embedded agent. See the Flume Developer Guide for more details. | EmbeddedSource |
org.apache.flume.source.kafka.KafkaSource | Streams data from Kafka to Hadoop or from any Flume source to Kafka. | KafkaSource |
org.apache.flume.source.taildir.TaildirSource | Watches specified files, and tails them in near real-time when it detects appends to these files. | TaildirSource |
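The value in the Type column is what goes into a source's type property in the agent configuration file: either a short alias such as exec, or the fully qualified class name for sources that are listed that way. The fragment below is a minimal sketch only. The agent name a1, the component names r1, r2, and c1, and the log file path are placeholders chosen for illustration, and the channel c1 is assumed to be defined elsewhere in the same file.

```properties
# Hypothetical agent "a1" with two sources; c1 is assumed to be defined elsewhere.
a1.sources = r1 r2

# Source selected by its short alias from the Type column.
a1.sources.r1.type = exec
# The exec source runs this command and reads events from its stdout.
# The path is a placeholder.
a1.sources.r1.command = tail -F /var/log/example.log
a1.sources.r1.channels = c1

# Source selected by fully qualified class name (testing only, per the table).
a1.sources.r2.type = org.apache.flume.source.StressSource
# Payload size in bytes and total number of events to generate (illustrative values).
a1.sources.r2.size = 1024
a1.sources.r2.maxTotalEvents = 10000
a1.sources.r2.channels = c1
```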
Sinks
Type | Description | Implementation Class |
---|---|---|
logger | Logs events at INFO level using the configured logging subsystem (log4j by default). | LoggerSink |
avro | Sink that invokes a pre-defined Avro protocol method for all events it receives (when paired with an avro source, forms tiered collection). | AvroSink |
hdfs | Writes all events received to HDFS (with support for rolling, bucketing, HDFS-200 append, and more). | HDFSEventSink |
file_roll | Writes all events received to one or more files. | RollingFileSink |
org.apache.flume.sink.hbase.HBaseSink | A simple sink that reads events from a channel and writes them synchronously to HBase. The AsyncHBaseSink is recommended. See Importing Data Into HBase. | HBaseSink |
org.apache.flume.sink.hbase.AsyncHBaseSink | A simple sink that reads events from a channel and writes them asynchronously to HBase. This is the recommended HBase sink, but note that it does not support Kerberos. See Importing Data Into HBase. | AsyncHBaseSink |
org.apache.flume.sink.solr.morphline.MorphlineSolrSink | Extracts and transforms data from Flume events, and loads it into Apache Solr servers. See the section on MorphlineSolrSink in the Flume User Guide listed under Viewing the Flume Documentation. | MorphlineSolrSink |
org.apache.flume.sink.kafka.KafkaSink | Used to send data to Kafka from a Flume source. You can use the Kafka sink in addition to Flume sinks such as HBase or HDFS. | KafkaSink |
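As an illustration of the rolling and bucketing support mentioned for the hdfs sink, the following fragment sketches one possible configuration. It is not a recommended setup: the agent name a1, the sink name k1, the channel name c1, the HDFS path, and all numeric values are assumptions made for this example. Consult the HDFS sink section of the Flume User Guide for the full property list and defaults.

```properties
# Hypothetical agent "a1"; channel c1 is assumed to be defined elsewhere.
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1

# Bucketing: escape sequences in the path place events in one directory per day and hour.
# The path itself is a placeholder.
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d/%H
# Use the agent's local time for the escape sequences instead of a
# "timestamp" header on each event.
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Rolling: close the current file and open a new one every 300 seconds.
# A value of 0 disables rolling by size (bytes) or by event count.
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
```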
Channels
Type | Description | Implementation Class |
---|---|---|
memory | In-memory, fast, non-durable event transport | MemoryChannel |
jdbc | JDBC-based, durable event transport (Derby-based) | JDBCChannel |
file | File-based, durable event transport | FileChannel |
org.apache.flume.channel.kafka.KafkaChannel | Use the Kafka channel: | KafkaChannel |
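To show how one entry from each table fits together, here is a minimal sketch of a complete single-node agent that wires a netcat source through a memory channel to a logger sink. The agent name a1, the component names r1, c1, and k1, the bind address, the port, and the capacity values are all placeholders for illustration.

```properties
# Hypothetical agent "a1": declare one source, one channel, and one sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: turns each line of text received on the port into an event.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# A source can write to several channels, hence the plural property name.
a1.sources.r1.channels = c1

# Channel: fast but non-durable; events are lost if the agent stops.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: logs each event at INFO level. A sink reads from exactly one channel.
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

Assuming the fragment is saved as a hypothetical file named example.conf, an agent like this can typically be started with `flume-ng agent --conf conf --conf-file example.conf --name a1`.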
Providing for Disk Space Usage
It's important to provide plenty of disk space for any Flume File Channel. The largest consumers of disk space in the File Channel are the data logs. You can configure the File Channel to write these logs to multiple data directories. The following space will be consumed by default in each data directory:
- Current log file (up to 2 GB)
- Last log file (up to 2 GB)
- Pending delete log file (up to 2 GB)
Events in the queue could cause many more log files to be written, each of them up to 2 GB in size by default.
You can configure both the maximum log file size (maxFileSize) and the directories the logs will be written to (dataDirs) when you configure the File Channel; see the File Channel section of the Flume User Guide for details.
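The following fragment is a rough sketch of how these two settings affect the baseline disk footprint. It uses the property names as they appear in the Flume User Guide (dataDirs and maxFileSize). The agent name a1, the channel name c1, and the directory paths are assumptions made for this example, and the arithmetic in the comments covers only the three log files listed above, not any additional logs caused by queued events.

```properties
# Hypothetical agent "a1" with a File Channel named c1.
a1.channels = c1
a1.channels.c1.type = file

# Checkpoint directory and two data directories (placeholder paths),
# ideally on separate disks.
a1.channels.c1.checkpointDir = /data/1/flume/checkpoint
a1.channels.c1.dataDirs = /data/1/flume/data,/data/2/flume/data

# Cap each log file at 1 GiB (value is in bytes) instead of the roughly 2 GB default.
a1.channels.c1.maxFileSize = 1073741824

# Baseline estimate per data directory:
#   current + last + pending-delete = 3 files x 1 GiB = up to 3 GiB
# With two data directories: 2 x 3 GiB = up to 6 GiB.
# With the default file size the same estimate is 2 x 3 x 2 GB = up to 12 GB.
```

Lowering maxFileSize reduces the baseline footprint but does not bound total usage, because a backlog of events in the queue can still cause additional log files to be written.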