Apache Flume Component Guide

Chapter 1. Feature Changes

Apache Flume version 1.5.2 includes cumulative 1.6 features and selected 1.7 features. The table below indicates the features added to Flume 1.5.2 with each release of Hortonworks Data Platform (HDP).

Content related to the Flume 1.7 features has been added to the Flume 1.5.2 User Guide and Developer Guide.

Note

As of HDP 2.6.0, Flume is deprecated but still supported. It will be removed and no longer supported as of HDP 3.0.0.

Table 1.1. Features by HDP Version

| HDP Release | Added Features | Advantages |
|---|---|---|
| 2.6.2 | Flume-Kafka Source with new Consumer (FLUME-2821) | Provides support for the new Kafka Consumer API, including reading multiple Kafka topics. |
| 2.6.2 | Flume-Kafka-Sink with new Producer (FLUME-2822) | Provides support for the new Kafka Producer API. |
| 2.6.2 | Kafka Source/Sink should optionally read/write Flume records (FLUME-2852) | Preserves event headers when Kafka Sink and Kafka Source are used together. Avro datums can be read from and written to Kafka. |
| 2.6.2 | Make raw data appearing in log messages explicit (FLUME-2954) | Adds two system properties: one to enable logging of Flume configuration properties and one to enable logging of raw data. Lets you control whether potentially sensitive information is logged. |
| 2.6.2 | Handle offset migration in the new Kafka Channel (FLUME-2972) | If Kafka offsets do not exist and migration is enabled, the offsets are copied from ZooKeeper to Kafka. Also addresses the backward incompatibility issue with the zookeeperConnect property. |
| 2.6.2 | Handle offset migration in the new Kafka Source (FLUME-2983) | If Kafka offsets do not exist and migration is enabled, the offsets are copied from ZooKeeper to Kafka. Also addresses the backward incompatibility issue with the zookeeperConnect property. |
| 2.6.2 | Bug fixes | FLUME-2915, FLUME-2920, FLUME-2963 |
| 2.5.0 | Kafka Channel | Uses a single Kafka topic. Provides greater reliability and better performance. |
| 2.5.0 | TailDir Source | Provides greater data reliability, even when file names rotate. Can resume tailing at the point where Flume stopped, so data ingest continues without loss. |
| 2.4.0 | Kafka Source | Reads messages from a Kafka topic. Multiple Kafka sources can run at the same time, each configured to read a unique set of partitions for the topic. |
| 2.4.0 | Kafka Sink | Publishes data to a Kafka topic. Supports pull-based processing from various Flume sources. |
| 2.3.0 | Hive Sink | Streams events containing delimited text or JSON data directly into a Hive table or partition. Provided as a preview feature; not recommended for use in production. |
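
To make the HDP 2.6.2 Kafka features more concrete, the following is a minimal sketch of an agent configuration that uses the new-consumer Kafka Source and new-producer Kafka Sink. The property names are taken from the upstream Apache Flume 1.7 User Guide and are assumed to carry over to the HDP build of Flume 1.5.2; verify them against the HDP User Guide before use. The agent name (a1), component names (r1, c1, k1), host names, topic names, and group ID are hypothetical.

```properties
# Hypothetical agent "a1": Kafka Source -> memory channel -> Kafka Sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Kafka Source using the new Consumer API (FLUME-2821).
# kafka.topics accepts a comma-separated list, so one source can read several topics.
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.channels = c1
a1.sources.r1.kafka.bootstrap.servers = broker1.example.com:9092
a1.sources.r1.kafka.topics = topic1,topic2
a1.sources.r1.kafka.consumer.group.id = flume-example

# Read Kafka messages as Avro-serialized Flume events so headers are preserved (FLUME-2852).
a1.sources.r1.useFlumeEventFormat = true

# Offset migration (FLUME-2983): if no offsets are stored in Kafka,
# copy them from ZooKeeper. zookeeperConnect is only needed for this lookup.
a1.sources.r1.migrateZookeeperOffsets = true
a1.sources.r1.zookeeperConnect = zk1.example.com:2181

# Simple in-memory channel, chosen only to keep the sketch short.
a1.channels.c1.type = memory

# Kafka Sink using the new Producer API (FLUME-2822).
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.channel = c1
a1.sinks.k1.kafka.bootstrap.servers = broker1.example.com:9092
a1.sinks.k1.kafka.topic = output-topic

# Write full Flume events (headers and body) as Avro so a downstream Kafka Source can restore them.
a1.sinks.k1.useFlumeEventFormat = true
```

In upstream Flume 1.7, the two system properties added by FLUME-2954 are org.apache.flume.log.printconfig and org.apache.flume.log.rawdata; each is enabled by passing it as a JVM option set to true (for example, -Dorg.apache.flume.log.rawdata=true). Both are assumed here to be named the same way in the HDP build.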