Configure, Start, and Validate Apache Flume
Before you upgrade Apache Flume, you must first upgrade your HDP components to the latest version (in this case, 2.4.0). If you have not yet done so, return to Getting Ready to Upgrade and Upgrade 2.2 Components for instructions on how to upgrade your HDP components to 2.4.0.
To confirm that Flume is working correctly, create an example configuration file. The following snippet is a sample configuration that can be set using the properties file. For more detailed information, see the “Flume User Guide.”
agent.sources = pstream
agent.channels = memoryChannel
agent.channels.memoryChannel.type = memory
agent.sources.pstream.channels = memoryChannel
agent.sources.pstream.type = exec
agent.sources.pstream.command = tail -f /etc/passwd
agent.sinks = hdfsSink
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memoryChannel
agent.sinks.hdfsSink.hdfs.path = hdfs://tmp/flumetest
agent.sinks.hdfsSink.hdfs.fileType = SequenceFile
agent.sinks.hdfsSink.hdfs.writeFormat = Text
The source here is defined as an exec source. On startup, the agent runs the given command (here, tail -f /etc/passwd), and the source reads whatever that command writes to stdout. The channel is defined as an in-memory channel and the sink is an HDFS sink.
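To see how the source and sink are bound to the channel, it can help to print the channel bindings from the properties file. The following is a minimal sketch in POSIX shell; the helper name flume_wiring is hypothetical and not part of Flume, and it assumes one property per line with the key starting at the beginning of the line.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flume): print the channel each source
# and sink is bound to in a Flume properties file.
# Usage: flume_wiring <properties-file> <agent-name>
flume_wiring() {
    conf=$1
    agent=$2
    # Sources use the plural key "channels"; sinks use the singular "channel".
    grep -E "^${agent}\.(sources|sinks)\.[^.]+\.channels?[[:space:]]*=" "$conf" |
        sed -E "s/^${agent}\.(sources|sinks)\.([^.]+)\.channels?[[:space:]]*=[[:space:]]*(.*)$/\1 \2 -> \3/"
}
```

For the sample configuration, saved as example.conf, running flume_wiring example.conf agent would report that both pstream and hdfsSink are bound to memoryChannel.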
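Given this configuration, saved as example.conf, you can start Flume as follows. The value passed to --name must match the property prefix used in the configuration file (agent in this example):
$ bin/flume-ng agent --conf ./conf --conf-file example.conf --name agent -Dflume.root.logger=INFO,console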
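Flume loads only the properties whose prefix matches the name given to --name, so it can be worth checking which agent names a properties file defines before starting the agent. The following is a minimal sketch in POSIX shell; the helper names flume_agent_names and flume_has_agent are hypothetical and not part of Flume.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct agent names in a Flume properties
# file, taken as the first dot-separated segment of each property key.
flume_agent_names() {
    # Skip blank lines and comment lines, then keep the text before the first dot.
    grep -Ev '^[[:space:]]*(#|$)' "$1" | cut -d. -f1 | sort -u
}

# Hypothetical helper: succeed only if the file defines the given agent name.
# Usage: flume_has_agent <properties-file> <agent-name>
flume_has_agent() {
    flume_agent_names "$1" | grep -qx -- "$2"
}
```

For example, flume_has_agent example.conf agent succeeds for the sample configuration, while flume_has_agent example.conf a1 fails because no property in that file starts with a1.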
Note
The directory specified for the --conf argument would include a shell script, flume-env.sh, and potentially a log4j properties file. In this example, we pass a Java option to force Flume to log to the console and we go without a custom environment script.
After validating data in hdfs://tmp/flumetest, stop Flume and restore any backup files. Copy /etc/flume/conf to the conf directory in Flume hosts.