17. Configure, Start, and Validate Apache Flume

  1. If you have not already done so, upgrade Apache Flume. On the Flume host machine, run the following command:

    • For RHEL/CentOS/Oracle Linux:

      yum upgrade flume

    • For SLES:

      zypper update flume

      zypper remove flume

      zypper se -s flume

      You should see Flume in the output.

      Install Flume:

      zypper install flume

    • For Ubuntu/Debian:

      apt-get install flume
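
    Before moving on, it can help to confirm that the Flume launcher is reachable from the shell. The following is a minimal sketch, assuming flume-ng is expected on the PATH of the Flume host; check_flume is a hypothetical helper name and not part of Flume.

```shell
#!/bin/sh
# Sketch: confirm that the flume-ng launcher is reachable after the upgrade.
# check_flume is a hypothetical helper name, not a Flume command.
check_flume() {
    if command -v flume-ng >/dev/null 2>&1; then
        # "flume-ng version" prints the installed Flume version details.
        flume-ng version
    else
        echo "flume-ng not found on PATH"
        return 1
    fi
}

# Do not abort the shell session if Flume is missing; just report it.
check_flume || true
```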

  2. To confirm that Flume is working correctly, create an example configuration file. The following sample configuration can be saved as a properties file, for example example.conf. For more detailed information, see the “Flume User Guide.”

    agent.sources = pstream
    agent.channels = memoryChannel
    agent.channels.memoryChannel.type = memory
    agent.sources.pstream.channels = memoryChannel
    agent.sources.pstream.type = exec
    agent.sources.pstream.command = tail -f /etc/passwd
    agent.sinks = hdfsSink
    agent.sinks.hdfsSink.type = hdfs
    agent.sinks.hdfsSink.channel = memoryChannel
    agent.sinks.hdfsSink.hdfs.path = hdfs://tmp/flumetest
    agent.sinks.hdfsSink.hdfs.fileType = SequenceFile
    agent.sinks.hdfsSink.hdfs.writeFormat = Text

    The source here is an exec source: on startup the agent runs the given command, which writes data to stdout, and the source reads it from there. The channel is an in-memory channel, and the sink is an HDFS sink.
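
    Every property key in the configuration starts with the agent name (agent in the sample), and that is the name Flume must be given with --name when the agent is started. The following sketch lists the agent names a properties file defines. It assumes the file is saved as example.conf in the current directory; agent_names and CONF_FILE are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: list the agent names defined in a Flume properties file.
# agent_names is a hypothetical helper, not a Flume tool.
agent_names() {
    # Split each key on "." or "="; the first field is the agent name.
    # Lines that do not look like "name.something" (comments, blanks) are skipped.
    awk -F'[.=]' '/^[ \t]*[A-Za-z0-9_-]+\./ {
        gsub(/[ \t]/, "", $1)
        print $1
    }' "$1" | sort -u
}

# Assumed file name; override by setting CONF_FILE in the environment.
CONF_FILE=${CONF_FILE:-example.conf}
if [ -f "$CONF_FILE" ]; then
    agent_names "$CONF_FILE"
fi
```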

  3. Given this configuration, you can start Flume as follows:

    $ bin/flume-ng agent --conf ./conf --conf-file example.conf --name agent -Dflume.root.logger=INFO,console

    The directory specified for the --conf argument includes a shell script, flume-env.sh, and potentially a log4j properties file. In this example, we pass a Java option to force Flume to log to the console, and we go without a custom environment script.
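
    To see which of these optional files the --conf directory actually contains, a short check such as the following can be used. This is a sketch only: describe_conf_dir is a hypothetical helper, and the file name log4j.properties is an assumption about how the log4j properties file is named in your installation.

```shell
#!/bin/sh
# Sketch: report which optional files exist in a Flume --conf directory.
# describe_conf_dir is a hypothetical helper; log4j.properties is an
# assumed file name for the log4j properties file.
describe_conf_dir() {
    dir=$1
    for f in flume-env.sh log4j.properties; do
        if [ -f "$dir/$f" ]; then
            echo "$f: present"
        else
            echo "$f: absent"
        fi
    done
}

# ./conf is the directory passed to --conf in the start command.
describe_conf_dir ./conf
```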

  4. After validating data in hdfs://tmp/flumetest, stop Flume and restore any backup files. Copy /etc/flume/conf to the conf directory on the Flume hosts.
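
    One way to validate the data is to list the sink directory and count the files the HDFS sink has written. The following sketch assumes the hdfs client is available on the host and that the sink uses its default file prefix, FlumeData, which the sample configuration does not override; count_flume_files is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count files written by the HDFS sink in the test directory.
# count_flume_files is a hypothetical helper. It reads an "hdfs dfs -ls"
# listing on stdin and counts entries whose file name starts with
# FlumeData (assumed default prefix of the HDFS sink).
count_flume_files() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the exit status clean in that case.
    grep -c '/FlumeData' || true
}

if command -v hdfs >/dev/null 2>&1; then
    # Path copied from the sample configuration.
    hdfs dfs -ls hdfs://tmp/flumetest | count_flume_files
else
    echo "hdfs client not found on PATH"
fi
```

    A count greater than zero suggests that the agent delivered events to the sink directory.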
