Upgrading HDP Manually

Configure, Start, and Validate Apache Flume

Before you upgrade Apache Flume, you must first upgrade your HDP components to the latest version (in this case, 2.4.2). This section assumes that upgrade is complete. If it is not, return to Getting Ready to Upgrade and Upgrade 2.0 Components for instructions on how to upgrade your HDP components to 2.4.2.

  1. If you have not already done so, upgrade Flume. On the Flume host machine, run the following command:

    • For RHEL/CentOS/Oracle Linux:

      yum upgrade flume

    • For SLES:

      zypper update flume

      zypper remove flume

      zypper se -s flume

      You should see Flume in the output.

      Install Flume:

      zypper install flume

    • For Ubuntu/Debian:

      HDP support for Debian 6 is deprecated with HDP 2.4.2. Future versions of HDP will no longer be supported on Debian 6.

      apt-get install flume

  2. To confirm that Flume is working correctly, create an example configuration file. The following sample configuration can be saved as a properties file (for example, example.conf). For more detailed information, see the “Flume User Guide.”

    agent.sources = pstream
    agent.channels = memoryChannel
    agent.channels.memoryChannel.type = memory

    agent.sources.pstream.channels = memoryChannel
    agent.sources.pstream.type = exec
    agent.sources.pstream.command = tail -f /etc/passwd

    agent.sinks = hdfsSink
    agent.sinks.hdfsSink.type = hdfs
    agent.sinks.hdfsSink.channel = memoryChannel
    agent.sinks.hdfsSink.hdfs.path = hdfs://tmp/flumetest
    agent.sinks.hdfsSink.hdfs.fileType = SequenceFile
    agent.sinks.hdfsSink.hdfs.writeFormat = Text

    Every property key starts with agent, which is the name of the Flume agent being configured. The source is defined as an exec source: the agent runs the given command on startup, the command streams data to stdout, and the source reads it from there. The channel is an in-memory channel, and the sink is an HDFS sink.
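
The agent name is the text before the first dot in each property key, and the name passed to flume-ng at startup has to match it. A minimal sketch for listing the agent names that a properties file defines, assuming the sample is saved as example.conf; the function name list_agent_names is hypothetical and not part of Flume:

```shell
#!/bin/sh
# list_agent_names FILE
# Print each distinct agent name found in a Flume properties file,
# that is, the text before the first dot of every property key.
# Hypothetical helper for illustration; it is not shipped with Flume.
list_agent_names() {
    # Comment lines and blank lines do not match the pattern and are skipped.
    grep -E '^[[:space:]]*[A-Za-z0-9_-]+\.' "$1" \
        | sed -E 's/^[[:space:]]*([A-Za-z0-9_-]+)\..*/\1/' \
        | sort -u
}
```

Running list_agent_names example.conf against the sample configuration prints agent, which is the value to supply as the agent name when starting Flume.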

  3. Given this configuration, you can start Flume as follows:

    $ bin/flume-ng agent --conf ./conf --conf-file example.conf --name agent -Dflume.root.logger=INFO,console
    Note

    The directory specified for the --conf argument typically contains the shell script flume-env.sh and, optionally, a log4j properties file. This example passes a Java option to force Flume to log to the console and does not use a custom environment script.
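
For reference, a custom environment script might look like the sketch below. JAVA_HOME, JAVA_OPTS, and FLUME_CLASSPATH are variables that the flume-ng launcher reads; every value shown is a placeholder assumption, not a recommended setting.

```shell
# conf/flume-env.sh (illustrative sketch; all values are assumptions)
# Sourced by flume-ng at startup when it is present in the --conf directory.

# JDK used to run the agent (hypothetical path; adjust to your host).
export JAVA_HOME=/usr/lib/jvm/java

# JVM options for the agent process (placeholder heap sizes).
export JAVA_OPTS="-Xms100m -Xmx512m"

# Extra jars or directories to add to the agent classpath (none here).
export FLUME_CLASSPATH=""
```

Options such as -Dflume.root.logger=INFO,console can still be passed on the command line as shown in the start command, so the validation run does not require this file.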

  4. After validating that data has arrived in hdfs://tmp/flumetest, stop Flume and restore any backup files. Copy /etc/flume/conf to the conf directory on the Flume hosts.
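
The validation in the last step can be sketched as a small check on the sink directory. This assumes the hdfs client is on the PATH and that the sink keeps its default file prefix, FlumeData, which the sample configuration does not override; the function name count_flume_files is hypothetical:

```shell
#!/bin/sh
# count_flume_files DIR
# Print how many entries in the HDFS directory DIR have a name starting with
# FlumeData (the HDFS sink's default file prefix).
# Hypothetical helper for illustration; it is not part of Flume or HDFS.
count_flume_files() {
    # Listing errors are discarded, so a missing directory is reported as 0.
    # grep -c exits non-zero on zero matches; "|| true" keeps the status clean.
    hdfs dfs -ls "$1" 2>/dev/null | grep -c '/FlumeData' || true
}
```

For example, count_flume_files hdfs://tmp/flumetest should print a number greater than zero once the agent has written events. Files that are still open for writing carry a .tmp suffix and are counted as well. To inspect the contents of an individual file, hdfs dfs -text can decode the SequenceFile output.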