Installing Flume
Flume is included in the HDP repository, but it is not installed automatically as part of the standard HDP installation process. Hortonworks recommends that administrators not install Flume agents on any node in a Hadoop cluster. The following image depicts a sample topology with six Flume agents:
Agents 1, 2, and 4 installed on web servers in Data Centers 1 and 2.
Agents 3 and 5 installed on separate hosts in Data Centers 1 and 2 to collect and forward server data in Avro format.
Agent 6 installed on a separate host on the same network as the Hadoop cluster in Data Center 3 to write all Avro-formatted data to HDFS.
Note: It is possible to run multiple Flume agents on the same host. The sample topology represents only one potential data flow.
Note: Hortonworks recommends that administrators use a separate configuration file for each Flume agent. In the diagram above, agents 1, 2, and 4 may have identical configuration files with matching Flume sources, channels, and sinks. The same is true of agents 3 and 5. While it is possible to use one large configuration file that specifies all the Flume components needed by all the agents, this is not typical of most production deployments. See Configuring Flume for more information about configuring Flume agents.
For additional information regarding Flume, see the Apache Flume Component Guide.
Prerequisites
You must have at least core Hadoop on your system. See Configuring the Remote Repositories for more information.
Verify the HDP repositories are available:
yum list flume
The output should list at least one Flume package similar to the following:
flume.noarch 1.5.2.2.2.6.0-2800.el6 HDP-2.5
If yum responds with "Error: No matching package to list" as shown below, yum cannot locate a matching RPM. This can happen if the repository hosting the HDP RPMs is unavailable, or has been disabled. Follow the instructions at Configuring the Remote Repositories to configure either a public or private repository before proceeding.
Error: No matching package to list.
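The repository check above can also be scripted. The following is a minimal sketch, not part of the HDP tooling: the function name flume_package_listed is hypothetical, and it assumes the package line starts with "flume." followed by the architecture, as in the sample output above.

```shell
# Hypothetical helper: succeeds if the text on stdin contains a package
# line for "flume" (for example "flume.noarch <version> <repo>").
# It only inspects text, so it can be fed the output of the listing command:
#   yum list flume 2>&1 | flume_package_listed
flume_package_listed() {
  grep -Eq '^flume\.[[:alnum:]_]+[[:space:]]'
}
```

A non-zero exit status means no matching package line was found, which corresponds to the "No matching package to list" case described above.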
You must have set up your JAVA_HOME environment variable per your operating system. See JDK Requirements for instructions on installing JDK.
export JAVA_HOME=/path/to/java
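Before installing, it can help to confirm that JAVA_HOME points at a usable JDK. The sketch below is illustrative only: check_java_home is a hypothetical function name, and it assumes the JDK places its launcher at bin/java under JAVA_HOME.

```shell
# Hypothetical helper: verify that JAVA_HOME is set and that it contains
# an executable bin/java. Prints a short diagnostic either way.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "No executable found at $JAVA_HOME/bin/java" >&2
    return 1
  fi
  echo "JAVA_HOME looks usable: $JAVA_HOME"
}
```

Run check_java_home in the same shell session that will be used for the installation, after exporting JAVA_HOME.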
The following Flume components have HDP component dependencies. You cannot use these Flume components if the dependencies are not installed.
Table 17.1. Flume 1.5.2 Dependencies
Flume Component | HDP Component Dependencies
---|---
HDFS Sink | Hadoop 2.7.3 with HDP patches
HBase Sink | HBase 1.1.2 with HDP patches
Hive Sink | Hive 1.2.1000, HCatalog 1.2.1000, and Hadoop 2.7.3 (with HDP patches)
Installation
Verify the HDP repositories are available for your Flume installation by entering yum list flume. See Prerequisites for more information.
To install Flume from a terminal window, type:
For RHEL or CentOS:
yum install flume
yum install flume-agent #This installs init scripts
For SLES:
zypper install flume
zypper install flume-agent #This installs init scripts
For Ubuntu and Debian:
apt-get install flume
apt-get install flume-agent #This installs init scripts
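When the same procedure has to run on hosts with different operating systems, the per-platform commands above can be selected by a small wrapper. This is a sketch under assumptions: both function names are hypothetical, and the package manager is detected simply by checking which of yum, zypper, or apt-get is on the PATH.

```shell
# Hypothetical helper: print the two Flume install commands for the
# given package manager (yum, zypper, or apt-get).
flume_install_commands() {
  case "$1" in
    yum|zypper|apt-get)
      printf '%s install flume\n' "$1"
      printf '%s install flume-agent\n' "$1"   # installs init scripts
      ;;
    *)
      printf 'unsupported package manager: %s\n' "$1" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: print the first supported package manager found
# on this host, or fail if none is available.
detect_package_manager() {
  for pm in yum zypper apt-get; do
    if command -v "$pm" >/dev/null 2>&1; then
      printf '%s\n' "$pm"
      return 0
    fi
  done
  return 1
}
```

For example, flume_install_commands "$(detect_package_manager)" prints the commands for the current host so they can be reviewed before being run with the appropriate privileges.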
The main Flume files are located in /usr/hdp/current/flume-server. The main configuration files are located in /etc/flume/conf.
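A quick way to confirm the installation is to check that both locations named above exist. The following is a minimal sketch: check_flume_layout is a hypothetical function, and its optional root-prefix argument is an assumption added only so the check can be tried against a scratch directory.

```shell
# Hypothetical helper: report whether the expected Flume directories
# exist. An optional first argument is prepended to each path.
check_flume_layout() {
  root="${1:-}"
  status=0
  for dir in /usr/hdp/current/flume-server /etc/flume/conf; do
    if [ -d "$root$dir" ]; then
      echo "found:   $dir"
    else
      echo "missing: $dir"
      status=1
    fi
  done
  return "$status"
}
```

Calling check_flume_layout with no arguments checks the real paths on the host and returns a non-zero status if either directory is missing.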