Configuring Apache Flume
The Flume packages are installed by the Installation wizard, but the service is not created. This page documents how to add, configure, and start the Flume service.
Adding a Flume Service
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
- On the tab, click to the right of the cluster name and select Add a Service. A list of service types is displayed. You can add one type of service at a time.
- Select the Flume service and click Continue.
- Select the services on which the new service should depend. All services must depend on the same ZooKeeper service. Click Continue.
- Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The wizard assigns all
worker roles to the same set of hosts to which the HDFS DataNode role is assigned. You can reassign role instances.
Click a field below a role to display a dialog box containing a list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts, or Custom to display the hosts dialog box.
The following shortcuts for specifying hostname patterns are supported:
- Range of hostnames (without the domain portion)
  Range Definition | Matching Hosts
  10.1.1.[1-4] | 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
  host[1-3].company.com | host1.company.com, host2.company.com, host3.company.com
  host[07-10].company.com | host07.company.com, host08.company.com, host09.company.com, host10.company.com
- IP addresses
- Rack name
Click the View By Host button for an overview of the role assignment by hostname ranges.
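As an illustration of the range shortcut described above, the following Python sketch expands a single bracketed numeric range into the hostnames it would match. It is not Cloudera Manager's implementation; the function name expand_host_range is hypothetical, and the zero-padding rule (keep the width when the range start has a leading zero) is an assumption inferred from the host[07-10] example.

```python
import re

# Matches one bracketed numeric range such as [1-4] or [07-10].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_host_range(pattern):
    """Expand one bracketed numeric range in a hostname or IP pattern.

    Hypothetical helper for illustration only; it handles a single
    range per pattern and returns the pattern unchanged if none is found.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start_text, end_text = match.groups()
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"range start exceeds end in {pattern!r}")
    # Assumption: preserve zero padding only when the start has a leading zero.
    width = len(start_text) if start_text.startswith("0") else 0
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    return [
        f"{prefix}{str(number).zfill(width)}{suffix}"
        for number in range(start, end + 1)
    ]


if __name__ == "__main__":
    for host in expand_host_range("host[07-10].company.com"):
        print(host)
```

Running the script prints one hostname per line for the host[07-10].company.com pattern from the table.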
Configuring the Flume Agents
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
After you create a Flume service, you must configure the agents before you start them. For detailed information about Flume agent configuration, see the Flume User Guide.
The default Flume agent configuration provided in the Configuration File property of the Agent default role group is a configuration for a single agent in a single tier; you should replace this with your own configuration. When you add new agent roles, they are placed (initially) in the Agent default role group.
Agents that share the same configuration should be members of the same agent role group. You can create new role groups and move agents between them. If your Flume configuration has multiple tiers, you must create an agent role group for each tier and move each agent into the role group for its tier.
The Configuration File property of a Flume agent role group can contain the configuration for multiple agents, because each configuration property is prefixed by the agent name. You can use configuration overrides to change the name of a specific agent without changing its role group membership. Agent names do not have to be unique; different agents can have the same name.
- Go to the Flume service.
- Click the Configuration tab.
- Select . Settings you make to the default role group apply to all agent instances unless you associate those instances with a different role group, or override them for specific agents. See Modifying Configuration Properties Using Cloudera Manager.
- Set the Agent Name property to the name of the agent (or one of the agents) defined in the flume.conf configuration file. The agent name can consist of letters, numbers, and the underscore character. You can specify only one agent name here. The name you specify is used as the default for all Flume agent instances, unless you override the name for specific agents. Multiple agents can have the same name, in which case they share the configuration specified for that name in the configuration file.
- Copy the contents of the flume.conf file, in its entirety, into the Configuration File property. Unless overridden for specific agent instances, this property applies to all agents in the group. You can provide multiple agent configurations in this file and use agent name overrides to specify which configuration to use for each agent.
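To make the agent-name prefix concrete, here is a minimal Python sketch that groups the properties of a multi-agent configuration by agent name, which is roughly what the Agent Name property selects. The agent names tier1 and tier2, the component names, and the helper properties_by_agent are all hypothetical; the parser is deliberately simplified and ignores features of the properties format such as line continuations.

```python
# Hypothetical two-agent configuration; every name in it is made up.
SAMPLE_CONF = """
# First tier
tier1.sources = src1
tier1.channels = ch1
tier1.sinks = sink1
# Second tier
tier2.sources = src2
tier2.channels = ch2
tier2.sinks = sink2
"""


def properties_by_agent(conf_text):
    """Group 'agent.property = value' lines by their agent-name prefix."""
    agents = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        # Everything before the first dot is treated as the agent name.
        agent, _, prop = key.strip().partition(".")
        agents.setdefault(agent, {})[prop] = value.strip()
    return agents


if __name__ == "__main__":
    agents = properties_by_agent(SAMPLE_CONF)
    print(sorted(agents))
    print(agents["tier2"])
```

An agent whose name is set to tier2 would use only the properties under that prefix, so one Configuration File value can serve agents with different names.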
Setting a Flume Agent Name
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
If you have specified multiple agent configurations in a Flume agent role group Configuration File property, you can set the agent name for an agent that uses a different configuration. Overriding the agent name will point the agent to the appropriate properties specified in the agent configuration.
- Go to the Flume service.
- Click the Configuration tab.
- Select .
- Locate the Agent Name property or search for it by typing its name in the Search box.
- Enter a name for the agent.
To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Enter a Reason for change, and then click Save Changes to commit the changes.
Using Flume with HDFS or HBase Sinks
If you want to use Flume with HDFS or HBase sinks, you can add a dependency to that service from the Flume configuration page. This will automatically add the correct client configurations to the Flume agent's classpath.
Using Flume with Solr Sinks
Cloudera Manager provides a set of configuration settings under the Flume service to configure the Flume Morphline Solr Sink. See Configuring the Flume Morphline Solr Sink for Use with the Solr Service for detailed instructions.
Updating Flume Agent Configurations
Minimum Required Role: Full Administrator
If you modify the Configuration File property after you have started the Flume service, update the configuration across Flume agents as follows:
- Go to the Flume service.
- Select .
Using Optimal Message Sizes with Flume
For best performance, Cloudera recommends you configure your applications to send messages smaller than 2 MiB in size through Flume.
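A sending application could enforce this recommendation with a simple size check before handing a message to Flume. The sketch below is an assumption about how such a check might look; the constant and function names are hypothetical and are not part of Flume or Cloudera Manager.

```python
# 2 MiB expressed in bytes (2 * 1024 * 1024 = 2,097,152).
MAX_MESSAGE_BYTES = 2 * 1024 * 1024


def within_recommended_size(payload: bytes) -> bool:
    """Return True if the payload is smaller than the 2 MiB recommendation."""
    return len(payload) < MAX_MESSAGE_BYTES


if __name__ == "__main__":
    message = "example event body".encode("utf-8")
    print(within_recommended_size(message))
```

The check counts encoded bytes rather than characters, because a multi-byte encoding can make a message larger than its character count suggests.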
Backing Up Flume Channel Data Directories
A best practice is to periodically back up your Flume data directories. The dataDir and checkpointDir are located in your Flume home directory, which defaults to /var/lib/flume-ng. You can verify the home directory location in Cloudera Manager by going to the tab and searching for the field Flume Home Directory.
To back up Flume data directories:
- In Cloudera Manager, go to the Flume service.
- Stop Flume to ensure that no changes are written to the data or checkpoint directories during the backup.
- Perform a file-level backup of the dataDir and checkpointDir.
- Restart Flume.
For more information on starting and stopping services, see Starting, Stopping, and Restarting Services.
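The file-level backup step in the procedure above could be scripted in many ways. The following Python sketch archives a set of directories into one compressed tar file; it assumes Flume is already stopped, and the function name backup_flume_dirs and any paths you pass to it are hypothetical. Substitute the dataDir and checkpointDir values from your own channel configuration.

```python
import tarfile
from pathlib import Path


def backup_flume_dirs(directories, archive_path):
    """Write the given directories into one gzip-compressed tar archive.

    Hypothetical helper: run it only while Flume is stopped so that the
    data and checkpoint directories do not change during the copy.
    """
    directories = [Path(directory) for directory in directories]
    # Validate every input first so no partial archive is left behind.
    for directory in directories:
        if not directory.is_dir():
            raise FileNotFoundError(f"not a directory: {directory}")
    archive_path = Path(archive_path)
    with tarfile.open(archive_path, "w:gz") as archive:
        for directory in directories:
            # Store each tree under its own directory name. This assumes
            # the directory names differ from one another.
            archive.add(directory, arcname=directory.name)
    return archive_path
```

Storing each tree under its directory name keeps the archive independent of the absolute location of the Flume home directory, which may differ between hosts.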