Configuring the Flume Solr Sink
This topic describes modifying configuration files by using either:
- Cloudera Manager in a parcel-based installation to edit the configuration files, similar to the process described in Configuring the Flume Agents.
- Command-line tools in a package-based installation to edit files.
- Modify the Flume configuration to specify the Flume source details and set up the flow. You must set the relative or absolute path to the morphline configuration file.
- Parcel-based Installation: In the Cloudera Manager Admin Console, modify Configuration File to include:
agent.sinks.solrSink.morphlineFile = /opt/cloudera/parcels/CDH/etc/flume-ng/conf/morphline.conf
- Package-based Installation: Edit /etc/flume-ng/conf/flume.conf to include:
agent.sinks.solrSink.morphlineFile = /etc/flume-ng/conf/morphline.conf
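The step above sets only the morphlineFile property. As a rough sketch of what "source details" and "the flow" might look like around it, the following script writes a minimal flume.conf to a scratch directory. The netcat source, the memory channel, and the names netcatSrc and memoryChannel are illustrative assumptions, not part of this topic; only solrSink and the morphlineFile path come from the text.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal flume.conf that wires one source to the Solr sink.
# netcatSrc, memoryChannel, and the netcat source are assumptions for illustration.
set -eu

conf_dir="$(mktemp -d)"            # scratch directory, not /etc/flume-ng/conf
conf_file="$conf_dir/flume.conf"

cat > "$conf_file" <<'EOF'
agent.sources = netcatSrc
agent.channels = memoryChannel
agent.sinks = solrSink

# Source details (assumed: a netcat source listening locally)
agent.sources.netcatSrc.type = netcat
agent.sources.netcatSrc.bind = 127.0.0.1
agent.sources.netcatSrc.port = 44444
agent.sources.netcatSrc.channels = memoryChannel

agent.channels.memoryChannel.type = memory

# Solr sink draining the channel through the morphline
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.channel = memoryChannel
agent.sinks.solrSink.morphlineFile = /etc/flume-ng/conf/morphline.conf
EOF

echo "wrote $conf_file"
```

In a real deployment the same properties would go into /etc/flume-ng/conf/flume.conf or the Cloudera Manager Configuration File field, with the source type chosen to match the actual data feed.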
- Modify the Morphline configuration to specify the Solr location details using a SOLR_LOCATOR.
- Parcel-based Installation: In the Cloudera Manager Admin Console, modify Morphline File.
- Package-based Installation: Edit /etc/flume-ng/conf/morphline.conf.
SOLR_LOCATOR : {
  # Name of solr collection
  collection : collection

  # ZooKeeper ensemble
  zkHost : "$ZK_HOST"
}

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      { generateUUID { field : id } }

      { # Remove record fields that are unknown to Solr schema.xml.
        # Recall that Solr throws an exception on any attempt to load a document that
        # contains a field that isn't specified in schema.xml.
        sanitizeUnknownSolrFields {
          solrLocator : ${SOLR_LOCATOR} # Location from which to fetch Solr schema
        }
      }

      { logDebug { format : "output record: {}", args : ["@{}"] } }

      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]
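Because the sink configuration and the morphline file live in separate places, it can help to confirm that the path named in flume.conf actually points at a file containing a SOLR_LOCATOR block. The following is a minimal sketch of such a check; the function names morphline_path and check_morphline are hypothetical helpers, and the demonstration runs against scratch files rather than the real configuration.

```shell
#!/usr/bin/env bash
set -eu

# Print the morphlineFile value configured for a sink in a Flume properties file.
# Arguments: <flume.conf path> <sink property prefix, e.g. agent.sinks.solrSink>
morphline_path() {
  sed -n "s|^[[:space:]]*$2\.morphlineFile[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*\$|\1|p" "$1" | tail -n 1
}

# Succeed only if the morphline file exists and defines a SOLR_LOCATOR block.
check_morphline() {
  [ -f "$1" ] && grep -q '^[[:space:]]*SOLR_LOCATOR[[:space:]]*:' "$1"
}

# Demonstration against scratch files standing in for the real configuration.
tmp="$(mktemp -d)"
printf 'SOLR_LOCATOR : {\n  collection : collection\n}\n' > "$tmp/morphline.conf"
printf 'agent.sinks.solrSink.morphlineFile = %s\n' "$tmp/morphline.conf" > "$tmp/flume.conf"

path="$(morphline_path "$tmp/flume.conf" agent.sinks.solrSink)"
if check_morphline "$path"; then
  echo "morphline file looks usable: $path"
else
  echo "morphline file missing or lacks SOLR_LOCATOR: $path" >&2
fi
```

This only checks that the file is present and names a locator; it does not validate the morphline syntax or contact ZooKeeper.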
- Copy flume-env.sh.template to flume-env.sh:
- Parcel-based Installation:
$ sudo cp /opt/cloudera/parcels/CDH/etc/flume-ng/conf/flume-env.sh.template \
  /opt/cloudera/parcels/CDH/etc/flume-ng/conf/flume-env.sh
- Package-based Installation:
$ sudo cp /etc/flume-ng/conf/flume-env.sh.template \
  /etc/flume-ng/conf/flume-env.sh
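A plain cp overwrites any flume-env.sh that already exists. If that file may already hold local settings, a guarded copy is one option. The sketch below assumes a hypothetical helper named copy_env_template and demonstrates it in a scratch directory.

```shell
#!/usr/bin/env bash
set -eu

# Copy flume-env.sh.template to flume-env.sh inside conf_dir unless the target
# already exists. Prints which action was taken.
copy_env_template() {
  local conf_dir="$1"
  if [ -e "$conf_dir/flume-env.sh" ]; then
    echo "kept existing $conf_dir/flume-env.sh"
  else
    cp "$conf_dir/flume-env.sh.template" "$conf_dir/flume-env.sh"
    echo "created $conf_dir/flume-env.sh"
  fi
}

# Demonstration against a scratch directory; on a real host, pass
# /etc/flume-ng/conf (or the parcel path) and run with sufficient privileges.
demo_dir="$(mktemp -d)"
echo '# template' > "$demo_dir/flume-env.sh.template"
copy_env_template "$demo_dir"   # first call creates the file
copy_env_template "$demo_dir"   # second call leaves it untouched
```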
- Update the Java heap size.
- Parcel-based Installation: In the Cloudera Manager Admin Console, enter Java Heap Size in the Search box. Modify Java Heap Size of Agent in Bytes to be 500 and choose MiB units.
- Package-based Installation: Edit /etc/flume-ng/conf/flume-env.sh or /opt/cloudera/parcels/CDH/etc/flume-ng/conf/flume-env.sh, inserting or replacing JAVA_OPTS as follows:
JAVA_OPTS="-Xmx500m"
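For the package-based case, "inserting or replacing" can be scripted so that the file ends up with exactly one active JAVA_OPTS assignment. The following is a sketch under that assumption; set_java_opts is a hypothetical helper, and it appends the new assignment at the end of the file rather than editing the old line in place.

```shell
#!/usr/bin/env bash
set -eu

# Ensure env_file contains exactly one active JAVA_OPTS assignment with the
# given value. Existing assignments (with or without "export") are dropped and
# the new one is appended; commented-out lines are left alone.
set_java_opts() {
  local env_file="$1" value="$2" tmp
  tmp="$(mktemp)"
  # grep -v exits non-zero when nothing is left to print, hence "|| true".
  grep -v -E '^[[:space:]]*(export[[:space:]]+)?JAVA_OPTS=' "$env_file" > "$tmp" || true
  printf 'JAVA_OPTS="%s"\n' "$value" >> "$tmp"
  cat "$tmp" > "$env_file"      # rewrite in place, keeping the file's permissions
  rm -f "$tmp"
}

# Demonstration on a scratch file standing in for flume-env.sh.
env_file="$(mktemp)"
printf '# sample\nJAVA_OPTS="-Xmx20m"\n' > "$env_file"
set_java_opts "$env_file" "-Xmx500m"
cat "$env_file"
```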
- (Optional) Modify Flume logging settings to facilitate monitoring and debugging:
- Parcel-based Installation: In the Cloudera Manager Admin Console, modify Agent Logging Advanced Configuration Snippet (Safety Valve) to include:
log4j.logger.org.apache.flume.sink.solr=DEBUG
log4j.logger.org.kitesdk.morphline=TRACE
- Package-based Installation: Use the following commands:
$ sudo bash -c 'echo "log4j.logger.org.apache.flume.sink.solr=DEBUG" >> \
  /etc/flume-ng/conf/log4j.properties'
$ sudo bash -c 'echo "log4j.logger.org.kitesdk.morphline=TRACE" >> \
  /etc/flume-ng/conf/log4j.properties'
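Running the echo commands more than once appends duplicate lines to log4j.properties. A small guard that appends a line only when it is not already present avoids this. The sketch below uses a hypothetical helper named append_once and a scratch file in place of /etc/flume-ng/conf/log4j.properties.

```shell
#!/usr/bin/env bash
set -eu

# Append a line to a file only if an identical line is not already present.
append_once() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file; repeated calls do not duplicate lines.
props="$(mktemp)"
for _ in 1 2; do
  append_once "log4j.logger.org.apache.flume.sink.solr=DEBUG" "$props"
  append_once "log4j.logger.org.kitesdk.morphline=TRACE" "$props"
done
cat "$props"
```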
- (Optional) In a package-based installation, you can configure where Flume finds Cloudera Search dependencies for Flume Solr Sink using SEARCH_HOME. For example, if you installed Flume from a tarball package, you can configure it to find required files by setting SEARCH_HOME. To set SEARCH_HOME, use a command of the form:
$ export SEARCH_HOME=/usr/lib/search
Alternatively, you can add the same setting to flume-env.sh.
In a Cloudera Manager managed environment, Cloudera Manager automatically updates the SOLR_HOME location with any additional required dependencies.
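When SEARCH_HOME is set by hand, a quick check that it names an existing directory can catch a mistyped path before the agent starts. The following is a minimal sketch; search_home_ok is a hypothetical helper, and the demonstration uses a scratch directory in place of /usr/lib/search.

```shell
#!/usr/bin/env bash
set -eu

# Return success only when SEARCH_HOME is set and names an existing directory.
search_home_ok() {
  [ -n "${SEARCH_HOME:-}" ] && [ -d "$SEARCH_HOME" ]
}

# Demonstration with a scratch directory standing in for /usr/lib/search.
SEARCH_HOME="$(mktemp -d)"
export SEARCH_HOME
if search_home_ok; then
  echo "SEARCH_HOME is usable: $SEARCH_HOME"
else
  echo "SEARCH_HOME is unset or not a directory" >&2
fi
```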