Configuring the Flume Solr Sink

The examples in this tutorial assume an environment set up with a package-based installation. If you installed Cloudera Search using parcels, adjust the file paths accordingly.

  1. Edit /etc/flume-ng/conf/flume.conf to specify the Flume source details and set up the flow. You must also set the path (relative or absolute) to the morphline configuration file:
    agent.sinks.solrSink.morphlineFile = /etc/flume-ng/conf/morphline.conf
  2. Edit /etc/flume-ng/conf/morphline.conf to specify the Solr location details using a SOLR_LOCATOR. The snippet that includes the SOLR_LOCATOR might appear as follows:
    SOLR_LOCATOR : {
      # Name of Solr collection
      collection : collection
    
      # ZooKeeper ensemble (replace $ZK_HOST with the address of your ensemble)
      zkHost : "$ZK_HOST"
    }
    
    morphlines : [
      {
        id : morphline1
        importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
        commands : [
          { generateUUID { field : id } }
    
          { # Remove record fields that are unknown to Solr schema.xml.
            # Recall that Solr throws an exception on any attempt to load a document that
            # contains a field that isn't specified in schema.xml.
            sanitizeUnknownSolrFields {
              solrLocator : ${SOLR_LOCATOR} # Location from which to fetch Solr schema
            }
          }
    
          { logDebug { format : "output record: {}", args : ["@{}"] } }
    
          {
            loadSolr {
              solrLocator : ${SOLR_LOCATOR}
            }
          }
        ]
      }
    ]
  3. Copy flume-env.sh.template to flume-env.sh:
    $ sudo cp /etc/flume-ng/conf/flume-env.sh.template \
    /etc/flume-ng/conf/flume-env.sh
  4. Edit /etc/flume-ng/conf/flume-env.sh, inserting or replacing JAVA_OPTS as follows:
    JAVA_OPTS="-Xmx500m"
  5. (Optional) Modify Flume logging settings to facilitate monitoring and debugging:
    $ sudo bash -c 'echo "log4j.logger.org.apache.flume.sink.solr=DEBUG" >> \
    /etc/flume-ng/conf/log4j.properties'
    $ sudo bash -c 'echo "log4j.logger.org.kitesdk.morphline=TRACE" >> \
    /etc/flume-ng/conf/log4j.properties'
  6. (Optional) Set SEARCH_HOME to tell Flume where to find the Cloudera Search dependencies for the Flume Solr Sink. This is useful, for example, if you installed Flume from a tarball. To set SEARCH_HOME, use a command of the form:
    $ export SEARCH_HOME=/usr/lib/search
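
Step 1 leaves the source and flow details to the reader. The following is a minimal sketch of what such a flume.conf might contain, not a definitive configuration. The agent name (agent), sink name (solrSink), morphline path, and morphline id (morphline1) come from the steps above. The source name (netcatSrc), channel name (memoryChannel), the netcat source type, port 44444, and the channel capacity are assumptions chosen for illustration, as is the helper function morphline_file_for. The sketch writes to a scratch directory rather than /etc/flume-ng/conf so it can be tried without touching a live configuration.

```shell
#!/bin/sh
# Sketch only: write a minimal flume.conf into a scratch directory
# (not /etc/flume-ng/conf) so that nothing on a live system is modified.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/flume.conf" <<'EOF'
# Names of the components that make up the flow (source and channel names are assumed)
agent.sources = netcatSrc
agent.channels = memoryChannel
agent.sinks = solrSink

# Assumed source: reads newline-terminated text from a local TCP port
agent.sources.netcatSrc.type = netcat
agent.sources.netcatSrc.bind = localhost
agent.sources.netcatSrc.port = 44444
agent.sources.netcatSrc.channels = memoryChannel

# Assumed channel: in-memory buffer between source and sink
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 1000

# Solr sink: points at the morphline file and the morphline id from step 2
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.channel = memoryChannel
agent.sinks.solrSink.morphlineFile = /etc/flume-ng/conf/morphline.conf
agent.sinks.solrSink.morphlineId = morphline1
EOF

# Hypothetical helper: print the morphlineFile value configured for a sink.
# $1 = path to flume.conf, $2 = agent name, $3 = sink name
# Prints nothing if the property is not set.
morphline_file_for() {
  sed -n "s/^$2\.sinks\.$3\.morphlineFile[[:space:]]*=[[:space:]]*//p" "$1"
}

MORPHLINE_FILE=$(morphline_file_for "$CONF_DIR/flume.conf" agent solrSink)
echo "morphlineFile = $MORPHLINE_FILE"
```

The helper gives a quick way to confirm that the required morphlineFile property is present before starting the agent. An empty result means the property is missing from the sink definition.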