This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

Flume Solr BlobDeserializer Configuration Options

Flume can ingest data from files placed in a spooling directory on disk by using SpoolDirectorySource. Unlike other asynchronous sources, SpoolDirectorySource avoids data loss even if Flume is restarted or fails. Flume watches the directory for new files and ingests them as they are detected.

By default, SpoolDirectorySource splits text input on newlines, turning each line into a Flume event. If this is not desirable, Flume Solr BlobDeserializer can instead read Binary Large Objects (BLOBs) from SpoolDirectorySource, typically one BLOB per file. This alternative is not suitable for very large objects, because the entire BLOB is buffered in memory.
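For contrast, the default line-oriented behavior corresponds roughly to the fragment below. This is a sketch only: it assumes the stock Flume LINE deserializer and its maxLineLength option, neither of which is described on this page, and the source name spoolSrc and the directory path are placeholders.

```properties
# Default behavior: each line of a spooled file becomes one Flume event.
agent.sources.spoolSrc.type = spooldir
agent.sources.spoolSrc.spoolDir = /tmp/myspooldir
# LINE is assumed to be the default deserializer, so this line is normally omitted.
agent.sources.spoolSrc.deserializer = LINE
# Assumed stock limit on the number of characters placed in a single event.
agent.sources.spoolSrc.deserializer.maxLineLength = 2048
```

Replacing the deserializer value with the BlobDeserializer builder class switches the source from one event per line to one event per BLOB.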

Flume Solr BlobDeserializer provides the following configuration options in the flume.conf file:

Property Name	Default	Description
deserializer		The FQCN of this class: org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
deserializer.maxBlobLength	100000000 (100 MB)	The maximum number of bytes to read and buffer for a given request.

For example, here is a flume.conf section for a SpoolDirectorySource with a BlobDeserializer for the agent named "agent":

agent.sources.spoolSrc.type = spooldir
agent.sources.spoolSrc.spoolDir = /tmp/myspooldir
agent.sources.spoolSrc.ignorePattern = \.
agent.sources.spoolSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agent.sources.spoolSrc.deserializer.maxBlobLength = 2000000000
agent.sources.spoolSrc.batchSize = 1
agent.sources.spoolSrc.fileHeader = true
agent.sources.spoolSrc.fileHeaderKey = resourceName
agent.sources.spoolSrc.interceptors = uuidinterceptor
agent.sources.spoolSrc.interceptors.uuidinterceptor.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
agent.sources.spoolSrc.interceptors.uuidinterceptor.headerName = id
#agent.sources.spoolSrc.interceptors.uuidinterceptor.preserveExisting = false
#agent.sources.spoolSrc.interceptors.uuidinterceptor.prefix = myhostname
agent.sources.spoolSrc.channels = memoryChannel
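
The section above attaches the source to a channel named memoryChannel without defining it. A minimal sketch of the surrounding declarations might look like the following. The capacity values are illustrative assumptions rather than recommendations, and a sink must still be configured to drain the channel.

```properties
# Declare the components used by the agent named "agent".
agent.sources = spoolSrc
agent.channels = memoryChannel

# In-memory channel; capacity is counted in events, not bytes (values are assumed).
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 1000
agent.channels.memoryChannel.transactionCapacity = 100
```

Because each event can carry an entire BLOB of up to deserializer.maxBlobLength bytes, a memory channel holding many such events may need a correspondingly large Java heap.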

Page generated September 3, 2015.