Flume Solr BlobDeserializer Configuration Options

Using SpoolDirectorySource, Flume can ingest data from files located in a directory on disk. Unlike other asynchronous sources, SpoolDirectorySource does not lose data even if Flume is restarted or fails. Flume watches the directory for new files and ingests them as they are detected.

By default, SpoolDirectorySource uses the newline (\n) delimiter to split input into Flume events. You can change this behavior by configuring the Solr BlobDeserializer to read binary large objects (BLOBs) from SpoolDirectorySource. Generally, each file is one BLOB (such as a PDF or image file). Because the entire BLOB is buffered in RAM, this usage is not generally appropriate for very large objects.
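
The difference between the two behaviors can be illustrated with a minimal Python sketch. This is not Flume code: the function names `split_lines` and `read_blob` are hypothetical, and capping the BLOB by simply cutting it off at the limit is an assumption of the sketch, since the text above does not say what happens to bytes beyond the limit.

```python
from typing import List


def split_lines(data: bytes) -> List[bytes]:
    """Line-style splitting: each line of the input becomes one event body."""
    return data.splitlines()


def read_blob(data: bytes, max_blob_length: int = 100_000_000) -> bytes:
    """BLOB-style reading: the whole input becomes one event body.

    The body is held in memory in full, capped at max_blob_length bytes.
    Cutting the data off at the cap is an assumption of this sketch.
    """
    return data[:max_blob_length]


sample = b"first line\nsecond line\n"
line_events = split_lines(sample)  # two event bodies, one per line
blob_event = read_blob(sample)     # one event body holding the whole input
```

Because `read_blob` returns the entire input as one in-memory value, the cost of a single event grows with the file size, which is why the BLOB approach suits documents such as PDFs or images better than very large files.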

The Solr BlobDeserializer supports the following configuration options (required options are marked with an asterisk):

Property Name                 Default               Description
deserializer *                (none)                Must be set to the fully qualified class name (FQCN) org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder.
deserializer.maxBlobLength    100000000 (100 MB)    Specifies the maximum number of bytes to read and buffer per request.

The following example shows the configuration section for a SpoolDirectorySource named spoolSrc that uses a BlobDeserializer, in an agent named agent:
agent.sources.spoolSrc.type = spooldir
agent.sources.spoolSrc.spoolDir = /tmp/myspooldir
agent.sources.spoolSrc.ignorePattern = \.
# Read each file as a single BLOB instead of splitting it on newlines
agent.sources.spoolSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
# Raise the BLOB size limit from the 100 MB default to about 2 GB
agent.sources.spoolSrc.deserializer.maxBlobLength = 2000000000
agent.sources.spoolSrc.batchSize = 1
# Store each file's path in an event header named resourceName
agent.sources.spoolSrc.fileHeader = true
agent.sources.spoolSrc.fileHeaderKey = resourceName
# Assign a generated unique ID to each event in the id header
agent.sources.spoolSrc.interceptors = uuidinterceptor
agent.sources.spoolSrc.interceptors.uuidinterceptor.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
agent.sources.spoolSrc.interceptors.uuidinterceptor.headerName = id
#agent.sources.spoolSrc.interceptors.uuidinterceptor.preserveExisting = false
#agent.sources.spoolSrc.interceptors.uuidinterceptor.prefix = flume01.example.com
agent.sources.spoolSrc.channels = memoryChannel
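
Because each BLOB is buffered in RAM, it can help to estimate how much memory one batch might need before choosing these values. The following Python sketch is a rough upper bound only. The helper name `worst_case_batch_bytes` is hypothetical, and the estimate assumes that every event in a batch reaches the maxBlobLength limit. It ignores event headers, events already held in the channel, and JVM overhead.

```python
def worst_case_batch_bytes(batch_size: int, max_blob_length: int) -> int:
    """Rough upper bound on the bytes buffered for one batch of BLOB events.

    Assumes every event in the batch is as large as max_blob_length allows.
    """
    return batch_size * max_blob_length


# Values from the example configuration above
example_bytes = worst_case_batch_bytes(batch_size=1, max_blob_length=2_000_000_000)

# Same batch size with the default maxBlobLength
default_bytes = worst_case_batch_bytes(batch_size=1, max_blob_length=100_000_000)

print(f"example configuration: {example_bytes / 1e9:.1f} GB per batch")
print(f"default limit:         {default_bytes / 1e9:.1f} GB per batch")
```

Under these assumptions, keeping batchSize at 1 bounds the estimate by a single BLOB, and any larger batch size multiplies it.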