Flume Solr BlobDeserializer Configuration Options

Using SpoolDirectorySource, Flume can ingest data from files located in a spooling directory on disk. Unlike other asynchronous sources, SpoolDirectorySource does not lose data even if Flume is restarted or fails. Flume watches the directory for new files and ingests them as they are detected.

By default, SpoolDirectorySource splits text input on newlines into Flume events. You can change this behavior by having Flume Solr BlobDeserializer read Binary Large Objects (BLOBs) from SpoolDirectorySource. This alternative approach is not suitable for very large objects because the entire BLOB is buffered.

Flume Solr BlobDeserializer provides the following configuration options in the flume.conf file:

Property Name

Default

Description

deserializer

 

The FQCN of this class:
org.apache.flume.sink.solr.
morphline.BlobDeserializer$Builder

deserializer.maxBlobLength

100000000 (100 MB)

The maximum number of bytes to read and buffer for a given request.

This example shows a flume.conf section for a SpoolDirectorySource with a BlobDeserializer for the agent named agent:
agent.sources.spoolSrc.type = spooldir
agent.sources.spoolSrc.spoolDir = /tmp/myspooldir
agent.sources.spoolSrc.ignorePattern = \.
agent.sources.spoolSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agent.sources.spoolSrc.deserializer.maxBlobLength = 2000000000
agent.sources.spoolSrc.batchSize = 1
agent.sources.spoolSrc.fileHeader = true
agent.sources.spoolSrc.fileHeaderKey = resourceName
agent.sources.spoolSrc.interceptors = uuidinterceptor
agent.sources.spoolSrc.interceptors.uuidinterceptor.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
agent.sources.spoolSrc.interceptors.uuidinterceptor.headerName = id
#agent.sources.spoolSrc.interceptors.uuidinterceptor.preserveExisting = false
#agent.sources.spoolSrc.interceptors.uuidinterceptor.prefix = myhostname
agent.sources.spoolSrc.channels = memoryChannel