Flume Solr BlobDeserializer Configuration Options
Using SpoolDirectorySource, Flume can ingest data from files located in a spooling directory on disk. Unlike other asynchronous sources, SpoolDirectorySource does not lose data even if Flume is restarted or fails. Flume watches the directory for new files and ingests them as they are detected.
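The directory-watching step can be pictured with a small, self-contained Java sketch. It is not SpoolDirectorySource's implementation. The names SpoolScanSketch and selectPending are hypothetical, and the selection rules (skip dot-files, skip names that fully match an ignore pattern, process the oldest file first) are assumptions that echo the ignorePattern property used in the example configuration. The selection policy is a pure function, so it can be tested without touching the file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical sketch of picking up new files from a spooling directory.
// Not SpoolDirectorySource's code; the selection rules are assumptions.
class SpoolScanSketch {

    // Pure policy: given file names and last-modified times (ms), return the
    // names to ingest, oldest first. Dot-files and names that fully match
    // ignorePattern are skipped.
    static List<String> selectPending(Map<String, Long> lastModified, Pattern ignorePattern) {
        List<String> pending = new ArrayList<>();
        for (String name : lastModified.keySet()) {
            if (name.startsWith(".") || ignorePattern.matcher(name).matches()) {
                continue;
            }
            pending.add(name);
        }
        // Oldest first; ties broken by name so the order is deterministic.
        pending.sort((a, b) -> {
            int byTime = Long.compare(lastModified.get(a), lastModified.get(b));
            return byTime != 0 ? byTime : a.compareTo(b);
        });
        return pending;
    }

    // One scan of a real directory; a long-running source would repeat this.
    public static void main(String[] args) throws IOException {
        Path spoolDir = Paths.get(args.length > 0 ? args[0] : "/tmp/myspooldir");
        if (!Files.isDirectory(spoolDir)) {
            System.out.println("Not a directory: " + spoolDir);
            return;
        }
        Map<String, Long> mtimes = new HashMap<>();
        try (Stream<Path> entries = Files.list(spoolDir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                if (Files.isRegularFile(p)) {
                    mtimes.put(p.getFileName().toString(),
                               Files.getLastModifiedTime(p).toMillis());
                }
            }
        }
        // ".*\\.tmp" is a hypothetical ignore rule for this demo.
        for (String name : selectPending(mtimes, Pattern.compile(".*\\.tmp"))) {
            System.out.println("would ingest: " + name);
        }
    }
}
```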
By default, SpoolDirectorySource turns each line of text input into a separate Flume event. To change this behavior, configure SpoolDirectorySource to use the Flume Solr BlobDeserializer, which reads a Binary Large Object (BLOB) per event, typically one BLOB per file. This approach is not suitable for very large objects because the entire BLOB is buffered in memory.
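The following Java sketch illustrates why whole-BLOB reading is memory-bound: every byte of the file is copied into a single in-memory buffer before the event body exists. BlobReadSketch and readBlob are hypothetical names, the 8192-byte read buffer is arbitrary, and stopping (truncating) at maxBlobLength is an assumption about how the limit is enforced, not a description of BlobDeserializer's source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: read one whole file into one in-memory event body.
class BlobReadSketch {

    // Copies up to maxBlobLength bytes from `in` into a single byte array.
    // Everything read stays in memory until the array is returned, which is
    // why very large objects are a poor fit. Stopping at the limit
    // (truncation) is an assumption of this sketch.
    static byte[] readBlob(InputStream in, int maxBlobLength) {
        ByteArrayOutputStream blob = new ByteArrayOutputStream();
        byte[] buf = new byte[Math.max(1, Math.min(maxBlobLength, 8192))];
        int total = 0;
        try {
            int n;
            while (total < maxBlobLength
                    && (n = in.read(buf, 0, Math.min(buf.length, maxBlobLength - total))) != -1) {
                blob.write(buf, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return blob.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = "pretend this is a PDF or image file".getBytes(StandardCharsets.US_ASCII);
        byte[] whole = readBlob(new ByteArrayInputStream(file), 100_000_000);
        byte[] capped = readBlob(new ByteArrayInputStream(file), 10);
        System.out.println(whole.length + " bytes buffered with the default limit");
        System.out.println(capped.length + " bytes buffered with a 10-byte limit");
    }
}
```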
Flume Solr BlobDeserializer provides the following configuration options in the flume.conf file:
Property Name | Default | Description |
---|---|---|
deserializer | | The fully qualified class name (FQCN) of this class: org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder |
deserializer.maxBlobLength | 100000000 (100 MB) | The maximum number of bytes to read and buffer for a given request. |
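As a rough illustration of how these two options might be resolved from flume.conf, this Java sketch loads properties text with java.util.Properties and falls back to the documented 100000000-byte default when deserializer.maxBlobLength is absent. The class and helper names are hypothetical, and parsing the value as a Java int with a positive-value check is an assumption of the sketch. The 2000000000 used in the example configuration still fits under Integer.MAX_VALUE (2147483647).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch of resolving the BlobDeserializer options for one source.
class BlobOptionsSketch {
    static final String DESERIALIZER_CLASS =
            "org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder";
    // Documented default: 100000000 bytes (100 MB).
    static final int DEFAULT_MAX_BLOB_LENGTH = 100_000_000;

    static Properties parse(String flumeConf) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(flumeConf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True if the source (e.g. "agent.sources.spoolSrc") names BlobDeserializer.
    static boolean usesBlobDeserializer(Properties props, String source) {
        return DESERIALIZER_CLASS.equals(props.getProperty(source + ".deserializer", "").trim());
    }

    // Reads <source>.deserializer.maxBlobLength, falling back to the default.
    // Parsing as int and rejecting non-positive values are assumptions.
    static int maxBlobLength(Properties props, String source) {
        String raw = props.getProperty(source + ".deserializer.maxBlobLength");
        if (raw == null) {
            return DEFAULT_MAX_BLOB_LENGTH;
        }
        int value = Integer.parseInt(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException("maxBlobLength must be positive: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = parse(String.join("\n",
                "agent.sources.spoolSrc.deserializer = " + DESERIALIZER_CLASS,
                "agent.sources.spoolSrc.deserializer.maxBlobLength = 2000000000"));
        String src = "agent.sources.spoolSrc";
        System.out.println("BlobDeserializer: " + usesBlobDeserializer(props, src));
        System.out.println("maxBlobLength: " + maxBlobLength(props, src));
    }
}
```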
Example configuration for an agent named agent:

```properties
agent.sources.spoolSrc.type = spooldir
agent.sources.spoolSrc.spoolDir = /tmp/myspooldir
agent.sources.spoolSrc.ignorePattern = \.
# Read each file as one BLOB event, buffering at most 2000000000 bytes
agent.sources.spoolSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agent.sources.spoolSrc.deserializer.maxBlobLength = 2000000000
agent.sources.spoolSrc.batchSize = 1
# Store the file's absolute path in the resourceName header of each event
agent.sources.spoolSrc.fileHeader = true
agent.sources.spoolSrc.fileHeaderKey = resourceName
# Attach a generated UUID to each event under the id header
agent.sources.spoolSrc.interceptors = uuidinterceptor
agent.sources.spoolSrc.interceptors.uuidinterceptor.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
agent.sources.spoolSrc.interceptors.uuidinterceptor.headerName = id
#agent.sources.spoolSrc.interceptors.uuidinterceptor.preserveExisting = false
#agent.sources.spoolSrc.interceptors.uuidinterceptor.prefix = myhostname
agent.sources.spoolSrc.channels = memoryChannel
```
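With fileHeader, fileHeaderKey, and the UUID interceptor set as in the example, each BLOB event carries two headers: the file's absolute path under resourceName and a generated identifier under id. This Java sketch models that header logic in plain Java; it does not use Flume's Event or Interceptor APIs, and the names BlobHeadersSketch and headersFor are hypothetical. The handling of preserveExisting and prefix (keep an existing id when preserveExisting is true, otherwise prepend prefix to a random UUID) is an assumption inferred from the commented-out property names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of the headers the example configuration attaches to each
// BLOB event. Plain Java only; no Flume Event or Interceptor types are used.
class BlobHeadersSketch {
    static final String FILE_HEADER_KEY = "resourceName"; // fileHeaderKey in the example

    // absolutePath: the spooled file (fileHeader = true stores it under FILE_HEADER_KEY).
    // headerName/preserveExisting/prefix mirror the uuidinterceptor properties;
    // their exact semantics here are assumptions.
    static Map<String, String> headersFor(String absolutePath, Map<String, String> existing,
                                          String headerName, boolean preserveExisting,
                                          String prefix) {
        Map<String, String> headers = new HashMap<>(existing);
        headers.put(FILE_HEADER_KEY, absolutePath);
        // Generate a new id unless one exists and must be preserved.
        if (!(preserveExisting && headers.containsKey(headerName))) {
            headers.put(headerName, prefix + UUID.randomUUID());
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> h = headersFor("/tmp/myspooldir/report.pdf", new HashMap<>(),
                                           "id", true, "");
        System.out.println(h);
    }
}
```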