Flume Solr BlobDeserializer Configuration Options
Flume can ingest data from files placed in a spooling directory on disk. This is done with the help of SpoolDirectorySource. Unlike other asynchronous sources, SpoolDirectorySource avoids data loss even if Flume is restarted or fails. Flume watches the directory for new files and ingests them as they are detected.
By default, SpoolDirectorySource splits text input on newlines, producing one Flume event per line. If this is not desirable, Flume Solr BlobDeserializer can instead read Binary Large Objects (BLOBs) from SpoolDirectorySource. This alternative approach is not suitable for very large objects because the entire BLOB is buffered in memory.
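The difference between the two modes can be illustrated with a small conceptual sketch. This is not Flume's implementation: the function names `split_on_newlines` and `read_blob` are hypothetical, and the sketch simply stops reading at the byte cap, which is an assumption about how a limit like deserializer.maxBlobLength might be applied.

```python
import io


def split_on_newlines(stream):
    """Line-oriented mode: one event body per line of input (hypothetical helper)."""
    return [line.rstrip(b"\r\n") for line in stream]


def read_blob(stream, max_blob_length=100_000_000):
    """BLOB mode: buffer up to max_blob_length bytes as a single event body.

    Assumption: reading stops at the cap; what a real deserializer does with
    any remaining bytes is not modeled here.
    """
    return stream.read(max_blob_length)


data = b"first line\nsecond line\n"

line_events = split_on_newlines(io.BytesIO(data))  # two separate event bodies
blob_event = read_blob(io.BytesIO(data))           # one body holding the whole file

print(len(line_events), "line events")
print(len(blob_event), "bytes in the single BLOB event")
```

Because the whole body is held in a buffer before it becomes an event, memory use grows with file size, which is why the cap matters for large inputs.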
Flume Solr BlobDeserializer provides the following configuration options in the flume.conf file:
Property Name | Default | Description |
---|---|---|
deserializer | | The FQCN of this class: org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder |
deserializer.maxBlobLength | 100000000 (100 MB) | The maximum number of bytes to read and buffer for a given request. |
For example:

    agent.sources.spoolSrc.type = spooldir
    agent.sources.spoolSrc.spoolDir = /tmp/myspooldir
    agent.sources.spoolSrc.ignorePattern = \.
    agent.sources.spoolSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
    agent.sources.spoolSrc.deserializer.maxBlobLength = 2000000000
    agent.sources.spoolSrc.batchSize = 1
    agent.sources.spoolSrc.fileHeader = true
    agent.sources.spoolSrc.fileHeaderKey = resourceName
    agent.sources.spoolSrc.interceptors = uuidinterceptor
    agent.sources.spoolSrc.interceptors.uuidinterceptor.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
    agent.sources.spoolSrc.interceptors.uuidinterceptor.headerName = id
    #agent.sources.spoolSrc.interceptors.uuidinterceptor.preserveExisting = false
    #agent.sources.spoolSrc.interceptors.uuidinterceptor.prefix = myhostname
    agent.sources.spoolSrc.channels = memoryChannel