Flume Solr UUIDInterceptor Configuration Options

Flume can modify and drop events using Interceptors, which can be attached to any Flume source. The Solr UUIDInterceptor sets a universally unique 128-bit identifier (such as f692639d-483c-1b5f-cd61-183cb1726ae0) on each event.

Cloudera recommends assigning UUIDs to events as early as possible (for example, in the first Flume source of your data flow). This lets you de-duplicate events that were copied by replication or redelivery in a Flume pipeline designed for high availability and high performance. If available, application-level UUIDs are preferable to auto-generated UUIDs because they enable subsequent updates and deletion of the document in Solr using that key. If application-level UUIDs are not present, you can use UUIDInterceptor to automatically assign UUIDs to document events.

The UUIDInterceptor supports the following configuration options. The type option is required; the others are optional:

Property Name      Default   Description
type               (none)    Must be set to the fully qualified class name (FQCN) org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder.
headerName         id        The name of the Flume header to use for setting the UUID.
preserveExisting   true      Determines whether to preserve existing UUID headers.
prefix             ""        Specifies a string to prepend to each generated UUID.
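
As a minimal sketch, the options above could be set in a Flume agent configuration file as follows. The agent name (agent), source name (avroSrc), interceptor name (uuidinterceptor), and prefix value (myhostname) are placeholders chosen for illustration, not names required by Flume; substitute the names used in your own data flow.

```properties
# Attach one interceptor, here named "uuidinterceptor", to the source "avroSrc".
# "agent" and "avroSrc" are placeholder names for this sketch.
agent.sources.avroSrc.interceptors = uuidinterceptor

# Required: the fully qualified class name of the interceptor builder.
agent.sources.avroSrc.interceptors.uuidinterceptor.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder

# Optional: the Flume header that receives the UUID (default: id).
agent.sources.avroSrc.interceptors.uuidinterceptor.headerName = id

# Optional: keep a UUID header that is already present on the event (default: true).
agent.sources.avroSrc.interceptors.uuidinterceptor.preserveExisting = true

# Optional: string prepended to each generated UUID (default: empty).
# "myhostname" is an illustrative value.
agent.sources.avroSrc.interceptors.uuidinterceptor.prefix = myhostname
```

Leaving preserveExisting at true fits the recommendation above to prefer application-level UUIDs: events that already carry an identifier in the configured header should keep it, and only events without one receive a generated UUID.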

For examples, see BlobHandler and BlobDeserializer.