Using the DefragmentText processor

Learn about the properties, relationships, and limitations of the DefragmentText processor, and how to use it.

The DefragmentText processor buffers incoming flow files until their contents form a cohesive message, based on a pattern that marks the start or the end of each message.

Properties

The following list describes the properties of the DefragmentText processor:

Pattern
A regular expression to match at the start or end of messages.
Pattern Location
Whether the pattern is located at the start or at the end of the messages.
Max Buffer Age
The maximum age of the buffer after which it is transferred to success when matching Start of Message patterns or to failure when matching End of Message patterns.
Expected format is <duration> <time unit>.
Max Buffer Size
The maximum size of the buffer. If the buffer grows beyond this size, it is transferred to failure.
Expected format is <size> <size unit>.
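
The routing rules for the two buffer limits can be summarized in a short Python sketch. It only restates the descriptions above as a lookup; the function name and the limit labels are hypothetical and are not part of the processor.

```python
def route_when_limit_hit(pattern_location, limit):
    """Return the relationship the buffer is sent to when a limit is hit.

    pattern_location: "Start of Message" or "End of Message".
    limit: "max_buffer_age" or "max_buffer_size" (hypothetical labels).
    """
    if limit == "max_buffer_size":
        # An oversized buffer is routed to failure in both modes.
        return "failure"
    if limit == "max_buffer_age":
        # With a start pattern, an aged buffer is assumed to hold a whole message.
        # With an end pattern, the closing marker never arrived.
        return "success" if pattern_location == "Start of Message" else "failure"
    raise ValueError(f"unknown limit: {limit}")
```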

The following image shows the DefragmentText processor properties:

Relationships

The two relationships of the DefragmentText processor are as follows:

Success
The parts of the incoming flow files that form cohesive messages.
Failure
Flow files that failed the defragmentation process. This can happen if the maximum buffer size is exceeded, or if the incoming flow files originate from different sources.

How to use

Connect the DefragmentText processor to a source processor that generates a continuous stream of text-based data (for example, TailFile). Then configure the Pattern and Pattern Location properties so that the DefragmentText processor can resegment the data by regular expression matching.

With Max Buffer Size, you can limit how large the buffered flow files can grow (for example, if the pattern never matches). With Max Buffer Age, you can ensure that messages are sent out even if no more data arrives.

Data routed to the Failure relationship might not be defragmented (for example, because the buffer size limit was reached or the incoming data came from different sources), but no data is lost.
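
The resegmentation described above can be modeled in a few lines of Python. This is a simplified sketch, not the processor's implementation: the function name `defragment` is hypothetical, it works on strings instead of flow files, it ignores Max Buffer Size and Max Buffer Age, and cutting each fragment at its last pattern match is an assumption about how the boundaries are chosen.

```python
import re


def defragment(fragments, pattern, pattern_location="Start of Message"):
    """Return (emitted_chunks, leftover_buffer) for a sequence of text fragments."""
    regex = re.compile(pattern)
    emitted = []
    buffer = ""
    for fragment in fragments:
        matches = list(regex.finditer(fragment))
        if not matches:
            # No message boundary in this fragment: keep buffering.
            buffer += fragment
            continue
        last = matches[-1]
        # Cut before the last match for start patterns, after it for end patterns.
        cut = last.start() if pattern_location == "Start of Message" else last.end()
        complete = buffer + fragment[:cut]
        if complete:
            emitted.append(complete)
        # Whatever follows the cut may be an incomplete message, so it is buffered.
        buffer = fragment[cut:]
    return emitted, buffer


# Hypothetical fragments, split at arbitrary points as a tailing source might do.
fragments = [
    "2024-01-01 first line\n  continued",
    " here\n2024-01-02 sec",
    "ond line\n",
]
chunks, leftover = defragment(fragments, r"\d{4}-\d{2}-\d{2}")
```

In this model, `chunks` holds only whole messages, while `leftover` is the text still waiting for the next start pattern. In the processor, Max Buffer Age is the setting that eventually sends such a remainder onward when no further data arrives.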

Limitations

Limitations of the DefragmentText processor are as follows:

It is a single-threaded processor (multi-threaded operation is disabled).
Since this processor can only buffer one flow file at a time, this processor should only be used with a single source.
When used with TailFile, TailFile should be in Single File mode.

Real-world example: Tailing a Java application log file

TailFile is set up to tail the log file of a Java application. The output of the TailFile is connected to the DefragmentText processor.

The DefragmentText Pattern property is set to a regular expression that matches the timestamp at the start of each log message. For example:

(((19|20)([2468][048]|[13579][26]|0[48])|2000)-02-29|((19|20)[0-9]{2}-(0[469]|11)-(0[1-9]|[12][0-9]|30)|(19|20)[0-9]{2}-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|(19|20)[0-9]{2}-02-(0[1-9]|1[0-9]|2[0-8])))\s([01][0-9]|2[0-3]):([012345][0-9]):([012345][0-9])

The DefragmentText Pattern Location property is set to Start of Message.

This setup ensures that when a Java exception occurs, the contents of that exception are not split among 10-20 flow files; instead, they are in a single flow file.
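
The effect on a Java stack trace can be illustrated with a small Python sketch. The log lines are hypothetical, the helper `group_messages` is not part of the processor, and the timestamp pattern is a shortened stand-in (an assumption: unlike the full expression above, it does not validate calendar dates).

```python
import re

# Shortened stand-in for the Pattern property: "YYYY-MM-DD hh:mm:ss".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}")

# Hypothetical Java application log excerpt containing one exception.
LOG_LINES = [
    "2023-07-31 23:59:58 INFO  Worker started",
    "2023-07-31 23:59:59 ERROR Unhandled exception",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.Worker.run(Worker.java:42)",
    "\tat java.base/java.lang.Thread.run(Thread.java:833)",
    "2023-08-01 00:00:00 INFO  Worker restarted",
]


def group_messages(lines):
    """Group lines so that each message starts at a line beginning with a timestamp."""
    messages = []
    for line in lines:
        if TIMESTAMP.match(line) or not messages:
            # A timestamp at the start of the line opens a new message.
            messages.append([line])
        else:
            # Stack trace lines have no timestamp and stay with the current message.
            messages[-1].append(line)
    return ["\n".join(message) for message in messages]


messages = group_messages(LOG_LINES)
```

Here the ERROR line and the three stack trace lines that follow it end up in the same message, which mirrors what the Start of Message setting is meant to achieve for flow files.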