RouteText

Description:

Routes textual data based on a set of user-defined rules. Each line in an incoming FlowFile is compared against the values specified by user-defined Properties. The mechanism by which the text is compared to these user-defined properties is defined by the 'Matching Strategy'. The data is then routed according to these rules, routing each line of the text individually.

Tags:

attributes, routing, text, regexp, regex, Regular Expression, Expression Language, csv, filter, logs, delimited, find, string, search, detect

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Routing Strategy
  API Name: Routing Strategy
  Default Value: Route to each matching Property Name
  Allowable Values:
    • Route to each matching Property Name: Lines will be routed to each relationship whose corresponding expression evaluates to 'true'
    • Route to 'matched' if line matches all conditions: Requires that all user-defined expressions evaluate to 'true' for the line to be considered a match
    • Route to 'matched' if lines matches any condition: Requires that at least one user-defined expression evaluate to 'true' for the line to be considered a match
  Description: Specifies how to determine which Relationship(s) to use when evaluating the lines of incoming text against the 'Matching Strategy' and user-defined properties.

Matching Strategy
  API Name: Matching Strategy
  Allowable Values:
    • Satisfies Expression: Match lines based on whether or not the text satisfies the given Expression Language expression. I.e., the line will match if the property value, evaluated as an Expression, returns true. The expression is able to reference FlowFile Attributes, as well as the variables 'line' (the text of the line to evaluate) and 'lineNo' (the number of the line being evaluated: 1 for the first line, 2 for the second, and so on).
    • Starts With: Match lines based on whether the line starts with the property value
    • Ends With: Match lines based on whether the line ends with the property value
    • Contains: Match lines based on whether the line contains the property value
    • Equals: Match lines based on whether the line equals the property value
    • Matches Regular Expression: Match lines based on whether the line exactly matches the Regular Expression that is provided as the Property value
    • Contains Regular Expression: Match lines based on whether the line contains some text that matches the Regular Expression that is provided as the Property value
  Description: Specifies how to evaluate each line of incoming text against the user-defined properties.

Character Set
  API Name: Character Set
  Default Value: UTF-8
  Description: The Character Set in which the incoming text is encoded

Ignore Leading/Trailing Whitespace
  API Name: Ignore Leading/Trailing Whitespace
  Default Value: true
  Description: Indicates whether or not the whitespace at the beginning and end of the lines should be ignored when evaluating the line.

Ignore Case
  API Name: Ignore Case
  Default Value: false
  Allowable Values:
    • true
    • false
  Description: If true, capitalization will not be taken into account when comparing values. E.g., matching against 'HELLO' or 'hello' will have the same result. This property is ignored if the 'Matching Strategy' is set to 'Satisfies Expression'.

Grouping Regular Expression
  API Name: Grouping Regular Expression
  Description: Specifies a Regular Expression to evaluate against each line to determine which Group the line should be placed in. The Regular Expression must have at least one Capturing Group that defines the line's Group. If multiple Capturing Groups exist in the Regular Expression, the values from all Capturing Groups will be concatenated together. Two lines will not be placed into the same FlowFile unless they both have the same value for the Group (or neither line matches the Regular Expression). For example, to group together all lines in a CSV File by the first column, we can set this value to "(.*?),.*". Two lines that have the same Group but different Relationships will never be placed into the same FlowFile.
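
The interaction between 'Matching Strategy' and 'Routing Strategy' can be modeled roughly as follows. This is a minimal Python sketch for illustration only, not the processor's implementation: the function names line_matches and route_line, the short routing keys "each", "all", and "any", and the use of Python's re module are all assumptions. 'Satisfies Expression' is left out because it depends on NiFi Expression Language.

```python
import re


def line_matches(line, value, strategy, ignore_case=False, trim=True):
    """Return True if `line` matches `value` under the named strategy.

    Hypothetical helper: models the documented strategies, not NiFi's code.
    """
    if trim:
        # Models 'Ignore Leading/Trailing Whitespace' = true
        line = line.strip()
    flags = re.IGNORECASE if ignore_case else 0
    if strategy == "Matches Regular Expression":
        # The whole line must match the expression
        return re.fullmatch(value, line, flags) is not None
    if strategy == "Contains Regular Expression":
        # Some part of the line must match the expression
        return re.search(value, line, flags) is not None
    if ignore_case:
        line, value = line.lower(), value.lower()
    if strategy == "Starts With":
        return line.startswith(value)
    if strategy == "Ends With":
        return line.endswith(value)
    if strategy == "Contains":
        return value in line
    if strategy == "Equals":
        return line == value
    raise ValueError(f"unsupported strategy: {strategy}")


def route_line(line, rules, strategy, routing="each", **opts):
    """Return the relationship names that one line would be routed to.

    `rules` maps a dynamic property name to its value.
    """
    hits = [name for name, value in rules.items()
            if line_matches(line, value, strategy, **opts)]
    if routing == "each":
        return hits or ["unmatched"]
    if routing == "all":
        matched = bool(rules) and len(hits) == len(rules)
    elif routing == "any":
        matched = bool(hits)
    else:
        raise ValueError(f"unsupported routing: {routing}")
    return ["matched"] if matched else ["unmatched"]


if __name__ == "__main__":
    rules = {"errors": "ERROR", "warnings": "WARN"}
    print(route_line("  ERROR disk full ", rules, "Starts With"))
    print(route_line("ERROR and WARN", rules, "Contains", routing="all"))
```

With "each" routing, a line can land in several relationships at once, while the "all" and "any" routings collapse the result to a single 'matched' or 'unmatched' decision.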

Dynamic Properties:

Supports Sensitive Dynamic Properties: No

Dynamic Properties allow the user to specify both the name and value of a property.

Name | Value | Description
Relationship Name | value to match against | Routes data that matches the value specified in the Dynamic Property Value to the Relationship specified in the Dynamic Property Key.
Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)

Relationships:

Name | Description
original | The original input file will be routed to this destination when the lines have been successfully routed to 1 or more relationships
unmatched | Data that does not satisfy the required user-defined rules will be routed to this Relationship

Dynamic Relationships:

A Dynamic Relationship may be created based on how the user configures the Processor.

Name | Description
Name from Dynamic Property | FlowFiles that match the Dynamic Property's value

Reads Attributes:

None specified.

Writes Attributes:

Name | Description
RouteText.Route | The name of the relationship to which the FlowFile was routed.
RouteText.Group | The value captured by all capturing groups in the 'Grouping Regular Expression' property. If this property is not set or contains no capturing groups, this attribute will not be added.
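
To illustrate how a group value could be derived from the 'Grouping Regular Expression', here is a minimal Python sketch using the CSV example "(.*?),.*" from the property description. The helper names group_value and group_lines are hypothetical. Two details are assumptions, since this page does not specify them: the expression is applied to the whole line, and multiple capturing groups are joined with no separator.

```python
import re
from collections import defaultdict


def group_value(line, grouping_regex):
    """Return the concatenated capturing-group values, or None if no match.

    Assumption: the regex must match the entire line, and group values are
    joined with no separator. The real processor may differ on both points.
    """
    match = re.fullmatch(grouping_regex, line)
    if match is None:
        return None
    return "".join(g or "" for g in match.groups())


def group_lines(lines, grouping_regex):
    """Bucket lines by group value; non-matching lines share the None key."""
    groups = defaultdict(list)
    for line in lines:
        groups[group_value(line, grouping_regex)].append(line)
    return dict(groups)


if __name__ == "__main__":
    lines = ["alice,1", "bob,2", "alice,3", "no comma here"]
    for key, members in group_lines(lines, r"(.*?),.*").items():
        print(key, members)
```

Each bucket corresponds to lines that may share an output FlowFile, subject to the additional rule that lines routed to different Relationships are never combined.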

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

Example Use Cases:

Use Case:

Drop blank or empty lines from the FlowFile's content.

Keywords:

filter, drop, empty, blank, remove, delete, strip out, lines, text

Configuration:

"Routing Strategy" = "Route to each matching Property Name"

"Matching Strategy" = "Matches Regular Expression"

"Empty Line" = "^$"

Auto-terminate the "Empty Line" relationship.

Connect the "unmatched" relationship to the next processor in your flow.
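
The effect of this configuration can be sketched in Python. This is an illustration under assumptions, not the processor's code: drop_blank_lines is a hypothetical name, and the trim parameter models 'Ignore Leading/Trailing Whitespace', which is assumed here to be applied to the line before the regular expression is evaluated, so that whitespace-only lines also match "^$".

```python
import re


def drop_blank_lines(text, trim=True):
    """Keep only the lines that would be routed to 'unmatched'."""
    kept = []
    for line in text.splitlines():
        # Evaluate the trimmed copy, but keep the original line in the output
        candidate = line.strip() if trim else line
        if re.fullmatch(r"^$", candidate):
            continue  # would go to "Empty Line", which is auto-terminated
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    print(drop_blank_lines("first\n\n   \nsecond"))
```

Under the same assumption, setting trim=False models a configuration in which only truly empty lines are dropped and whitespace-only lines are kept.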



Use Case:

Remove specific lines of text from a file, such as those containing a specific word or having a line length over some threshold.

Keywords:

filter, drop, empty, blank, remove, delete, strip out, lines, text, expression language

Configuration:

"Routing Strategy" = "Route to each matching Property Name"

"Matching Strategy" = "Satisfies Expression"

An additional property named "Filter Out" should be added. Its value should be a NiFi Expression Language Expression that can refer to two variables (in addition to FlowFile attributes): line, which is the line of text being evaluated; and lineNo, which is the line number in the file (starting with 1). The Expression should return true for any line that should be dropped.

For example, to remove any line that starts with a # symbol, we can set "Filter Out" to ${line:startsWith("#")}.

We could also remove the first 2 lines of text by setting "Filter Out" to ${lineNo:le(2)}. Note that we use the le function because we want line numbers less than or equal to 2, since the line index is 1-based.

Auto-terminate the "Filter Out" relationship.

Connect the "unmatched" relationship to the next processor in your flow.
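
The two "Filter Out" expressions above behave like simple predicates over the line text and its 1-based line number. The following Python sketch mirrors them for illustration; filter_lines, is_comment, and in_first_two_lines are hypothetical names, and the sketch does not evaluate NiFi Expression Language.

```python
def filter_lines(text, filter_out):
    """Drop every line for which filter_out(line, line_no) returns True."""
    kept = []
    # Line numbers start at 1, like the lineNo variable
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not filter_out(line, line_no):
            kept.append(line)
    return "\n".join(kept)


def is_comment(line, line_no):
    """Python analogue of ${line:startsWith("#")}."""
    return line.startswith("#")


def in_first_two_lines(line, line_no):
    """Python analogue of ${lineNo:le(2)}."""
    return line_no <= 2


if __name__ == "__main__":
    text = "# header\nname,age\nalice,30\n# note\nbob,25"
    print(filter_lines(text, is_comment))
    print(filter_lines(text, in_first_two_lines))
```

The lines that survive the filter correspond to the data sent to the "unmatched" relationship.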



System Resource Considerations:

None specified.