Adding Functionality to Apache NiFi
Also available as:
PDF
loading table of contents...

Route Based on Content (One-to-Many)

If a Processor will route a single FlowFile to potentially many relationships, this Processor will be slightly different than the above-described Processor for Routing Data Based on Content. This Processor typically has Relationships that are dynamically defined by the user as well as an unmatched relationship.

In order for the user to be able to define additionally Properties, the getSupportedDynamicPropertyDescriptor method must be overridden. This method returns a PropertyDescriptor with the supplied name and an applicable Validator to ensure that the user-specified Matching Criteria is valid.

In this Processor, the Set of Relationships that is returned by the getRelationships method is a member variable that is marked volatile. This Set is initially constructed with a single Relationship named unmatched. The onPropertyModified method is overridden so that when a Property is added or removed, a new Relationship is created with the same name. If the Processor has Properties that are not user-defined, it is important to check if the specified Property is user-defined. This can be achieved by calling the isDynamic method of the PropertyDescriptor that is passed to this method. If this Property is dynamic, a new Set of Relationships is then created, and the previous set of Relationships is copied into it. This new Set either has the newly created Relationship added to it or removed from it, depending on whether a new Property was added to the Processor or a Property was removed (Property removal is detected by check if the third argument to this function is null). The member variable holding the Set of Relationships is then updated to point to this new Set.

If the Properties that specify routing criteria require processing, such as compiling a Regular Expression, this processing is done in a method annotated with @OnScheduled, if possible. The result is then stored in a member variable that is marked as volatile. This member variable is generally of type Map where the key is of type Relationship and the value's type is defined by the result of processing the property value.

The onTrigger method obtains a FlowFile via the get method of ProcessSession. If no FlowFile is available, it returns immediately. Otherwise, a Set of type Relationship is created. The method reads the contents of the FlowFile via the ProcessSession's read method, evaluating each of the Match Criteria as the data is streamed. For any criteria that matches, the relationship associated with that Match Criteria is added to the Set of Relationships.

After reading the contents of the FlowFile, the method checks if the Set of Relationships is empty. If so, the original FlowFile has an attribute added to it to indicate the Relationship to which it was routed and is routed to the unmatched. This is logged, a Provenance ROUTE event is emitted, and the method returns. If the size of the Set is equal to 1, the original FlowFile has an attribute added to it to indicate the Relationship to which it was routed and is routed to the Relationship specified by the entry in the Set. This is logged, a Provenance ROUTE event is emitted for the FlowFile, and the method returns.

In the event that the Set contains more than 1 Relationship, the Processor creates a clone of the FlowFile for each Relationship, except for the first. This is done via the clone method of the ProcessSession. There is no need to report a CLONE Provenance Event, as the framework will handle this for you. The original FlowFile and each clone are routed to their appropriate Relationship with attribute indicating the name of the Relationship. A Provenance ROUTE event is emitted for each FlowFile. This is logged, and the method returns.

This Processor is annotated with the @SideEffectFree and @SupportsBatching annotations from the org.apache.nifi.annotations.behavior package.