Reporting Processor Activity
Processors are responsible for reporting their activity so that users are able to
understand what happens to their data. Processors should log events via the ComponentLog,
which is accessible via the InitializationContext or by calling the getLogger method of
AbstractProcessor.
Additionally, Processors should use the ProvenanceReporter interface, obtained via the
ProcessSession's getProvenanceReporter
method. The ProvenanceReporter should be used to indicate any time that content is
received from an external source or sent to an external location. The ProvenanceReporter
also has methods for reporting when a FlowFile is cloned, forked, or modified, when
multiple FlowFiles are merged into a single FlowFile, and when a FlowFile is associated
with some other identifier. These events are less critical to report, however, because the
framework can detect these actions and emit appropriate events on the
Processor's behalf. Even so, it is a best practice for the Processor developer to emit
these events: doing so makes it explicit in the code that the events are being emitted, and
lets the developer add details to the events, such as the amount of
time that the action took or pertinent information about the action that was taken. If the
Processor emits an event, the framework will not emit a duplicate event. Instead, it
always assumes that the Processor developer knows what is happening in the context of the
Processor better than the framework does. The framework may, however, emit a different
event. For example, if a Processor modifies both the content of a FlowFile and its
attributes and then emits only an ATTRIBUTES_MODIFIED event, the framework will emit a
CONTENT_MODIFIED event. The framework will not emit an ATTRIBUTES_MODIFIED event if any
other event is emitted for that FlowFile (either by the Processor or the framework). This
is because every Provenance Event records the attributes of the FlowFile as they were before
the event occurred as well as the attributes that resulted from processing that FlowFile.
An ATTRIBUTES_MODIFIED event is therefore generally considered redundant and would make
a rendering of the FlowFile lineage very verbose. It is, however, acceptable for a Processor to
emit this event along with others, if the event is considered pertinent from the
perspective of the Processor.
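
The rules above for which events the framework adds can be hard to follow in prose, so the following is a minimal sketch that models them in plain Java. It is not the framework's implementation and does not use the NiFi API: the class ProvenanceRuleSketch, the reduced EventType enum, and the frameworkEvents method are all hypothetical names introduced only for illustration. The sketch also assumes, as the text implies, that the framework emits ATTRIBUTES_MODIFIED when attributes changed and no other event exists for the FlowFile.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, self-contained model of the rules described above.
// It is NOT NiFi code; all names here are assumptions made for illustration.
public class ProvenanceRuleSketch {

    // Only the two event types used in the example above are modeled.
    enum EventType { CONTENT_MODIFIED, ATTRIBUTES_MODIFIED }

    /**
     * Returns the events the framework would add for one FlowFile.
     *
     * @param detected changes the framework detected on the FlowFile
     * @param reported events the Processor emitted itself
     */
    static Set<EventType> frameworkEvents(Set<EventType> detected, Set<EventType> reported) {
        // Rule 1: never duplicate an event the Processor already emitted.
        EnumSet<EventType> added = EnumSet.noneOf(EventType.class);
        added.addAll(detected);
        added.removeAll(reported);

        // Rule 2: ATTRIBUTES_MODIFIED is suppressed if any other event exists
        // for the FlowFile, whether emitted by the Processor or the framework.
        EnumSet<EventType> all = EnumSet.noneOf(EventType.class);
        all.addAll(reported);
        all.addAll(added);
        all.remove(EventType.ATTRIBUTES_MODIFIED);
        if (!all.isEmpty()) {
            added.remove(EventType.ATTRIBUTES_MODIFIED);
        }
        return added;
    }

    public static void main(String[] args) {
        // The Processor changed content and attributes but reported only
        // ATTRIBUTES_MODIFIED, so the framework adds CONTENT_MODIFIED.
        Set<EventType> detected =
                EnumSet.of(EventType.CONTENT_MODIFIED, EventType.ATTRIBUTES_MODIFIED);
        Set<EventType> reported = EnumSet.of(EventType.ATTRIBUTES_MODIFIED);
        System.out.println(frameworkEvents(detected, reported)); // prints [CONTENT_MODIFIED]
    }
}
```

In this model, a Processor that changes both content and attributes and reports nothing still ends up with only a CONTENT_MODIFIED event, because the second rule drops the redundant ATTRIBUTES_MODIFIED event.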