Reporting Processor Activity
Processors are responsible for reporting their activity so that users are able to understand what happens to their data. Processors should log events via the ComponentLog, which is accessible via the InitializationContext or by calling the getLogger method of AbstractProcessor.
Additionally, Processors should use the ProvenanceReporter interface, obtained via the
ProcessSession's getProvenanceReporter method. The
ProvenanceReporter should be used to indicate any time that content is received from an
external source or sent to an external location. The ProvenanceReporter also has methods
for reporting when a FlowFile is cloned, forked, or modified, when multiple FlowFiles are
merged into a single FlowFile, and when a FlowFile is associated with some other
identifier. These events are less critical to report explicitly, because the framework can
detect these actions and emit appropriate events on the Processor's behalf. Even so, it is a
best practice for the Processor developer to emit them: doing so makes it explicit in the
code that the events are emitted, and it lets the developer attach additional details to
the events, such as the amount of time that the action took or
pertinent information about the action that was taken. If the Processor emits an event, the
framework will not emit a duplicate event. Instead, it always assumes that the Processor
developer knows what is happening in the context of the Processor better than the framework
does. The framework may, however, emit a different event. For example, if a Processor
modifies both the content of a FlowFile and its attributes and then emits only an
ATTRIBUTES_MODIFIED event, the framework will emit a CONTENT_MODIFIED event. The framework
will not emit an ATTRIBUTES_MODIFIED event if any other event is emitted for that FlowFile
(either by the Processor or the framework). This is because every provenance event records
the attributes of the FlowFile as they were before the event as well as the attributes that
resulted from processing that FlowFile. A separate ATTRIBUTES_MODIFIED event is therefore
generally considered redundant and would make a rendering of the FlowFile lineage very
verbose. It is, however, acceptable for a Processor to
emit this event along with others, if the event is considered pertinent from the
perspective of the Processor.
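The rules above about which events the framework adds can be sketched as a small toy model. This is only an illustration of the two rules stated in the text, not NiFi's actual implementation: the class ProvenanceRulesSketch, the method eventsAtCommit, and the trimmed-down EventType enum are hypothetical stand-ins, and the real framework considers more conditions than are modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are local, hypothetical stand-ins, not NiFi classes.
class ProvenanceRulesSketch {

    // A small subset of event names taken from the text above.
    enum EventType { CONTENT_MODIFIED, ATTRIBUTES_MODIFIED }

    /**
     * Returns the events recorded for one FlowFile: the events the Processor
     * emitted itself, followed by whatever this toy "framework" adds.
     */
    static List<EventType> eventsAtCommit(List<EventType> emittedByProcessor,
                                          boolean contentChanged,
                                          boolean attributesChanged) {
        List<EventType> result = new ArrayList<>(emittedByProcessor);

        // Rule 1: never duplicate an event the Processor already emitted,
        // but do add a different event the Processor left out.
        if (contentChanged && !result.contains(EventType.CONTENT_MODIFIED)) {
            result.add(EventType.CONTENT_MODIFIED);
        }

        // Rule 2: add ATTRIBUTES_MODIFIED only when no other event exists
        // for this FlowFile, whether from the Processor or the framework.
        if (attributesChanged && result.isEmpty()) {
            result.add(EventType.ATTRIBUTES_MODIFIED);
        }
        return result;
    }
}
```

Under this model, a Processor that changes both content and attributes but emits only ATTRIBUTES_MODIFIED ends up with ATTRIBUTES_MODIFIED followed by a framework-added CONTENT_MODIFIED. A Processor that changes only attributes and emits nothing ends up with a single ATTRIBUTES_MODIFIED event. A Processor that changes both and emits nothing ends up with CONTENT_MODIFIED alone. In a real Processor, the explicit emission would go through the reporter returned by the ProcessSession's getProvenanceReporter method.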