Reporting Processor Activity
Processors are responsible for reporting their activity so that users are able to
understand what happens to their data. Processors should log events via the ComponentLog,
which is accessible via the InitializationContext or by calling the getLogger method of
AbstractProcessor. Additionally, Processors should use the ProvenanceReporter interface,
obtained via the ProcessSession's getProvenanceReporter method. The
ProvenanceReporter should be used to indicate any time that content is received from an
external source or sent to an external location. The ProvenanceReporter also has methods for
reporting when a FlowFile is cloned, forked, or modified, when multiple FlowFiles are
merged into a single FlowFile, and for associating a FlowFile with some other identifier.
These events are less critical to report, as the framework is able to detect
these things and emit appropriate events on the Processor's behalf. Even so, it is a best
practice for the Processor developer to emit these events, as it becomes explicit in the
code that these events are being emitted, and the developer is able to provide additional
details to the events, such as the amount of time that the action took or pertinent
information about the action that was taken. If the Processor emits an event, the framework
will not emit a duplicate event. Instead, it always assumes that the Processor developer
knows what is happening in the context of the Processor better than the framework does. The
framework may, however, emit a different event. For example, if a Processor modifies both
the content of a FlowFile and its attributes and then emits only an ATTRIBUTES_MODIFIED
event, the framework will emit a CONTENT_MODIFIED event. The framework will not emit an
ATTRIBUTES_MODIFIED event if any other event is emitted for that FlowFile (either by the
Processor or the framework). This is because every Provenance Event records the attributes
of the FlowFile as they were before the event occurred as well as the attributes that
resulted from processing that FlowFile. The ATTRIBUTES_MODIFIED event is therefore generally
considered redundant, and emitting it would make a rendering of the FlowFile lineage very
verbose. It is, however, acceptable for a Processor to emit this event along with others, if
the event is considered pertinent from the perspective of the Processor.
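
The rules above for which events the framework emits on the Processor's behalf can be
sketched as a small, self-contained model. This is only an illustration of the behavior
described in this section, not the framework's actual implementation: the class
ProvenanceEventModel, the method frameworkEvents, and the reduced EventType enum are all
hypothetical names introduced for this example, and the model assumes that only content and
attribute changes need to be tracked.

```java
import java.util.EnumSet;
import java.util.Set;

// Toy model of the rules described in this section. NOT the framework's real code;
// every name here is hypothetical and exists only for illustration.
class ProvenanceEventModel {

    // Reduced, hypothetical set of event types, named after events mentioned in the text.
    enum EventType { RECEIVE, SEND, CONTENT_MODIFIED, ATTRIBUTES_MODIFIED }

    // Returns the events the framework would add for one FlowFile, given the events
    // the Processor already emitted and what the Processor actually changed.
    static Set<EventType> frameworkEvents(Set<EventType> processorEvents,
                                          boolean contentChanged,
                                          boolean attributesChanged) {
        EnumSet<EventType> emitted = EnumSet.noneOf(EventType.class);

        // Never duplicate an event the Processor emitted itself, but do fill in
        // a content change that the Processor did not report.
        if (contentChanged && !processorEvents.contains(EventType.CONTENT_MODIFIED)) {
            emitted.add(EventType.CONTENT_MODIFIED);
        }

        // ATTRIBUTES_MODIFIED is added only when no other event exists for the
        // FlowFile, whether emitted by the Processor or by the framework.
        boolean anyOtherEvent = !processorEvents.isEmpty() || !emitted.isEmpty();
        if (attributesChanged && !anyOtherEvent) {
            emitted.add(EventType.ATTRIBUTES_MODIFIED);
        }
        return emitted;
    }

    public static void main(String[] args) {
        // The example from the text: content and attributes both change, but the
        // Processor reports only ATTRIBUTES_MODIFIED.
        System.out.println(
            frameworkEvents(EnumSet.of(EventType.ATTRIBUTES_MODIFIED), true, true));
        // prints "[CONTENT_MODIFIED]"
    }
}
```

In this model, a Processor that changes only attributes and emits nothing gets a single
ATTRIBUTES_MODIFIED event from the framework, while a Processor that emits its own
CONTENT_MODIFIED event gets nothing added at all.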