Penalization vs. Yielding
When an issue occurs during processing, the framework exposes two methods to allow
Processor developers to avoid performing unnecessary work: "penalization" and
"yielding." These two concepts can become confusing for developers new to the
NiFi API. A developer is able to penalize a FlowFile by calling the penalize(FlowFile)
method of ProcessSession. This causes the FlowFile
itself to be inaccessible to downstream Processors for a period of time. The DataFlow
Manager (DFM) controls how long the FlowFile is inaccessible through the "Penalty Duration"
setting in the Processor Configuration dialog. The default value is 30 seconds. Typically,
a Processor penalizes a FlowFile when it determines that the data cannot be processed due
to environmental reasons that are expected to sort themselves out. A good example of this
is the PutSFTP Processor, which will penalize a FlowFile if a
file already exists on the SFTP server that has the same filename. In this case, the
Processor penalizes the FlowFile and routes it to failure. A DataFlow Manager can then
route failure back to the same PutSFTP Processor. This way, if a file exists with the same
filename, the Processor will not attempt to send the file again for 30 seconds (or
whatever period the DFM has configured the Processor to use). In the meantime, it is able
to continue to process other FlowFiles.
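
The PutSFTP-style pattern described above might be sketched in Java roughly as follows. This is a minimal illustration, not the actual PutSFTP code: the FlowFile and ProcessSession interfaces are simplified, hypothetical stand-ins for the NiFi types of the same names (for instance, the real transfer method takes a Relationship rather than a String), and the remoteFileExists flag is an assumed placeholder for whatever check the Processor performs against the server.

```java
// Hypothetical, simplified stand-ins for the NiFi API types of the same names,
// declared here only so that the sketch compiles without the NiFi dependency.
interface FlowFile {
}

interface ProcessSession {
    // Marks the FlowFile as penalized and returns the updated reference.
    FlowFile penalize(FlowFile flowFile);

    // Assumption: the relationship is a plain String here; NiFi uses a Relationship object.
    void transfer(FlowFile flowFile, String relationship);
}

final class PenalizeSketch {

    // remoteFileExists is an assumed placeholder for the real server-side check.
    static void handle(ProcessSession session, FlowFile flowFile, boolean remoteFileExists) {
        if (remoteFileExists) {
            // The problem is specific to this FlowFile and may clear up on its own,
            // so penalize only this FlowFile and route it to failure.
            FlowFile penalized = session.penalize(flowFile);
            session.transfer(penalized, "failure");
            return;
        }
        // No conflict: route the FlowFile onward as usual.
        session.transfer(flowFile, "success");
    }
}
```

Note that the sketch transfers the reference returned by penalize rather than the original one, and that only the single FlowFile is affected; the Processor itself remains free to work on other FlowFiles.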
On the other hand, yielding allows a Processor developer to indicate to the framework
that it will not be able to perform any useful function for some period of time. This
commonly happens with a Processor that is communicating with a remote resource. If the
Processor cannot connect to the remote resource, or if the remote resource is expected to
provide data but reports that it has none, the Processor should call yield on the
ProcessContext object and then return. By doing this, the Processor is telling the
framework that it should not waste
resources triggering this Processor to run, because there is nothing that it can do;
those resources are better used to allow other Processors to run.
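
The yielding pattern described above could look something like the following Java sketch. It assumes a simplified, hypothetical ProcessContext stand-in that exposes only yield(), and a made-up RemoteSource interface representing the remote resource; neither is the real NiFi type, and a real Processor would do this work inside its onTrigger method.

```java
import java.io.IOException;
import java.util.Optional;

// Hypothetical, simplified stand-in for NiFi's ProcessContext (only yield() is modeled).
interface ProcessContext {
    void yield();
}

// Hypothetical remote resource: returns data if any is available, or throws if unreachable.
interface RemoteSource {
    Optional<String> poll() throws IOException;
}

final class YieldSketch {

    // Returns the fetched data, or an empty Optional if the Processor yielded instead.
    static Optional<String> onTrigger(ProcessContext context, RemoteSource source) {
        Optional<String> data;
        try {
            data = source.poll();
        } catch (IOException e) {
            // Cannot reach the remote resource: nothing useful can be done right now.
            context.yield();
            return Optional.empty();
        }
        if (!data.isPresent()) {
            // The remote resource is reachable but has no data: yield and return.
            context.yield();
            return Optional.empty();
        }
        return data;
    }
}
```

In contrast to penalization, which delays a single FlowFile, the yield call here applies to the Processor as a whole, so the sketch returns immediately after calling it in both the unreachable and the no-data case.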