Penalization vs. Yielding
When an issue occurs during processing, the framework exposes two mechanisms that allow
Processor developers to avoid performing unnecessary work: "penalization" and
"yielding." The two concepts are easily confused by developers new to the NiFi
API. A developer penalizes a FlowFile by calling the penalize(FlowFile) method of
ProcessSession. This causes the FlowFile itself to be inaccessible to downstream
Processors for a period of time. The DataFlow Manager controls how long the FlowFile
remains inaccessible through the "Penalty Duration" setting in the Processor
Configuration dialog. The default value is 30 seconds. Typically, a Processor penalizes
a FlowFile when it determines that the data cannot be processed due to environmental
reasons that are expected to sort themselves out. A
good example is the PutSFTP Processor, which penalizes a FlowFile if a file with the
same filename already exists on the SFTP server. In this case, the Processor
penalizes the FlowFile and routes it to failure. A DataFlow Manager can then route failure
back to the same PutSFTP Processor. This way, if a file exists with the same filename, the
Processor will not attempt to send the file again for 30 seconds (or whatever period the DFM
has configured the Processor to use). In the meantime, it is able to continue to process
other FlowFiles.
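The penalization flow described above can be sketched in a few lines of Java. This is
only an illustration under stated assumptions: PenaltySketchFlowFile,
PenaltySketchSession, and PutSketch are hypothetical stand-ins written for this example,
not the real NiFi types, and time is passed in explicitly so the behavior is easy to
follow. In an actual Processor the corresponding step would be a call to
penalize(FlowFile) on the ProcessSession, followed by transferring the FlowFile to the
failure relationship.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a FlowFile; it only tracks a filename and a penalty deadline.
final class PenaltySketchFlowFile {
    final String filename;
    long penaltyExpiresAtMillis = 0L; // 0 means "not penalized"

    PenaltySketchFlowFile(String filename) {
        this.filename = filename;
    }
}

// Hypothetical stand-in for the small part of ProcessSession discussed above.
final class PenaltySketchSession {
    private final long penaltyDurationMillis; // plays the role of "Penalty Duration"
    final List<PenaltySketchFlowFile> failure = new ArrayList<>();

    PenaltySketchSession(long penaltyDurationMillis) {
        this.penaltyDurationMillis = penaltyDurationMillis;
    }

    // Mirrors the idea of penalize(FlowFile): mark the FlowFile as unavailable until later.
    PenaltySketchFlowFile penalize(PenaltySketchFlowFile flowFile, long nowMillis) {
        flowFile.penaltyExpiresAtMillis = nowMillis + penaltyDurationMillis;
        return flowFile;
    }

    // Routes the FlowFile to the "failure" relationship of this sketch.
    void transferToFailure(PenaltySketchFlowFile flowFile) {
        failure.add(flowFile);
    }

    // A queue would hand out a FlowFile only once its penalty has expired.
    static boolean isAvailable(PenaltySketchFlowFile flowFile, long nowMillis) {
        return nowMillis >= flowFile.penaltyExpiresAtMillis;
    }
}

// Hypothetical Processor logic modeled loosely on the PutSFTP behavior described above.
final class PutSketch {
    // Returns true if the file was "sent", false if it was penalized and routed to failure.
    static boolean onTrigger(PenaltySketchSession session, PenaltySketchFlowFile flowFile,
                             Set<String> remoteFilenames, long nowMillis) {
        if (remoteFilenames.contains(flowFile.filename)) {
            // Same filename already exists remotely: penalize, then route to failure.
            session.transferToFailure(session.penalize(flowFile, nowMillis));
            return false;
        }
        remoteFilenames.add(flowFile.filename);
        return true;
    }
}
```

If failure is looped back to the same Processor, the penalized FlowFile is skipped until
its penalty expires, while other FlowFiles in the queue remain available for processing.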
On the other hand, yielding allows a Processor developer to indicate to the framework
that the Processor will not be able to perform any useful work for some period of time.
This commonly happens with a Processor that is communicating with a remote resource. If
the Processor cannot connect to the remote resource, or if the remote resource is
expected to provide data but reports that it has none, the Processor should call yield
on the ProcessContext object and then return.
By doing this, the Processor tells the framework not to waste resources triggering it,
because there is nothing it can do; those resources are better used to let other
Processors run.
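A minimal sketch of the yielding pattern follows. As before, the types are hypothetical
stand-ins written for this example (YieldSketchContext and FetchSketch are not NiFi
classes), the length of the yield is simply a constructor argument, and the clock is a
plain field so the effect is easy to trace. The only name taken from the text above is
the yield call on the context object.

```java
import java.util.Queue;

// Hypothetical stand-in for the part of ProcessContext discussed above.
final class YieldSketchContext {
    private final long yieldDurationMillis; // assumed to be configured elsewhere
    private long yieldedUntilMillis = 0L;
    long nowMillis = 0L; // simulated clock, advanced by the caller

    YieldSketchContext(long yieldDurationMillis) {
        this.yieldDurationMillis = yieldDurationMillis;
    }

    // Records that the Processor has no useful work to do for a while.
    // A qualified call such as context.yield() is valid Java.
    void yield() {
        yieldedUntilMillis = nowMillis + yieldDurationMillis;
    }

    // The framework side of the sketch: skip the Processor while it is yielded.
    boolean shouldTrigger() {
        return nowMillis >= yieldedUntilMillis;
    }
}

// Hypothetical Processor logic that pulls items from a remote resource.
final class FetchSketch {
    // Returns the number of items pulled during this invocation.
    static int onTrigger(YieldSketchContext context, Queue<String> remoteResource) {
        String item = remoteResource.poll();
        if (item == null) {
            // The remote resource has no data: yield, then return immediately.
            context.yield();
            return 0;
        }
        return 1;
    }
}
```

Unlike penalization, which holds back a single FlowFile, the yield in this sketch pauses
the whole Processor: shouldTrigger() stays false until the yield period has elapsed.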