Split Content (One-to-Many)
This Processor generally requires no user configuration, with the exception of the
size of each Split to create. The onTrigger method obtains a FlowFile from its input
queues. A List of type FlowFile is created. The original FlowFile is read via the
ProcessSession's read method, and an InputStreamCallback is used. Within the
InputStreamCallback, the content is read until a point is reached at which the FlowFile
should be split. If no split is needed, the Callback returns, and the original FlowFile
is routed to success. In this case, a Provenance ROUTE event is emitted. Typically, ROUTE
events are not emitted when routing a FlowFile to success because this generates a very
verbose lineage that becomes difficult to navigate. However, in this case, the event is
useful because we would otherwise expect a FORK event, and the absence of any event is
likely to cause confusion. The fact that the FlowFile was not split but was instead
transferred to success is logged, and the method returns.
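The scan for split points can be illustrated outside the framework. The following is a minimal sketch in plain Java, assuming a fixed split size in bytes; the class SplitPointScanner and its Range type are hypothetical names introduced for illustration and are not part of the NiFi API. An empty result corresponds to the "no split is needed" case described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of NiFi: finds split boundaries by size.
final class SplitPointScanner {

    /** Offset and length of one split within the original content. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Reads the stream to its end and returns one Range per split.
     * An empty list means the content fits in a single split, so no split is needed.
     */
    static List<Range> scan(InputStream in, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        long total = 0;
        byte[] buffer = new byte[8192];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<Range> ranges = new ArrayList<>();
        if (total <= splitSize) {
            return ranges; // nothing to split; the caller would route the original as-is
        }
        for (long offset = 0; offset < total; offset += splitSize) {
            ranges.add(new Range(offset, Math.min(splitSize, total - offset)));
        }
        return ranges;
    }
}
```

In a real Processor the same reading loop would run inside the InputStreamCallback, and the split condition would usually depend on the data format rather than on a byte count alone.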
If a point is reached at which a FlowFile needs to be split, a new FlowFile is created
via the ProcessSession's create(FlowFile) method or the clone(FlowFile, long, long)
method. The next section of code depends on whether the create method or the clone
method is used. Both methods are described below. Which solution is appropriate must be
determined on a case-by-case basis.
The Create Method is most appropriate when the data will not be directly copied from the original FlowFile to the new FlowFile. For example, if only some of the data will be copied, or if the data will be modified in some way before being copied to the new FlowFile, this method is necessary. However, if the content of the new FlowFile will be an exact copy of a portion of the original FlowFile, the Clone Method is much preferred.
Create Method
If using the create method, the method is called with the original FlowFile as the
argument so that the newly created FlowFile will inherit the attributes of the original
FlowFile and a Provenance FORK event will be created by the framework.
The code then enters a try/finally block. Within the finally block, the newly created
FlowFile is added to the List of FlowFiles that have been created. This is done within a
finally block so that if an Exception is thrown, the newly created FlowFile will be
appropriately cleaned up. Within the try block, the callback initiates a new callback
by calling the ProcessSession's write method with an OutputStreamCallback. The
appropriate data is then copied from the InputStream of the original FlowFile to the
OutputStream for the new FlowFile.
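The try/finally pattern described above can be sketched in plain Java. This is an illustration under stated assumptions, not framework code: a ByteArrayOutputStream stands in for each newly created FlowFile, CreateStyleSplitter is a hypothetical class name, and prefixing each chunk with a header is just one example of data that is modified while being copied.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical helper, not part of NiFi: copies modified data into new outputs.
final class CreateStyleSplitter {

    /**
     * Copies the input into chunks of at most chunkSize bytes, each prefixed with header.
     * Every chunk that is started is added to 'created' in a finally block, so the
     * caller can discard all of them if a later step throws.
     */
    static void split(InputStream in, byte[] header, int chunkSize,
                      List<ByteArrayOutputStream> created) {
        byte[] buffer = new byte[chunkSize];
        try {
            int n;
            // readNBytes returns 0 once the end of the stream has been reached
            while ((n = in.readNBytes(buffer, 0, chunkSize)) > 0) {
                ByteArrayOutputStream chunk = new ByteArrayOutputStream(); // stand-in for a new FlowFile
                try {
                    chunk.write(header, 0, header.length);
                    chunk.write(buffer, 0, n);
                } finally {
                    created.add(chunk); // tracked even if writing failed part-way
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the list is filled in the finally block, the caller always knows every output that was started, which is what makes cleanup on failure possible.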
Clone Method
If the content of the newly created FlowFile is to be only a contiguous subset of the
bytes of the original FlowFile, it is preferred to use the clone(FlowFile, long, long)
method instead of the create(FlowFile) method of the ProcessSession. In this case, the
offset of the original FlowFile at which the new FlowFile's content should begin is
passed as the second argument to the clone method. The length of the new FlowFile is
passed as the third argument to the clone method. For example, if the original FlowFile
was 10,000 bytes and we called clone(flowFile, 500, 100), the FlowFile that would be
returned to us would be identical to flowFile with respect to its attributes. However,
the content of the newly created FlowFile would be 100 bytes in length and would start
at offset 500 of the original FlowFile. That is, the contents of the newly created
FlowFile would be the same as if you had copied bytes 500 through 599 of the original
FlowFile.
After the clone has been created, it is added to the List of FlowFiles.
When applicable, this method is strongly preferred over the Create Method because no disk I/O is required. The framework can simply create a new FlowFile that references a subset of the original FlowFile's content rather than copying the data. However, this is not always possible. For example, if header information must be copied from the beginning of the original FlowFile and added to the beginning of each Split, then this method cannot be used.
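The offset and length semantics, and the idea of referencing content instead of copying it, can be illustrated with an analogy in plain Java. The sketch below uses a ByteBuffer view over a byte array; it is only an analogy and makes no claim about how the framework stores content. CloneRangeDemo is a hypothetical class name.

```java
import java.nio.ByteBuffer;

// Hypothetical analogy, not part of NiFi: a view over a sub-range of bytes.
final class CloneRangeDemo {

    /**
     * Returns a read-only view of 'length' bytes starting at 'offset'.
     * The view shares the original array, so no bytes are copied.
     */
    static ByteBuffer view(byte[] original, int offset, int length) {
        return ByteBuffer.wrap(original, offset, length).slice().asReadOnlyBuffer();
    }
}
```

For a 10,000-byte array, view(original, 500, 100) exposes exactly bytes 500 through 599: index 0 of the view is original[500] and index 99 is original[599]. A view like this cannot prepend a header, which mirrors the limitation noted above.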
Both Methods
Regardless of whether the Clone Method or the Create Method is used, the following is applicable:
If at any point in the InputStreamCallback a condition is reached in which processing
cannot continue (for example, the input is malformed), a ProcessException should be
thrown. The call to the ProcessSession's read method is wrapped in a try/catch block
where ProcessException is caught.
If an Exception is caught, a log message is generated explaining the error. The List of
newly created FlowFiles is removed via the ProcessSession's remove method. The original
FlowFile is routed to failure.
If no problems arise, the original FlowFile is routed to original
and all newly created FlowFiles are updated to include the following attributes:
| Attribute Name | Description |
|---|---|
| | The UUID of the original FlowFile |
| | A one-up number indicating which FlowFile in the list this is (the first FlowFile created will have a value |
| | The total number of split FlowFiles that were created |
The newly created FlowFiles are routed to success; this event is logged; and the method
returns.
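The error handling and routing described in this section can be sketched in plain Java. This is a simplified model under stated assumptions rather than framework code: strings stand in for FlowFiles, a map keyed by relationship name stands in for the session's transfers, MalformedInputException is a hypothetical stand-in for ProcessException, and treating an empty comma-separated field as malformed is an arbitrary rule chosen for the example. Setting attributes on the splits is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not part of NiFi: all-or-nothing split routing.
final class SplitRouting {

    /** Hypothetical stand-in for the framework's ProcessException. */
    static final class MalformedInputException extends RuntimeException {
        MalformedInputException(String message) {
            super(message);
        }
    }

    /**
     * Splits a comma-separated record into fields and reports where each item
     * would be routed. On malformed input, every split created so far is
     * discarded and only the original is routed, to "failure".
     */
    static Map<String, List<String>> route(String original) {
        List<String> splits = new ArrayList<>();
        Map<String, List<String>> routed = new LinkedHashMap<>();
        try {
            for (String field : original.split(",", -1)) {
                if (field.isEmpty()) {
                    throw new MalformedInputException("empty field in: " + original);
                }
                splits.add(field);
            }
        } catch (MalformedInputException e) {
            System.err.println("Failed to split input: " + e.getMessage());
            splits.clear(); // discard every partially created split
            routed.put("failure", List.of(original));
            return routed;
        }
        routed.put("original", List.of(original));
        routed.put("success", splits);
        return routed;
    }
}
```

The point of the pattern is that a failure never leaves a partial set of splits behind: either every split is routed to success alongside the original, or nothing but the original leaves the method.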