This processor is used to create a Hadoop SequenceFile, which is essentially a file of key/value pairs. The key will be a file name and the value will be the flow file content. The processor will take either a merged (a.k.a. packaged) flow file or a singular flow file. Historically, this processor handled the merging by type and size or time prior to creating a SequenceFile output; it no longer does this. If creating a SequenceFile that contains multiple files of the same type is desired, precede this processor with a RouteOnAttribute processor to segregate files of the same type, and follow that with a MergeContent processor to bundle up the files. If the type of the files is not important, just use the MergeContent processor. When using the MergeContent processor, the following Merge Formats are supported by this processor:
- TAR
- ZIP
- FlowFileStream v3
NOTE: The value portion of a key/value pair is loaded entirely into memory. While a single value is capped at 2 GB, this could still cause memory issues if there are too many concurrent tasks and the flow file sizes are large.
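As a rough sketch of what the resulting file looks like, the snippet below reads a SequenceFile with Hadoop's reader API and prints each key along with the size of its value. It assumes Text keys (file names) and BytesWritable values (flow file content), matching the key/value description above; the path is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileDump {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/data/example.sf"); // hypothetical output of this processor
        try (SequenceFile.Reader reader =
                new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
            Text key = new Text();                     // assumed key type: the file name
            BytesWritable value = new BytesWritable(); // assumed value type: the content
            // Each next() call materializes one whole value in memory, which is
            // why large flow files plus many concurrent tasks can pressure the heap.
            while (reader.next(key, value)) {
                System.out.println(key + " -> " + value.getLength() + " bytes");
            }
        }
    }
}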
The value of the Compression codec property determines the compression library the processor uses to compress content. Third-party libraries are used for compression; these can be Java libraries or native libraries. In the case of native libraries, the path of their parent folder needs to be in an environment variable called LD_LIBRARY_PATH so that NiFi can find the libraries.
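For context, the sketch below shows what a codec selection means on the Hadoop side: the codec class is instantiated and used to wrap an output stream, and a native-backed codec such as Snappy fails at that point if its library cannot be loaded. The class names are from Hadoop's public API; the output path is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.util.ReflectionUtils;

import java.io.FileOutputStream;
import java.io.OutputStream;

public class CodecSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Instantiate the codec the same way Hadoop does from a codec class name.
        CompressionCodec codec = ReflectionUtils.newInstance(SnappyCodec.class, conf);
        // Wrapping a stream triggers the native-library check for Snappy;
        // a RuntimeException here usually means the library was not found.
        try (OutputStream out = codec.createOutputStream(
                new FileOutputStream("/tmp/codec-test.snappy"))) { // hypothetical path
            out.write("hello".getBytes());
        }
    }
}

A failure from the wrapping call is essentially the same symptom NiFi would show with a misconfigured LD_LIBRARY_PATH, which the steps below address.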
Example: using Snappy compression with a native library (here on a yum-based system such as CentOS):

1. Install Snappy on the server running NiFi:

sudo yum install snappy

2. Suppose the native compression libraries on that server are located under /opt/lib/hadoop/lib/native. (Native libraries have file extensions like .so, .dll, .lib, etc., depending on the platform.) Copy them to a folder such as /opt/nativelibs and change their owner. If NiFi is executed by the nifi user in the nifi group, then:

chown nifi:nifi /opt/nativelibs
chown nifi:nifi /opt/nativelibs/*

3. LD_LIBRARY_PATH needs to be set to contain the path to the folder /opt/nativelibs.

4. The Compression codec property can be set to SNAPPY and a Compression type can be selected.
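Before starting the processor, it can be worth confirming that the JVM can actually see the native library. A minimal sketch, assuming Hadoop's NativeCodeLoader utility is on the classpath and that this class is run with the same environment (including LD_LIBRARY_PATH) as NiFi:

import org.apache.hadoop.util.NativeCodeLoader;

public class NativeLibCheck {
    public static void main(String[] args) {
        // True once libhadoop has been found via java.library.path /
        // LD_LIBRARY_PATH; false suggests the environment change has not
        // reached the JVM yet (e.g. NiFi still needs a restart).
        boolean loaded = NativeCodeLoader.isNativeCodeLoaded();
        System.out.println("native hadoop loaded: " + loaded);
        if (loaded) {
            // Reports whether libhadoop was built with Snappy support;
            // only safe to call once the native library is loaded.
            System.out.println("snappy supported: "
                    + NativeCodeLoader.buildSupportsSnappy());
        }
    }
}

Where a Hadoop client is installed, the hadoop checknative command prints a similar report for all native codecs.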