CalculateParquetOffsets 2.3.0.4.10.0.0-147

Bundle
org.apache.nifi | nifi-parquet-nar
Description
The processor generates N FlowFiles from the input and adds attributes with the offsets required to read the corresponding group of rows in the FlowFile's content. It can be used to increase the overall efficiency of processing extremely large Parquet files.
Tags
break apart, cluster, efficient processing, load balance, parquet, partition, split
Input Requirement
REQUIRED
Supports Sensitive Dynamic Properties
false
Properties
Relationships
Name | Description
success | FlowFiles with special attributes, each representing a chunk of the input file.
Reads Attributes
Name | Description
record.offset | Gets the index of the first record in the input.
record.count | Gets the number of records in the input.
parquet.file.range.startOffset | Gets the start offset of the selected row group in the Parquet file.
parquet.file.range.endOffset | Gets the end offset of the selected row group in the Parquet file.
Writes Attributes
Name | Description
record.offset | Sets the index of the first record of the Parquet file.
record.count | Sets the number of records in the Parquet file.
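
Example
The following is a minimal sketch of the kind of arithmetic behind the record.offset and record.count attributes described above. It is not the processor's implementation. The function name compute_record_splits and the records_per_split parameter are hypothetical and introduced only for illustration; the sketch assumes the input is divided into consecutive chunks of a fixed maximum number of records.

```python
from typing import Dict, List


def compute_record_splits(total_records: int, records_per_split: int) -> List[Dict[str, str]]:
    """Return one attribute map per output FlowFile.

    Hypothetical helper: assumes consecutive, fixed-size chunks of records.
    """
    if total_records < 0:
        raise ValueError("total_records must not be negative")
    if records_per_split <= 0:
        raise ValueError("records_per_split must be positive")

    splits = []
    for offset in range(0, total_records, records_per_split):
        # The last chunk may hold fewer records than records_per_split.
        count = min(records_per_split, total_records - offset)
        # FlowFile attribute values are strings, so the numbers are stored as text.
        splits.append({"record.offset": str(offset), "record.count": str(count)})
    return splits


if __name__ == "__main__":
    for attributes in compute_record_splits(10, 4):
        print(attributes)
```

Under these assumptions, a 10-record input with a chunk size of 4 produces three attribute maps: the first two cover 4 records each and the last covers the remaining 2. A downstream reader could then use each pair of values to skip to its chunk and read only that many records.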