CalculateParquetRowGroupOffsets 2.3.0.4.10.0.0-147

Bundle
org.apache.nifi | nifi-parquet-nar
Description
The processor generates one FlowFile for each row group of the input Parquet file and adds attributes with the offsets required to read that row group from the FlowFile's content. It can be used to increase the overall efficiency of processing extremely large Parquet files.
Tags
break apart, cluster, efficient processing, load balance, parquet, partition, split
Input Requirement
REQUIRED
Supports Sensitive Dynamic Properties
false
Properties
Relationships
Name Description
success FlowFiles with special attributes that each describe one chunk (row group) of the input file.
Writes Attributes
Name Description
parquet.file.range.startOffset Sets the start offset of the selected row group in the Parquet file.
parquet.file.range.endOffset Sets the end offset of the selected row group in the Parquet file.
record.count Sets the count of records in the selected row group.
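
Example
The following is a minimal sketch in Python, not the processor's actual implementation, of how the three attributes above could be derived for each row group. The RowGroup class and the row_group_attributes function are hypothetical names introduced only for illustration. The sketch assumes that the end offset equals the row group's start position plus its compressed size, and that each row group's start position, compressed size, and row count are available from the Parquet footer metadata.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RowGroup:
    """Hypothetical stand-in for one row group's footer metadata."""
    starting_pos: int     # byte position where the row group begins
    compressed_size: int  # total compressed size of the row group, in bytes
    row_count: int        # number of records in the row group


def row_group_attributes(groups: List[RowGroup]) -> List[Dict[str, str]]:
    """Build one attribute map per row group, i.e. one per output FlowFile."""
    result = []
    for group in groups:
        result.append({
            "parquet.file.range.startOffset": str(group.starting_pos),
            # Assumption: the end offset is start position + compressed size.
            "parquet.file.range.endOffset": str(
                group.starting_pos + group.compressed_size
            ),
            "record.count": str(group.row_count),
        })
    return result


# Two made-up row groups; the numbers are illustrative only.
example_groups = [RowGroup(4, 1000, 500), RowGroup(1004, 800, 300)]
attributes = row_group_attributes(example_groups)
for attribute_map in attributes:
    print(attribute_map)
```

Attribute values are built as strings because FlowFile attributes are string key/value pairs. A downstream reader that understands these attributes can then restrict itself to the row group identified by the offsets instead of processing the entire file.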