PutIceberg 2.3.0.4.10.0.0-147

Bundle
org.apache.nifi | nifi-iceberg-processors-nar
Description
This processor uses Iceberg API to parse and load records into Iceberg tables. The incoming data sets are parsed with Record Reader Controller Service and ingested into an Iceberg table using the configured catalog service and provided table information. The target Iceberg table should already exist and it must have matching schemas with the incoming records, which means the Record Reader schema must contain all the Iceberg schema fields, every additional field which is not present in the Iceberg schema will be ignored. To avoid 'small file problem' it is recommended pre-appending a MergeRecord processor.
Tags
avro, iceberg, orc, parquet, parse, put, record, store, table
Input Requirement
Supports Sensitive Dynamic Properties
false
  • Additional Details for PutIceberg 2.3.0.4.10.0.0-147

    PutIceberg

    Description

    Iceberg is a high-performance format for huge analytic tables. The PutIceberg processor is capable of pushing data into Iceberg tables using different types of Iceberg catalog implementations.

    Commit retry properties

    Iceberg supports multiple concurrent writes using optimistic concurrency. The processor’s commit retry implementation is using exponential backoff with jitter and scale factor 2, and provides the following properties to configure the behaviour according to its usage.

    • Number Of Commit Retries (default: 10) - Number of retries that the processor is going to try to commit the new data files.
    • Minimum Commit Wait Time (default: 100 ms) - Minimum time that the processor is going to wait before each commit attempt.
    • Maximum Commit Wait Time (default: 2 sec) - Maximum time that the processor is going to wait before each commit attempt.
    • Maximum Commit Duration (default: 30 sec) - Maximum duration that the processor is going to wait before failing the current processor event’s commit.

    The NiFi side retry logic is built on top of the Iceberg commit retry logic which can be configured through table properties. See more: Table behavior properties

    Snapshot summary properties

    The processor provides an option to add additional properties to the snapshot summary using dynamic properties. The additional property must have the ‘snapshot-property.’ prefix in the dynamic property key but the actual entry will be inserted without it. Each snapshot automatically gets the FlowFile’s uuid in the ’nifi-flowfile-uuid’ summary property.

    Processor instance isolation

    By default, the processor instance isolation works based on Kerberos user principal provided in the KerberosUserService. If there is no KerberosUserService defined on the processor then the processor will use the default classloader. When further isolation needed, the user can configure the processor to make every processor instance use a separated classloader. To enable this, a new dynamic property needs to be added with ‘classloader.isolation.full’ as key and ’true’ as value.

Properties
Dynamic Properties
Relationships
Name Description
success A FlowFile is routed to this relationship after the data ingestion was successful.
failure A FlowFile is routed to this relationship if the operation failed and retrying the operation will also fail, such as an invalid data or schema.
Writes Attributes
Name Description
iceberg.record.count The number of records in the FlowFile.