PartitionCsv

Description:

Partitions a CSV file using the partition_csv function of unstructured.io. Properties are forwarded to partition_csv as parameters. The output is a JSON document in the format output by partition_csv.
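
As a rough illustration of consuming this output downstream, the sketch below parses a JSON array of elements and collects the HTML rendering of each table. The sample document is hand-written for illustration. It assumes the element layout commonly produced by unstructured (a list of objects with type, text, and metadata fields). The helper name table_html is hypothetical, and the actual fields depend on the unstructured version and on the property values described below.

```python
import json

# Hand-written sample, NOT real processor output. It assumes the element
# layout commonly produced by unstructured: a JSON array of objects with
# "type", "text", and "metadata" keys.
SAMPLE_OUTPUT = """
[
  {
    "type": "Table",
    "text": "name age alice 30",
    "metadata": {
      "filename": "people.csv",
      "text_as_html": "<table><tr><td>name</td><td>age</td></tr></table>"
    }
  }
]
"""


def table_html(output_json: str) -> list[str]:
    """Return the text_as_html value of every Table element that has one."""
    elements = json.loads(output_json)
    return [
        element["metadata"]["text_as_html"]
        for element in elements
        if element.get("type") == "Table"
        and "text_as_html" in element.get("metadata", {})
    ]


if __name__ == "__main__":
    for html in table_html(SAMPLE_OUTPUT):
        print(html)
```

If Infer Table Structure is set to false, the text_as_html field would presumably be absent, in which case this helper simply returns an empty list.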

Tags:

ai, artificial intelligence, ml, machine learning, text, LLM, partition, csv, partition_csv

Properties:

In the table below, the names of required properties appear in bold. Any other properties (not in bold) are optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Description
Include Metadata | Include Metadata | true | Whether to include metadata in the output.
    Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Metadata Filename | Metadata Filename | | If present, will be included in the metadata as filename.
    Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Metadata Last Modified | Metadata Last Modified | | Date-time to include in the metadata as last_modified.
    Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Languages | Languages | | Comma-separated list of 3-letter language codes to be used as metadata.languages. If unset, the language is detected via langdetect.
    Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Include Header | Include Header | false | Whether to interpret the first row of the input as a table header.
    Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Infer Table Structure | Infer Table Structure | true | If true, add text_as_html field to metadata on extracted tables.
    Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
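
Since the properties above are forwarded to partition_csv as parameters, the following sketch shows one way string-valued property values could be converted into keyword arguments. It is not the processor's actual implementation: the helper build_partition_kwargs and the two mapping tables are hypothetical, and the keyword names are assumed to match the partition_csv parameters of the installed unstructured version.

```python
from typing import Any, Dict

# Hypothetical mapping from property display names to partition_csv keyword
# names. The processor's real implementation is not shown in this document.
BOOLEAN_PROPERTIES = {
    "Include Metadata": "include_metadata",
    "Include Header": "include_header",
    "Infer Table Structure": "infer_table_structure",
}
STRING_PROPERTIES = {
    "Metadata Filename": "metadata_filename",
    "Metadata Last Modified": "metadata_last_modified",
}


def build_partition_kwargs(properties: Dict[str, str]) -> Dict[str, Any]:
    """Convert string property values into typed keyword arguments.

    Unset or empty properties are omitted so that the callee's own
    defaults apply.
    """
    kwargs: Dict[str, Any] = {}

    # Boolean properties arrive as text such as "true" or "false".
    for name, key in BOOLEAN_PROPERTIES.items():
        value = properties.get(name)
        if value is not None:
            kwargs[key] = value.strip().lower() == "true"

    # Plain string properties are passed through when non-empty.
    for name, key in STRING_PROPERTIES.items():
        value = properties.get(name)
        if value:
            kwargs[key] = value

    # Languages is a comma-separated list of 3-letter codes.
    languages = properties.get("Languages")
    if languages:
        codes = [code.strip() for code in languages.split(",") if code.strip()]
        if codes:
            kwargs["languages"] = codes

    return kwargs


if __name__ == "__main__":
    example = {
        "Include Header": "true",
        "Languages": "eng, deu",
        "Metadata Filename": "people.csv",
    }
    print(build_partition_kwargs(example))
```

Assuming unstructured is installed, the resulting dictionary could then be passed along the lines of partition_csv(file=stream, **kwargs), where stream is an open binary file object holding the CSV content.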