Partitions a CSV file using the partition_csv function of unstructured.io. Properties are forwarded to partition_csv as parameters. The output is a JSON document in the format output by partition_csv.
ai, artificial intelligence, ml, machine learning, text, LLM, partition, csv, partition_csv
In the table below, the names of required properties appear in bold. All other properties are optional. The table also lists any default values and indicates whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Description |
---|---|---|---|
Include Metadata | Include Metadata | true | Whether to include metadata in the output. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
Metadata Filename | Metadata Filename | | If present, will be included in the metadata as filename. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
Metadata Last Modified | Metadata Last Modified | | Date-time to include in the metadata as last_modified. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
Languages | Languages | | Comma-separated list of 3-letter language codes to be used as metadata.languages. If unset, the language is detected via langdetect. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
Include Header | Include Header | false | Whether to interpret the first row of the input as a table header. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
Infer Table Structure | Infer Table Structure | true | If true, add text_as_html field to metadata on extracted tables. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
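
The way the properties above might be forwarded to partition_csv can be sketched in Python. This is an illustration only, not the processor's actual source. The helper names `build_partition_kwargs` and `partition_flowfile_csv` and the property-to-keyword mapping are hypothetical. The keyword names are assumed to mirror the property names and may differ between versions of unstructured.

```python
from typing import Any, Dict, List

# Hypothetical mapping from processor property names to partition_csv keywords.
# Assumption: the keyword names mirror the property names in the table above.
PROPERTY_TO_KWARG = {
    "Include Metadata": "include_metadata",
    "Metadata Filename": "metadata_filename",
    "Metadata Last Modified": "metadata_last_modified",
    "Languages": "languages",
    "Include Header": "include_header",
    "Infer Table Structure": "infer_table_structure",
}
BOOLEAN_PROPERTIES = {"Include Metadata", "Include Header", "Infer Table Structure"}


def build_partition_kwargs(properties: Dict[str, str]) -> Dict[str, Any]:
    """Convert string-valued properties into keyword arguments (hypothetical helper)."""
    kwargs: Dict[str, Any] = {}
    for name, raw in properties.items():
        if name not in PROPERTY_TO_KWARG:
            continue  # ignore anything that is not a documented property
        value = raw.strip()
        if not value:
            continue  # unset property: let partition_csv apply its own default
        key = PROPERTY_TO_KWARG[name]
        if name in BOOLEAN_PROPERTIES:
            kwargs[key] = value.lower() == "true"
        elif name == "Languages":
            # Comma-separated 3-letter codes become a list of strings.
            kwargs[key] = [code.strip() for code in value.split(",") if code.strip()]
        else:
            kwargs[key] = value
    return kwargs


def partition_flowfile_csv(path: str, properties: Dict[str, str]) -> List[dict]:
    """Partition a CSV file and return JSON-serializable element dicts (sketch)."""
    # Imported lazily so build_partition_kwargs works without unstructured installed.
    from unstructured.partition.csv import partition_csv

    elements = partition_csv(filename=path, **build_partition_kwargs(properties))
    return [element.to_dict() for element in elements]
```

For example, `build_partition_kwargs({"Include Header": "true", "Languages": "eng, deu"})` yields `include_header=True` and `languages=["eng", "deu"]`, while unset properties are omitted so that partition_csv falls back to its own defaults.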