Partitions a text file using the partition_text function of unstructured.io. Processor properties are forwarded to partition_text as parameters. The output is a JSON document in the format produced by partition_text.
Tags: ai, artificial intelligence, ml, machine learning, text, LLM, partition, partition_text
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Description |
---|---|---|---|
Encoding | Encoding | UTF-8 | The character encoding used to decode the text input. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
Include Metadata | Include Metadata | true | Whether to include metadata in the output. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
Languages | Languages | | Comma-separated list of 3-letter language codes to be used as metadata.languages. If unset, the language is detected via langdetect. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
Max Partition | Max Partition | | The maximum number of characters in each partition. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
Min Partition | Min Partition | | The minimum number of characters in each partition. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
Metadata Last Modified | Metadata Last Modified | | Date-time to include in the metadata as last_modified. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
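
The properties above might translate into partition_text keyword arguments roughly as in the following sketch. It is illustrative only, not the processor's actual implementation: the helper `build_partition_kwargs` is hypothetical, and the keyword names (`encoding`, `include_metadata`, `languages`, `max_partition`, `min_partition`, `metadata_last_modified`) are assumptions inferred from the property names, so check them against the unstructured version you have installed.

```python
from typing import Any, Dict, Optional


def build_partition_kwargs(properties: Dict[str, Optional[str]]) -> Dict[str, Any]:
    """Convert processor property values (strings) into keyword arguments.

    Hypothetical helper: the keyword names are assumed from the property
    names in the table, not taken from the processor's source.
    """
    kwargs: Dict[str, Any] = {
        # Defaults mirror the table: UTF-8 encoding, metadata included.
        "encoding": properties.get("Encoding") or "UTF-8",
        "include_metadata": (properties.get("Include Metadata") or "true").strip().lower() == "true",
    }

    # Comma-separated language codes become a list; unset means "detect".
    languages = properties.get("Languages")
    if languages:
        kwargs["languages"] = [code.strip() for code in languages.split(",") if code.strip()]

    # Partition size limits are character counts, so parse them as integers.
    for prop, key in (("Max Partition", "max_partition"), ("Min Partition", "min_partition")):
        value = properties.get(prop)
        if value:
            kwargs[key] = int(value)

    # The date-time is passed through unchanged as a string.
    last_modified = properties.get("Metadata Last Modified")
    if last_modified:
        kwargs["metadata_last_modified"] = last_modified

    return kwargs


example = build_partition_kwargs({
    "Encoding": "UTF-8",
    "Include Metadata": "true",
    "Languages": "eng, deu",
    "Max Partition": "1500",
})
print(example)

# With unstructured installed, the result could then be passed on, e.g.:
# from unstructured.partition.text import partition_text
# elements = partition_text(text="Some text to partition.", **example)
```

Optional properties that are left unset are omitted from the dictionary, so partition_text would fall back to its own defaults for them.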