PartitionText

Description:

Partitions a text file using the partition_text function from unstructured.io. Processor properties are forwarded to partition_text as parameters. The output is a JSON document in the format produced by partition_text.
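As a rough illustration of consuming the processor's output downstream, the sketch below counts elements by type. It assumes the JSON document is an array of element objects with "type", "text", and "metadata" keys, as unstructured's element serialization usually produces; the sample values are invented for the example, and summarize_elements is a hypothetical helper, not part of the processor.

```python
import json
from collections import Counter

# Illustrative sample only: the key names are an assumption about the
# serialized element format, and the values are made up for this sketch.
sample_output = """
[
  {"type": "Title", "text": "Release notes", "metadata": {"languages": ["eng"]}},
  {"type": "NarrativeText", "text": "This version fixes two bugs.", "metadata": {"languages": ["eng"]}}
]
"""


def summarize_elements(raw_json: str) -> Counter:
    """Count elements per type in a partition_text-style JSON document."""
    elements = json.loads(raw_json)
    # Fall back to "Unknown" if an element carries no "type" key.
    return Counter(element.get("type", "Unknown") for element in elements)


summary = summarize_elements(sample_output)
```

A flow could apply the same idea to the content of the outgoing flow file, for example to route on the presence of particular element types.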

Tags:

ai, artificial intelligence, ml, machine learning, text, LLM, partition, partition_text

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Description
Encoding | Encoding | UTF-8 | The character encoding used to decode the text input. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Include Metadata | Include Metadata | true | Whether to include metadata in the output. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Languages | Languages | | Comma-separated list of 3-letter language codes to be used as metadata.languages. If unset, the language is detected via langdetect. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Max Partition | Max Partition | | The maximum number of characters in each partition. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Min Partition | Min Partition | | The minimum number of characters in each partition. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Metadata Last Modified | Metadata Last Modified | | Date-time to include in the metadata as last_modified. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
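
The mapping from the properties above to partition_text parameters can be sketched as follows. This is a minimal sketch, not the processor's actual implementation: build_partition_text_kwargs is a hypothetical helper, and the keyword names (encoding, include_metadata, languages, max_partition, min_partition, metadata_last_modified) are assumed to be the snake_case forms of the property names. Check them against the installed unstructured version before relying on them.

```python
from typing import Any, Dict


def build_partition_text_kwargs(properties: Dict[str, str]) -> Dict[str, Any]:
    """Convert property strings into keyword arguments (hypothetical helper)."""
    kwargs: Dict[str, Any] = {}

    # Encoding and Include Metadata have documented defaults.
    kwargs["encoding"] = properties.get("Encoding", "UTF-8")
    include = properties.get("Include Metadata", "true")
    kwargs["include_metadata"] = include.strip().lower() == "true"

    # Languages: comma-separated codes become a list. The key is omitted
    # when the property is unset so that language detection can apply.
    codes = [c.strip() for c in properties.get("Languages", "").split(",")]
    codes = [c for c in codes if c]
    if codes:
        kwargs["languages"] = codes

    # Partition limits are character counts, passed only when set.
    for prop, name in (("Max Partition", "max_partition"),
                       ("Min Partition", "min_partition")):
        value = properties.get(prop, "").strip()
        if value:
            kwargs[name] = int(value)

    # The date-time is forwarded as a string, unchanged.
    last_modified = properties.get("Metadata Last Modified", "").strip()
    if last_modified:
        kwargs["metadata_last_modified"] = last_modified

    return kwargs


example = build_partition_text_kwargs(
    {"Languages": "eng, deu", "Max Partition": "1500"}
)
# With unstructured installed, the result could then be used as:
#   from unstructured.partition.text import partition_text
#   elements = partition_text(text="Some text.", **example)
```

Leaving unset properties out of the keyword arguments, rather than passing None, lets partition_text fall back to its own defaults.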