Queries OpenSearch in order to gather a specified number of documents that are most closely related to the given query.
opensearch, vector, vectordb, vectorstore, embeddings, ai, artificial intelligence, ml, machine learning, text, LLM
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Description |
---|---|---|---|
Output Strategy | Output Strategy | Row-Oriented | Specifies how the results are written: as only the text of the documents (each document separated by \n\n); as a single column-oriented JSON object consisting of the keys 'ids', 'embeddings', 'documents', 'distances', and 'metadatas'; or as row-oriented output, one JSON object per line, each consisting of a single id, document, metadata, embedding, and distance. |
Query | Query | | The text of the query to send to OpenSearch. Supports Expression Language: true (will be evaluated using flow file attributes and environment variables) |
Number of Results | Number of Results | 10 | The number of results to return from OpenSearch. Supports Expression Language: true (will be evaluated using flow file attributes and environment variables) |
Search Type | Search Type | | Specifies the type of the search to be performed. |
Script Scoring Space Type | Script Scoring Space Type | | Used to measure the distance between two points in order to determine the k-nearest neighbors. |
Painless Scripting Space Type | Painless Scripting Space Type | | Used to measure the distance between two points in order to determine the k-nearest neighbors. |
Boolean Filter | Boolean Filter | | A Boolean filter is a post-filter that consists of a Boolean query containing a k-NN query and a filter. The value of the field must be a JSON representation of the filter. |
Efficient Filter | Efficient Filter | | The Lucene Engine or Faiss Engine decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering. The value of the field must be a JSON representation of the filter. |
Pre Filter | Pre Filter | {"match_all": {}} | Script Score query to pre-filter documents before identifying nearest neighbors. The value of the field must be a JSON representation of the filter. |
Embedding Model | Embedding Model | | Specifies which embedding model should be used in order to create embeddings from incoming Documents. Default model is OpenAI. |
OpenAI Model | OpenAI Model | text-embedding-ada-002 | The name of the OpenAI model to use |
HuggingFace Model | HuggingFace Model | sentence-transformers/all-MiniLM-L6-v2 | The name of the HuggingFace model to use |
HuggingFace API Key | HuggingFace API Key | | The API Key for interacting with HuggingFace. Sensitive Property: true |
OpenAI API Key | OpenAI API Key | | The API Key for OpenAI in order to create embeddings. Sensitive Property: true |
HTTP Host | HTTP Host | http://localhost:9200 | URL where OpenSearch is hosted. |
Username | Username | | The username to use for authenticating to the OpenSearch server. |
Password | Password | | The password to use for authenticating to the OpenSearch server. Sensitive Property: true |
Index Name | Index Name | | The name of the OpenSearch index. Supports Expression Language: true (will be evaluated using flow file attributes and environment variables) |
Vector Field Name | Vector Field Name | vector_field | The name of the field in the document where the embeddings are stored. This field needs to be of type 'knn_vector'. Supports Expression Language: true (will be evaluated using flow file attributes and environment variables) |
Text Field Name | Text Field Name | text | The name of the field in the document where the text is stored. Supports Expression Language: true (will be evaluated using flow file attributes and environment variables) |
Include Distances | Include Distances | true | Whether or not to include the Documents' Distances (i.e., how far the Document was away from the query) in the response |
Include Metadata | Include Metadata | true | Whether or not to include the Documents' Metadata in the response |
Results Field | Results Field | | If the input FlowFile is JSON formatted, this is the name of the field into which the results are inserted. This allows the results to be inserted into an existing input in order to enrich it. If this property is unset, the results will be written to the FlowFile contents, overwriting any pre-existing content. |
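
The three Output Strategy shapes described in the table can be easier to compare side by side. The following Python sketch is not the processor's implementation; it only rebuilds the shapes from the property description. The `hits` list, its values, and the function names are hypothetical, and the row-oriented key names are taken from the wording of the description rather than from observed output.

```python
import json

# Hypothetical hits; ids, texts, and numbers are invented for illustration.
hits = [
    {"id": "doc-1", "document": "NiFi routes data.", "metadata": {"source": "a.txt"},
     "embedding": [0.1, 0.2], "distance": 0.12},
    {"id": "doc-2", "document": "OpenSearch stores vectors.", "metadata": {"source": "b.txt"},
     "embedding": [0.3, 0.4], "distance": 0.34},
]


def text_only(hits):
    """Document text only, with documents separated by a blank line (\\n\\n)."""
    return "\n\n".join(h["document"] for h in hits)


def row_oriented(hits):
    """One JSON object per line, one line per document."""
    return "\n".join(json.dumps(h) for h in hits)


def column_oriented(hits):
    """A single JSON object whose values are parallel lists."""
    return json.dumps({
        "ids": [h["id"] for h in hits],
        "embeddings": [h["embedding"] for h in hits],
        "documents": [h["document"] for h in hits],
        "distances": [h["distance"] for h in hits],
        "metadatas": [h["metadata"] for h in hits],
    })
```

In the column-oriented form, position `i` in every list refers to the same document, so downstream code can zip the lists back into rows if needed.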
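
The Boolean Filter, Efficient Filter, and Pre Filter properties all expect a JSON representation of the filter. As a minimal sketch, the Python below builds such values from dictionaries so that the JSON is always well formed. The helper name `as_filter_property` and the `rating` field are hypothetical, and the table does not state which query clauses each filter property accepts, so the `bool`/`range` clause is only an assumed example of OpenSearch query DSL.

```python
import json


def as_filter_property(clause):
    """Serialize a query clause to a JSON string suitable for a filter property."""
    if not isinstance(clause, dict):
        raise TypeError("a filter must be a JSON object")
    return json.dumps(clause)


# The default Pre Filter value listed in the table: match every document.
pre_filter = as_filter_property({"match_all": {}})

# Assumed example: keep only documents whose hypothetical 'rating' field is >= 8.
rating_filter = as_filter_property(
    {"bool": {"must": [{"range": {"rating": {"gte": 8}}}]}}
)

print(pre_filter)
print(rating_filter)
```

Building the value from a dictionary avoids hand-written quoting mistakes, which would otherwise surface only when the processor parses the property.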
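
Because Vector Field Name must refer to a 'knn_vector' field, the target index needs a compatible mapping. The sketch below builds one possible index body in Python; it is an assumption about a typical k-NN index, not something this processor creates. The function name is hypothetical, the field names default to the values in the table, and the dimension must match the chosen embedding model (1536 for text-embedding-ada-002, 384 for sentence-transformers/all-MiniLM-L6-v2).

```python
import json


def knn_index_body(vector_field="vector_field", text_field="text", dimension=1536):
    """Build an index body with k-NN enabled and a knn_vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Embeddings live here; the type must be knn_vector.
                vector_field: {"type": "knn_vector", "dimension": dimension},
                # The document text that is returned with each result.
                text_field: {"type": "text"},
            }
        },
    }


# Body for an index holding all-MiniLM-L6-v2 embeddings (384 dimensions).
print(json.dumps(knn_index_body(dimension=384), indent=2))
```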