QueryOpenSearchVector

Description:

Queries OpenSearch in order to gather a specified number of documents that are most closely related to the given query.
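
As an illustration of what "most closely related" means in practice, the sketch below builds an approximate k-NN search body in the OpenSearch query DSL. It is not taken from the processor's source. The function name build_knn_query and the three-element placeholder vector are hypothetical, and a real query vector would come from the configured embedding model. The defaults for k and the vector field mirror the defaults listed in the property table (10 and vector_field).

```python
import json


def build_knn_query(query_vector, k=10, vector_field="vector_field"):
    """Return an approximate k-NN search body in OpenSearch query DSL.

    Hypothetical helper for illustration only; the processor may build
    its request differently.
    """
    return {
        "size": k,  # number of hits to return
        "query": {
            "knn": {
                vector_field: {
                    "vector": list(query_vector),  # embedding of the query text
                    "k": k,  # number of nearest neighbors to search for
                }
            }
        },
    }


# Placeholder embedding (assumption); real embeddings have many more dimensions.
example_vector = [0.1, 0.2, 0.3]
body = build_knn_query(example_vector, k=10)
print(json.dumps(body, indent=2))
```

A body like this could be sent to the index's _search endpoint by any OpenSearch client; the processor handles that step itself.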

Tags:

opensearch, vector, vectordb, vectorstore, embeddings, ai, artificial intelligence, ml, machine learning, text, LLM

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Description |
| --- | --- | --- | --- |
| Output Strategy | Output Strategy | Row-Oriented | Specifies whether the output should contain only the text of the documents (each document separated by \n\n), or be formatted as a single column-oriented JSON object consisting of the keys 'ids', 'embeddings', 'documents', 'distances', and 'metadatas', or be row-oriented, with one JSON object per line, each consisting of a single id, document, metadata, embedding, and distance. |
| Query | Query | | The text of the query to send to OpenSearch. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
| Number of Results | Number of Results | 10 | The number of results to return from OpenSearch. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
| Search Type | Search Type | | Specifies the type of the search to be performed. |
| Script Scoring Space Type | Script Scoring Space Type | | Used to measure the distance between two points in order to determine the k-nearest neighbors. |
| Painless Scripting Space Type | Painless Scripting Space Type | | Used to measure the distance between two points in order to determine the k-nearest neighbors. |
| Boolean Filter | Boolean Filter | | A Boolean filter is a post-filter consisting of a Boolean query that contains a k-NN query and a filter. The value of the field must be a JSON representation of the filter. |
| Efficient Filter | Efficient Filter | | The Lucene Engine or Faiss Engine decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering. The value of the field must be a JSON representation of the filter. |
| Pre Filter | Pre Filter | {"match_all": {}} | Script Score query to pre-filter documents before identifying nearest neighbors. The value of the field must be a JSON representation of the filter. |
| Embedding Model | Embedding Model | | Specifies which embedding model should be used in order to create embeddings from incoming Documents. The default model is OpenAI. |
| OpenAI Model | OpenAI Model | text-embedding-ada-002 | The name of the OpenAI model to use. |
| HuggingFace Model | HuggingFace Model | sentence-transformers/all-MiniLM-L6-v2 | The name of the HuggingFace model to use. |
| HuggingFace API Key | HuggingFace API Key | | The API Key for interacting with HuggingFace. Sensitive Property: true |
| OpenAI API Key | OpenAI API Key | | The API Key for OpenAI, used to create embeddings. Sensitive Property: true |
| HTTP Host | HTTP Host | http://localhost:9200 | URL where OpenSearch is hosted. |
| Username | Username | | The username to use for authenticating to the OpenSearch server. |
| Password | Password | | The password to use for authenticating to the OpenSearch server. Sensitive Property: true |
| Index Name | Index Name | | The name of the OpenSearch index. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
| Vector Field Name | Vector Field Name | vector_field | The name of the field in the document where the embeddings are stored. This field needs to be a 'knn_vector' typed field. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
| Text Field Name | Text Field Name | text | The name of the field in the document where the text is stored. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
| Include Distances | Include Distances | true | Whether or not to include the Documents' Distances (i.e., how far the Document was from the query) in the response. |
| Include Metadata | Include Metadata | true | Whether or not to include the Documents' Metadata in the response. |
| Results Field | Results Field | | If the input FlowFile is JSON-formatted, this is the name of the field into which the results are inserted. This allows the results to be inserted into an existing input in order to enrich it. If this property is unset, the results will be written to the FlowFile contents, overwriting any pre-existing content. |
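
The three shapes described for Output Strategy can be sketched as follows. This is a minimal illustration, not the processor's implementation. The function name format_results, the hit dictionaries, and the strategy labels "Text" and "Column-Oriented" are assumptions; only "Row-Oriented" appears above, as the default value.

```python
import json


def format_results(hits, strategy="Row-Oriented"):
    """Render search hits in one of the three documented output shapes.

    `hits` is assumed to be a list of dicts with the keys
    id, document, metadata, embedding, and distance.
    """
    if strategy == "Text":
        # Only the document text, with documents separated by a blank line.
        return "\n\n".join(hit["document"] for hit in hits)
    if strategy == "Column-Oriented":
        # One JSON object holding a parallel list per attribute.
        return json.dumps({
            "ids": [hit["id"] for hit in hits],
            "embeddings": [hit["embedding"] for hit in hits],
            "documents": [hit["document"] for hit in hits],
            "distances": [hit["distance"] for hit in hits],
            "metadatas": [hit["metadata"] for hit in hits],
        })
    if strategy == "Row-Oriented":
        # One JSON object per line, one line per hit.
        return "\n".join(json.dumps(hit) for hit in hits)
    raise ValueError(f"Unknown output strategy: {strategy}")


# Made-up hits for illustration only.
sample_hits = [
    {"id": "a", "document": "first doc", "metadata": {"src": "x"},
     "embedding": [0.1, 0.2], "distance": 0.12},
    {"id": "b", "document": "second doc", "metadata": {"src": "y"},
     "embedding": [0.3, 0.4], "distance": 0.34},
]

text_out = format_results(sample_hits, "Text")
column_out = format_results(sample_hits, "Column-Oriented")
row_out = format_results(sample_hits, "Row-Oriented")
```

Row-oriented output suits line-by-line downstream processing, while the column-oriented object keeps all values of one attribute together in a single list.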