PutOpenSearchVector

Description:

Publishes JSON data to OpenSearch. The incoming data must be in JSON Lines format (one JSON object per line), and each object must have two keys: 'text' and 'metadata'. The text must be a string, while metadata must be a map with strings for values. Any additional fields will be ignored.
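As an illustration of the expected input, the following sketch parses JSON Lines content and checks each record against the rules stated above. The function name `parse_records` and the sample records are hypothetical and not part of the processor. How the processor itself reacts to invalid records is not stated here, so raising an error is an assumption of this sketch.

```python
import json


def parse_records(jsonl: str) -> list[dict]:
    """Parse JSON Lines text, keeping only the 'text' and 'metadata' keys."""
    records = []
    for line_no, line in enumerate(jsonl.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        doc = json.loads(line)
        if not isinstance(doc, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        text = doc.get("text")
        metadata = doc.get("metadata")
        if not isinstance(text, str):
            raise ValueError(f"line {line_no}: 'text' must be a string")
        if not isinstance(metadata, dict) or not all(
            isinstance(value, str) for value in metadata.values()
        ):
            raise ValueError(f"line {line_no}: 'metadata' must be a map of strings")
        # Any additional fields are dropped, mirroring "will be ignored".
        records.append({"text": text, "metadata": metadata})
    return records


# Hypothetical sample input: one JSON object per line.
sample = "\n".join([
    json.dumps({"text": "OpenSearch stores vectors.",
                "metadata": {"id": "doc-1", "source": "notes"}}),
    json.dumps({"text": "NiFi moves data.",
                "metadata": {"id": "doc-2"}, "extra": "ignored"}),
])
records = parse_records(sample)
```

The second sample record carries an 'extra' key, which the sketch discards before returning the record.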

Tags:

opensearch, vector, vectordb, vectorstore, embeddings, ai, artificial intelligence, ml, machine learning, text, LLM

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Description |
|---|---|---|---|
| Document ID Field Name | Document ID Field Name | | Specifies the name of the field in the 'metadata' element of each document where the document's ID can be found. If not specified, an ID will be generated based on the FlowFile's filename and a one-up number. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
| New Index Strategy | New Index Strategy | Default index mapping | Specifies the mapping strategy to use for new index creation. The default template values are the following: {engine: nmslib, space_type: l2, ef_search: 512, ef_construction: 512, m: 16} |
| Engine | Engine | | The approximate k-NN library to use for indexing and search. |
| NMSLIB Space Type | NMSLIB Space Type | | The vector space used to calculate the distance between vectors. |
| FAISS Space Type | FAISS Space Type | | The vector space used to calculate the distance between vectors. |
| Lucene Space Type | Lucene Space Type | | The vector space used to calculate the distance between vectors. |
| EF Search | EF Search | 512 | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches. |
| EF Construction | EF Construction | 512 | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph but slower indexing speed. |
| M | M | 16 | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100. |
| Embedding Model | Embedding Model | | Specifies which embedding model should be used in order to create embeddings from incoming Documents. Default model is OpenAI. |
| OpenAI Model | OpenAI Model | text-embedding-ada-002 | The name of the OpenAI model to use |
| HuggingFace Model | HuggingFace Model | sentence-transformers/all-MiniLM-L6-v2 | The name of the HuggingFace model to use |
| HuggingFace API Key | HuggingFace API Key | | The API Key for interacting with HuggingFace. Sensitive Property: true |
| OpenAI API Key | OpenAI API Key | | The API Key for OpenAI in order to create embeddings. Sensitive Property: true |
| HTTP Host | HTTP Host | http://localhost:9200 | URL where OpenSearch is hosted. |
| Username | Username | | The username to use for authenticating to OpenSearch server |
| Password | Password | | The password to use for authenticating to OpenSearch server. Sensitive Property: true |
| Index Name | Index Name | | The name of the OpenSearch index. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
| Vector Field Name | Vector Field Name | vector_field | The name of the field in the document where the embeddings are stored. This field needs to be of type 'knn_vector'. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
| Text Field Name | Text Field Name | text | The name of the field in the document where the text is stored. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
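To make the default template values listed for 'New Index Strategy' more concrete, the sketch below shows how they could map onto an OpenSearch k-NN index definition. This is an assumption about the shape of the request body, not the body the processor is confirmed to send. The function name `default_index_body` is hypothetical, the 'hnsw' method name is assumed because EF Search, EF Construction and M are HNSW parameters, and the dimension of 1536 is assumed because it matches the output size of text-embedding-ada-002 (a different embedding model needs a different dimension).

```python
def default_index_body(vector_field: str = "vector_field",
                       dimension: int = 1536) -> dict:
    """Build a k-NN index body from the documented default template values."""
    return {
        "settings": {
            "index": {
                "knn": True,  # enable the k-NN plugin for this index
                "knn.algo_param.ef_search": 512,  # EF Search
            }
        },
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",  # required type for the vector field
                    "dimension": dimension,  # must match the embedding model
                    "method": {
                        "name": "hnsw",  # assumed graph algorithm
                        "engine": "nmslib",  # Engine
                        "space_type": "l2",  # NMSLIB Space Type
                        "parameters": {
                            "ef_construction": 512,  # EF Construction
                            "m": 16,  # M
                        },
                    },
                }
            }
        },
    }


body = default_index_body()
```

With the HuggingFace default model sentence-transformers/all-MiniLM-L6-v2, the dimension would be 384 instead, for example `default_index_body(dimension=384)`.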

Example Use Cases:

Use Case:

Create vectors/embeddings that represent text content and send the vectors to OpenSearch

Notes:

This use case assumes that the data is already in JSONL format, with the text to store in OpenSearch provided in the 'text' field.

Keywords:

opensearch, embedding, vector, text, vectorstore, insert

Configuration:

Configure the 'HTTP Host' to an appropriate URL where OpenSearch is accessible.

Configure 'Embedding Model' to select which embedding model to use: 'Hugging Face Model' or 'OpenAI Model'.

Configure the 'OpenAI API Key' or 'HuggingFace API Key', depending on the chosen Embedding Model.

Set 'Index Name' to the name of your OpenSearch Index.

Set 'Vector Field Name' to the name of the field in the document which will store the vector data.

Set 'Text Field Name' to the name of the field in the document which will store the text data.

If the documents to send to OpenSearch contain a unique identifier, set the 'Document ID Field Name' property to the name of the field that contains the document ID.

This property can be left blank, in which case a unique ID will be generated based on the FlowFile's filename.

If the provided index does not exist in OpenSearch, the processor can create it. The 'New Index Strategy' property defines whether the index is created from the default template or configured with custom values.
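The document ID behaviour described in this configuration can be sketched as follows. The helper `resolve_document_ids` is hypothetical. The exact format of a generated ID (here the filename, a hyphen, and a zero-based counter) is an assumption, since the documentation only says it is based on the FlowFile's filename and a one-up number. Falling back to a generated ID when a record lacks the configured field is also an assumption.

```python
from typing import Optional


def resolve_document_ids(records: list[dict], filename: str,
                         id_field: Optional[str] = None) -> list[str]:
    """Pick each document's ID from its metadata, or generate one."""
    ids = []
    for counter, record in enumerate(records):
        doc_id = None
        if id_field:
            # 'Document ID Field Name' points into the 'metadata' element.
            doc_id = record.get("metadata", {}).get(id_field)
        if not doc_id:
            # Assumed format: FlowFile filename plus a one-up number.
            doc_id = f"{filename}-{counter}"
        ids.append(doc_id)
    return ids


# Hypothetical records: the second one has no 'id' in its metadata.
docs = [
    {"text": "first", "metadata": {"id": "doc-1"}},
    {"text": "second", "metadata": {"source": "notes"}},
]
with_field = resolve_document_ids(docs, "data.jsonl", id_field="id")
without_field = resolve_document_ids(docs, "data.jsonl")
```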



Use Case:

Update vectors/embeddings in OpenSearch

Notes:

This use case assumes that the data is already in JSONL format, with the text to store in OpenSearch provided in the 'text' field.

Keywords:

opensearch, embedding, vector, text, vectorstore, update, upsert

Configuration:

Configure the 'HTTP Host' to an appropriate URL where OpenSearch is accessible.

Configure 'Embedding Model' to select which embedding model to use: 'Hugging Face Model' or 'OpenAI Model'.

Configure the 'OpenAI API Key' or 'HuggingFace API Key', depending on the chosen Embedding Model.

Set 'Index Name' to the name of your OpenSearch Index.

Set 'Vector Field Name' to the name of the field in the document which will store the vector data.

Set 'Text Field Name' to the name of the field in the document which will store the text data.

Set the 'Document ID Field Name' property to the name of the field that contains the identifier of the document in OpenSearch to update.
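To show why the document ID matters for updates, the sketch below builds a request body in the OpenSearch Bulk API format, where an 'index' action that names an existing '_id' replaces the stored document. Whether the processor uses the Bulk API internally is an assumption, as is storing the metadata under a 'metadata' field. The function `build_bulk_body`, the index name and the short vectors are hypothetical placeholders.

```python
import json


def build_bulk_body(docs: list[tuple[str, dict]], index_name: str) -> str:
    """Serialize (id, source) pairs as newline-delimited Bulk API actions."""
    lines = []
    for doc_id, source in docs:
        # Action line: 'index' creates the document or replaces an existing one.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Placeholder 3-dimensional vectors; real embeddings are much longer.
updates = [
    ("doc-1", {"text": "updated text", "vector_field": [0.1, 0.2, 0.3],
               "metadata": {"id": "doc-1"}}),
    ("doc-2", {"text": "another update", "vector_field": [0.4, 0.5, 0.6],
               "metadata": {"id": "doc-2"}}),
]
bulk_body = build_bulk_body(updates, "my-index")
```

Sending the same ID again with new text and a new vector overwrites the earlier document rather than adding a second one.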