PutChroma

Description:

Publishes JSON data to a Chroma VectorDB. Incoming data must be in one-JSON-object-per-line format, and each object must have two keys: 'text' and 'metadata'. The 'text' value must be a string, and the 'metadata' value must be a map whose values are strings. Any additional fields are ignored. If the specified collection does not exist, the Processor creates it automatically.
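As an illustration of the expected input, the following Python sketch builds a payload in the one-JSON-object-per-line format described above. The helper name to_json_lines and the sample documents are hypothetical and are not part of the Processor; the validation only mirrors the stated rules ('text' is a string, 'metadata' is a map with string values).

```python
import json


def to_json_lines(documents):
    """Serialize (text, metadata) pairs as one JSON object per line.

    Hypothetical helper for illustration; it only enforces the rules
    stated in the description above.
    """
    lines = []
    for text, metadata in documents:
        if not isinstance(text, str):
            raise TypeError("'text' must be a string")
        if not isinstance(metadata, dict) or not all(
            isinstance(key, str) and isinstance(value, str)
            for key, value in metadata.items()
        ):
            raise TypeError("'metadata' must be a map with string values")
        # json.dumps never emits raw newlines, so each object stays on one line.
        lines.append(json.dumps({"text": text, "metadata": metadata}))
    return "\n".join(lines)


# Sample documents (made up for this example).
payload = to_json_lines([
    ("NiFi moves data between systems.", {"source": "notes.txt", "page": "1"}),
    ("Chroma stores embeddings.", {"source": "notes.txt", "page": "2"}),
])
print(payload)
```

Note that numeric metadata such as a page number is written as a string here, since the description requires string values.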

Tags:

chroma, vector, vectordb, embeddings, ai, artificial intelligence, ml, machine learning, text, LLM

Properties:

In the table below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Description |
|---|---|---|---|
| Store Document Text | Store Document Text | true | Specifies whether or not the text of the document should be stored in Chroma. If so, both the document's text and its embedding will be stored. If not, only the vector/embedding will be stored. |
| Distance Method | Distance Method | cosine | If the specified collection does not exist, it will be created using this Distance Method. If the collection exists, this property will be ignored. |
| Document ID Field Name | Document ID Field Name | | Specifies the name of the field in the 'metadata' element of each document where the document's ID can be found. If not specified, an ID will be generated based on the FlowFile's filename and a one-up number. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
| Connection Strategy | Connection Strategy | Remote Chroma Server | Specifies how to connect to the Chroma server |
| Directory | Directory | ./chroma | The Directory that Chroma should use to persist data |
| Hostname | Hostname | localhost | The hostname to connect to in order to communicate with Chroma |
| Port | Port | 8000 | The port that the Chroma server is listening on |
| Transport Protocol | Transport Protocol | https | Specifies whether connections should be made over http or https |
| Authentication Strategy | Authentication Strategy | Token Authentication | Specifies how to authenticate to Chroma server |
| Authentication Token | Authentication Token | | The token to use for authenticating to Chroma server. Sensitive Property: true |
| Username | Username | | The username to use for authenticating to Chroma server |
| Password | Password | | The password to use for authenticating to Chroma server. Sensitive Property: true |
| Collection Name | Collection Name | nifi | The name of the Chroma Collection. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables) |
| Embedding Function | Embedding Function | ONNX all-MiniLM-L6-v2 Model | Specifies which embedding function should be used in order to create embeddings from incoming Documents |
| HuggingFace Model Name | HuggingFace Model Name | sentence-transformers/all-MiniLM-L6-v2 | The name of the HuggingFace model to use |
| HuggingFace API Key | HuggingFace API Key | | The API Key for interacting with HuggingFace. Sensitive Property: true |
| OpenAI API Key | OpenAI API Key | | The API Key for interacting with OpenAI. Sensitive Property: true |
| OpenAI Model Name | OpenAI Model Name | text-embedding-ada-002 | The name of the OpenAI model to use |
| OpenAI Organization ID | OpenAI Organization ID | | The OpenAI Organization ID |
| OpenAI API Base Path | OpenAI API Base Path | | The API Base to use for interacting with OpenAI. This is used for interacting with different deployments, such as an Azure deployment. |
| OpenAI API Deployment Type | OpenAI API Deployment Type | | The type of the OpenAI API Deployment. This is used for interacting with different deployments, such as an Azure deployment. |
| OpenAI API Version | OpenAI API Version | | The OpenAI API Version. This is used for interacting with different deployments, such as an Azure deployment. |
| Sentence Transformer Model Name | Sentence Transformer Model Name | all-MiniLM-L6-v2 | The name of the Sentence Transformer model to use |
| Sentence Transformer Device Type | Sentence Transformer Device Type | | The type of device to use for performing the embeddings using the Sentence Transformer, such as 'cpu', 'cuda', 'mps', 'cuda:0', etc. If not specified, a GPU will be used if possible, otherwise a CPU. |
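
The Document ID Field Name behavior can be pictured with a small Python sketch. This is not the Processor's code: the function name resolve_ids, the "-" separator, the zero-based counter, and the fallback used when a document lacks the configured field are all assumptions made for illustration. The documentation only states that the ID is read from the named 'metadata' field, or generated from the FlowFile's filename and a one-up number when the property is not set.

```python
import json


def resolve_ids(json_lines, filename, id_field_name=None):
    """Return one ID per document in a one-JSON-object-per-line payload.

    Hypothetical sketch: the ID format "<filename>-<n>" and the
    zero-based counter are assumptions, not the Processor's actual output.
    """
    ids = []
    non_empty = (line for line in json_lines.splitlines() if line.strip())
    for index, line in enumerate(non_empty):
        metadata = json.loads(line).get("metadata") or {}
        if id_field_name and id_field_name in metadata:
            # ID supplied by the document's own metadata.
            ids.append(metadata[id_field_name])
        else:
            # Assumed fallback: filename plus a running number.
            ids.append(f"{filename}-{index}")
    return ids


sample = "\n".join([
    '{"text": "first", "metadata": {"doc_id": "a-17"}}',
    '{"text": "second", "metadata": {"author": "x"}}',
])
print(resolve_ids(sample, "input.json", id_field_name="doc_id"))
print(resolve_ids(sample, "input.json"))
```

Supplying explicit IDs through the metadata field is generally the safer choice when the same document may be sent more than once, because a generated ID depends on the FlowFile's filename and the document's position in it.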