PutChroma

Description:

Publishes JSON data to a Chroma VectorDB. Incoming data must be in JSON Lines format (one JSON object per line), and each object must have two keys: 'text' and 'metadata'. The value of 'text' must be a string, and the value of 'metadata' must be a map whose values are strings. Any additional fields are ignored. If the specified collection does not exist, the Processor creates it automatically.
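
The input format described above can be illustrated with a minimal sketch. This is not the Processor's own code: the function name `parse_documents`, the error messages, and the choice to skip blank lines are assumptions made for illustration only.

```python
import json


def parse_documents(jsonl_text):
    """Parse one-JSON-object-per-line input into (text, metadata) pairs.

    Hypothetical helper: it mirrors the documented input rules but is not
    the Processor's actual implementation.
    """
    documents = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            # Assumption: blank lines are skipped; the documentation does not say.
            continue
        record = json.loads(line)
        text = record.get("text")
        metadata = record.get("metadata")
        if not isinstance(text, str):
            raise ValueError(f"line {line_number}: 'text' must be a string")
        if not isinstance(metadata, dict) or not all(
            isinstance(value, str) for value in metadata.values()
        ):
            raise ValueError(
                f"line {line_number}: 'metadata' must be a map with string values"
            )
        # Only 'text' and 'metadata' are kept; any other keys are ignored.
        documents.append((text, metadata))
    return documents


# Two example documents; the second carries an extra field that is dropped.
sample = "\n".join(
    [
        json.dumps({"text": "NiFi routes data.", "metadata": {"source": "docs", "page": "1"}}),
        json.dumps({"text": "Chroma stores embeddings.", "metadata": {"source": "docs"}, "extra": 42}),
    ]
)
docs = parse_documents(sample)
```

Note that numeric metadata such as a page number has to be serialized as a string ("1" rather than 1) to satisfy the string-values requirement.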

Tags:

chroma, vector, vectordb, embeddings, ai, artificial intelligence, ml, machine learning, text, LLM

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name	API Name	Default Value	Description
Store Document Text	Store Document Text	true	Specifies whether or not the text of the document should be stored in Chroma. If so, both the document's text and its embedding will be stored. If not, only the vector/embedding will be stored.
Distance Method	Distance Method	cosine	If the specified collection does not exist, it will be created using this Distance Method. If the collection exists, this property will be ignored.
Document ID Field Name	Document ID Field Name		Specifies the name of the field in the 'metadata' element of each document where the document's ID can be found. If not specified, an ID will be generated based on the FlowFile's filename and a one-up number. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Connection Strategy	Connection Strategy	Remote Chroma Server	Specifies how to connect to the Chroma server
Directory	Directory	./chroma	The Directory that Chroma should use to persist data
Hostname	Hostname	localhost	The hostname to connect to in order to communicate with Chroma
Port	Port	8000	The port that the Chroma server is listening on
Transport Protocol	Transport Protocol	https	Specifies whether connections should be made over http or https
Authentication Strategy	Authentication Strategy	Token Authentication	Specifies how to authenticate to the Chroma server
Authentication Token	Authentication Token		The token to use for authenticating to the Chroma server. Sensitive Property: true
Username	Username		The username to use for authenticating to the Chroma server
Password	Password		The password to use for authenticating to the Chroma server. Sensitive Property: true
Collection Name	Collection Name	nifi	The name of the Chroma Collection. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Embedding Function	Embedding Function	ONNX all-MiniLM-L6-v2 Model	Specifies which embedding function should be used in order to create embeddings from incoming Documents
HuggingFace Model Name	HuggingFace Model Name	sentence-transformers/all-MiniLM-L6-v2	The name of the HuggingFace model to use
HuggingFace API Key	HuggingFace API Key		The API Key for interacting with HuggingFace. Sensitive Property: true
OpenAI API Key	OpenAI API Key		The API Key for interacting with OpenAI. Sensitive Property: true
OpenAI Model Name	OpenAI Model Name	text-embedding-ada-002	The name of the OpenAI model to use
OpenAI Organization ID	OpenAI Organization ID		The OpenAI Organization ID
OpenAI API Base Path	OpenAI API Base Path		The API Base to use for interacting with OpenAI. This is used for interacting with different deployments, such as an Azure deployment.
OpenAI API Deployment Type	OpenAI API Deployment Type		The type of the OpenAI API Deployment. This is used for interacting with different deployments, such as an Azure deployment.
OpenAI API Version	OpenAI API Version		The OpenAI API Version. This is used for interacting with different deployments, such as an Azure deployment.
Sentence Transformer Model Name	Sentence Transformer Model Name	all-MiniLM-L6-v2	The name of the Sentence Transformer model to use
Sentence Transformer Device Type	Sentence Transformer Device Type		The type of device to use for performing the embeddings using the Sentence Transformer, such as 'cpu', 'cuda', 'mps', 'cuda:0', etc. If not specified, a GPU will be used if possible, otherwise a CPU.
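
The Document ID Field Name behavior in the table above can be sketched as follows. This is only an illustration under stated assumptions: the function name `resolve_document_ids`, the `filename-N` format, the zero-based counter, and the fallback used when a document lacks the ID field are all hypothetical, since the documentation says only that the generated ID is based on the FlowFile's filename and a one-up number.

```python
def resolve_document_ids(documents, filename, id_field_name=None):
    """Pick an ID for each (text, metadata) pair.

    Hypothetical helper illustrating the documented rule: read the ID from
    the named metadata field, otherwise derive one from the filename and a
    counter.
    """
    ids = []
    for index, (_text, metadata) in enumerate(documents):
        if id_field_name and id_field_name in metadata:
            ids.append(metadata[id_field_name])
        else:
            # Assumption: "<filename>-<counter>" with a zero-based counter.
            # The exact format the Processor uses is not documented here.
            ids.append(f"{filename}-{index}")
    return ids


example_docs = [
    ("first document", {"doc_id": "alpha"}),
    ("second document", {}),
]
ids_from_field = resolve_document_ids(example_docs, "input.json", "doc_id")
ids_generated = resolve_document_ids(example_docs, "input.json")
```

In this sketch, a stable ID taken from the metadata would let the same document be written again under the same key, whereas generated IDs depend on the FlowFile's filename and the document's position within it.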