ReadyFlow overview: S3 to Pinecone [Technical Preview]

You can use the S3 to Pinecone [Technical Preview] ReadyFlow to consume PDF documents from S3, vectorize them using an OpenAI model, and write the results to Pinecone.

This ReadyFlow consumes PDF documents from a source S3 location, partitions the PDFs, chunks the data, and then splits and transforms the data into the input format required by the PutPinecone processor. The PutPinecone processor then vectorizes the data using an OpenAI model (the default is 'text-embedding-ada-002') and writes the embeddings to a destination Pinecone index. An OpenAI API key and a Pinecone API key are required to run this flow. Define a KPI on the failure_WriteToPinecone connection to monitor failed write operations.
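
The chunking and record-shaping steps described above can be illustrated with a minimal Python sketch. It is not the ReadyFlow's implementation, which runs as NiFi processors. The names ChunkRecord, chunk_text, and to_records, the chunk size and overlap values, and the record fields (id, text, metadata) are all hypothetical and chosen only for illustration; the actual input format expected by PutPinecone may differ. The sketch stops before the vectorize and write steps, which require the OpenAI and Pinecone API keys.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ChunkRecord:
    """One chunk of extracted PDF text, ready to be embedded (hypothetical shape)."""
    id: str
    text: str
    metadata: Dict[str, str]


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # skip windows that contain only whitespace
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def to_records(source_key: str, text: str,
               chunk_size: int = 200, overlap: int = 20) -> List[ChunkRecord]:
    """Attach an ID and source metadata to each chunk of one document."""
    return [
        ChunkRecord(
            id=f"{source_key}#chunk-{i}",
            text=piece,
            metadata={"source": source_key, "chunk_index": str(i)},
        )
        for i, piece in enumerate(chunk_text(text, chunk_size, overlap))
    ]


if __name__ == "__main__":
    # "docs/example.pdf" is a made-up S3 object key; the text stands in for
    # the content extracted from the PDF by the partitioning step.
    for record in to_records("docs/example.pdf", "lorem ipsum " * 40):
        print(record.id, len(record.text))
```

Overlapping windows are one common way to keep a sentence that straddles a chunk boundary retrievable from at least one chunk. Whether the ReadyFlow uses overlap, and with what sizes, is not stated here.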


S3 to Pinecone [Technical Preview] ReadyFlow details
Source	Cloudera managed Amazon S3
Source Format	PDF
Destination	Pinecone
Destination Format	Vector DB