Using RAG Studio
Integrate with existing data infrastructure and workflows, while also maintaining control over your data.
- Access the Studio application UI, if the necessary permission is granted in the project settings.
- Create chat interactions with the available large language models, where users can ask questions and receive answers from the model.
- Ground the large language model's answers with a knowledge base, using direct quotes and inputs sourced from the documents in the knowledge store.
- Provide feedback on each answer, which is collated in the Analytics tab and stored in MLflow for model evaluation or training purposes.
- In the Cloudera console, click the Cloudera AI tile.
  The Cloudera AI Workbenches page displays.
- Click the name of the workbench.
  The workbench Home page displays.
- Click Projects, and then click New Project to create a new project.
  The new AI Studios option is displayed in the left navigation pane.
- Click AI Studios and select RAG Studio to enter the application.
- Start interacting with the chat function.
  At this stage you can begin interacting with the RAG Studio chat function; however, the chat is limited to the model's training knowledge as its only source, because no knowledge base is defined yet.
- Write your question into the chat field at the bottom of the page, send the question and wait for the chatbot’s response.
- Follow up with the information missing from the answer, or rephrase your original question with more detail to help the chatbot provide a more precise response.
- Click the Suggested Questions drop-down list above the chat field and select one of the predefined questions.
- Click Knowledge Bases and select the Create Knowledge Base button.
  The RAG Studio chat function backed by a knowledge base is a more advanced version of the plain chat function: it accesses and retrieves information from the knowledge base to give more accurate and up-to-date answers. When you send a message, the chatbot searches the knowledge base for relevant information and bases its response on what it finds.
- Fill in the required fields for the Knowledge Base.
  - Name - the name of the Knowledge Base.
  - Chunk size - the amount of data that is processed and written to the database in a single operation. Long documents are divided into smaller chunks for referencing, with the chunk size determining the size of these pieces. If unsure, keep the default value of 512.
  - Embedding Model - the large language model that "reads" the provided documents and transforms them into the vectors it references later, in a process called embedding.
  - Summarization model - the model used to enable summary-based retrieval.
  - Advanced Options
    - Distance metric - Cosine
    - Chunk overlap - controls how much of the previous chunk's data is included in the next chunk, so that information at the boundaries between chunks is not lost when the smaller pieces of the larger document are referenced.
Once created and connected to a chat, the information in the knowledge base becomes the only knowledge the large language model is allowed to reference, grounding its answers in your enterprise context.
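The chunk size and chunk overlap settings above can be pictured with a short sketch. This is a hypothetical, simplified illustration of fixed-size character chunking, not RAG Studio's actual implementation (which may split on tokens or sentence boundaries):

```python
# Hypothetical illustration of chunking with overlap; the helper name and
# character-based splitting are assumptions for this example only.

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split `text` into chunks of up to `chunk_size` characters, where each
    chunk repeats the last `overlap` characters of the previous chunk so that
    information at chunk boundaries is not lost."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

document = "x" * 1000                 # stand-in for a long document
chunks = chunk_text(document)         # default chunk size of 512
# Each consecutive pair of chunks shares `overlap` characters at the boundary.
```

A larger overlap keeps more boundary context at the cost of some duplicated storage in the vector database.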
- Populate the Knowledge Base by uploading documents.
Supported file types include:
- .txt, .md, .csv
- .pdf, .docx, .pptx, .pptm, .ppt
- .jpg, .jpeg, .png
- .json
- If advanced document processing is enabled, images and charts contained within PDFs are also ingested.
- To begin a RAG-enabled chat, click Chats in the top-left corner. The Chat with the LLM field is displayed.
- Select the Knowledge Base you would like to use for your chat from the drop-down list in the bottom-right corner of the Chat with the LLM field.
- Optionally, select the Inference model to be used.
- Click the icon.
  The main Chat window with a chat field is displayed.
- Optional: Configure Chat Settings if required.
  You can configure the following:
- Knowledge Base: Select the required Knowledge Base.
- Name: Provide a name for the chat.
- Response synthesizer model: Select the model that writes the final answer.
- Reranking model: Select the model that decides which documents and snippets are the most important to reference. This feature is not available with OpenAI.
- Maximum number of documents: Select how many document chunks you want the answer to reference and incorporate. The number is set to 10 by default.
- Advanced options:
  - Enable Tool calling: Enable or disable tool calling for each session and select the allowed tools using the Tools Manager. The platform automatically verifies whether the selected model supports tool calling.
  - Enable HyDE: Enable Hypothetical Document Embeddings (HyDE) during retrieval to enhance retrieval performance.
  - Enable Summary filtering: Use summaries to filter out retrieved chunks.
  - Disable streaming
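Behind these settings, a knowledge-base chat follows a retrieve-then-synthesize pattern: the query is embedded, the closest chunks are found with the configured distance metric (cosine), optionally reranked, and the top documents (10 by default) are handed to the response synthesizer model. The toy sketch below shows only the retrieval step, with made-up three-dimensional vectors standing in for real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine metric used to compare embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec: list[float],
             chunk_vecs: list[list[float]],
             k: int = 10) -> list[int]:
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Made-up 3-dimensional "embeddings"; real models produce hundreds of dimensions.
chunks = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.05, 0.0]
print(retrieve(query, chunks, k=2))  # → [0, 1]
```

A reranking model refines this ordering with a stronger (but slower) relevance judgment before the synthesizer writes the final answer.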
- Write your question into the chat field, send it and wait for a reply.
- Check the answer you received from RAG Studio.
  You can also spot-check RAG Studio's automatic evaluation of the answer against the available text in the Knowledge Base:
  - Relevancy: measures whether the response and source nodes match the query. Does the question/answer pair make sense?
  - Faithfulness: measures whether the response from a query engine matches any source nodes. Does the provided answer match the source documents well?
- Evaluate the answer with the help of the icons.
  Anyone using the chat can optionally provide feedback on each answer, which can be used to systematically evaluate the performance of the chatbot.
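As an illustration of what systematic evaluation of feedback can look like, the sketch below aggregates hypothetical thumbs-up/thumbs-down events into a per-session approval rate. The event format and function name are made up for this example; RAG Studio's actual feedback records surface in the Analytics tab and MLflow.

```python
from collections import defaultdict

def approval_rates(events: list[tuple[str, int]]) -> dict[str, float]:
    """events: (session_id, rating) pairs, where rating is +1 (helpful)
    or -1 (not helpful). Returns the share of positive ratings per session."""
    counts = defaultdict(lambda: [0, 0])  # session -> [positive, total]
    for session, rating in events:
        counts[session][1] += 1
        if rating > 0:
            counts[session][0] += 1
    return {s: pos / total for s, (pos, total) in counts.items()}

feedback = [("s1", 1), ("s1", -1), ("s2", 1), ("s1", 1)]
rates = approval_rates(feedback)  # s1: 2 of 3 positive, s2: 1 of 1
```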
- View the evaluations, summary, and analytics by selecting Analytics in the top navigation bar.
  The available analytics are:
- App Metrics - metrics on the overall deployed Studio application.
- Session Metrics
- Inference Metrics
- Feedback Metrics
- Auto evaluation metric averages
- Chunk relevance over time
All the underlying data is also stored in the local MLflow instance for machine learning scientists to use as needed.
