Deep-Dive Analysis Tools
The dashboard is designed for granular troubleshooting beyond the surface-level score.
Understanding the Score
Most metrics provide a score from 0.00 to 1.00, where 1.00 represents perfect performance or success.
Click the caret (>) next to any metric to reveal the Explanation block, which describes in natural language why the metric received its score.
- Example: If Faithfulness fails, the explanation might note: "The output appears to be a fragment... indicating a truncated error or fundamental mismatch with the context".
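To make the score range concrete, here is a minimal sketch of how a score and its explanation could be handled once copied out of the dashboard. The MetricResult record, its field names, and the summarize helper are hypothetical and only illustrate the 0.00 to 1.00 convention; they are not the dashboard's actual schema or API.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    # Hypothetical record: field names are illustrative assumptions,
    # not the dashboard's real schema.
    name: str
    score: float
    explanation: str


def summarize(result: MetricResult) -> str:
    """Render a metric as 'Name: 0.00' followed by its explanation.

    Raises ValueError if the score falls outside the 0.00 to 1.00 range.
    """
    if not 0.0 <= result.score <= 1.0:
        raise ValueError(
            f"score {result.score} is outside the 0.00 to 1.00 range"
        )
    return (
        f"{result.name}: {result.score:.2f}\n"
        f"  Explanation: {result.explanation}"
    )


if __name__ == "__main__":
    example = MetricResult(
        name="Faithfulness",
        score=0.0,
        explanation="The output appears to be a fragment.",
    )
    print(summarize(example))
```

Rejecting out-of-range values early makes a malformed or mis-copied score obvious before it is compared against other metrics.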
Metadata and Trace IDs
Expanding a metric also reveals raw metadata in JSON format. This includes:
- Trace ID/Task ID: Unique identifiers for the specific execution step.
- Input Prompts: The exact text sent to the LLM that resulted in the scored output.
- Detailed Explanations: The judge's full reasoning for the success or failure status.
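If the raw metadata is copied out of the dashboard, it can be inspected with any JSON parser. The sketch below assumes a hypothetical payload: the key names (trace_id, task_id, input_prompt, explanation) and the sample values are invented for illustration and may not match the keys the dashboard actually emits.

```python
import json

# Hypothetical metadata payload. Key names and values are assumptions
# made for illustration, not the dashboard's documented format.
RAW_METADATA = """
{
  "trace_id": "trace-123",
  "task_id": "task-456",
  "input_prompt": "Summarize the attached context.",
  "explanation": "The output appears to be a fragment."
}
"""

# Keys this sketch expects to find (assumed names).
REQUIRED_KEYS = ("trace_id", "task_id", "input_prompt", "explanation")


def load_metadata(raw: str) -> dict:
    """Parse a metadata JSON string and check that the expected keys exist."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise KeyError(f"metadata is missing keys: {missing}")
    return data


metadata = load_metadata(RAW_METADATA)

if __name__ == "__main__":
    # Print the identifiers first so the record can be matched to a trace.
    print("Trace ID:", metadata["trace_id"])
    print("Task ID:", metadata["task_id"])
    print("Prompt:", metadata["input_prompt"])
    print("Explanation:", metadata["explanation"])
```

Checking for the expected keys up front gives a clear error when a field is absent, rather than a failure later while reading the record.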
