Deep-Dive Analysis Tools
The dashboard is designed for granular troubleshooting beyond the surface-level scores.
Understanding the Score
Most metrics report a score from 0.00 to 1.00, where 1.00 represents perfect performance.
Click the caret (>) next to any metric to reveal its Explanation block.
- Explanation Block: A natural-language description of why the score was given. For example, if a "Faithfulness" check fails, the judge explains why the output did not match the context.
- Standardized Status: Evaluation results use the same PASS/FAIL labels for every evaluator, so statuses can be compared directly across evaluators.
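
To illustrate how the score and status fit together during triage, here is a minimal Python sketch that picks out failed metrics and orders them from lowest score upward. It assumes results have been copied out of the dashboard by hand or by some export step; the `MetricResult` class, its field names, and the sample values are hypothetical and are not part of the dashboard itself.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    # Hypothetical record shape; the dashboard's real field names may differ.
    name: str
    score: float       # expected range: 0.00 to 1.00
    status: str        # "PASS" or "FAIL"
    explanation: str   # the judge's natural-language explanation


def failures_by_severity(results):
    """Return FAIL results sorted from lowest score to highest."""
    for r in results:
        # Only metrics that use the 0.00 to 1.00 scale belong in this list.
        if not 0.0 <= r.score <= 1.0:
            raise ValueError(f"{r.name}: score {r.score} is outside 0.00 to 1.00")
    failed = [r for r in results if r.status == "FAIL"]
    return sorted(failed, key=lambda r: r.score)


# Illustrative values only, not real evaluation output.
sample_results = [
    MetricResult("Relevance", 0.45, "FAIL", "The answer drifts from the question."),
    MetricResult("Coherence", 0.95, "PASS", "The answer is well organized."),
    MetricResult("Faithfulness", 0.20, "FAIL", "The output contradicts the context."),
]

for r in failures_by_severity(sample_results):
    print(f"{r.status} {r.score:.2f} {r.name}: {r.explanation}")
```

Sorting by score puts the weakest results first, which is usually where reading the Explanation block pays off soonest.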
Metadata and Trace IDs
Expanding a metric also reveals raw metadata in JSON format. This includes:
- Trace ID/Task ID: Unique identifiers for the specific execution step.
- Input Prompts: The exact text sent to the LLM that resulted in the scored output.
- Detailed Explanations: The judge's full reasoning for the PASS or FAIL status.
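
Because the metadata is plain JSON, it can be copied out of the dashboard and inspected with a few lines of Python. The sketch below is illustrative only: the key names (`trace_id`, `task_id`, `input_prompt`, `status`, `explanation`) and the sample values are assumptions, so check them against the JSON your dashboard actually shows.

```python
import json

# Hypothetical metadata payload; real key names and values may differ.
raw_metadata = """
{
  "trace_id": "trace-0001",
  "task_id": "task-0001",
  "input_prompt": "Summarize the attached context in two sentences.",
  "status": "FAIL",
  "explanation": "The output states a figure that does not appear in the context."
}
"""


def summarize_metadata(raw: str) -> str:
    """Build a one-line triage summary from a metric's JSON metadata."""
    record = json.loads(raw)
    # .get() tolerates missing keys, since the schema here is assumed.
    trace_id = record.get("trace_id", "unknown")
    status = record.get("status", "unknown")
    explanation = record.get("explanation", "")
    return f"[{status}] {trace_id}: {explanation}"


print(summarize_metadata(raw_metadata))
```

Keeping the trace ID in the summary makes it easier to find the same execution step again in the dashboard or in logs.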
