Deep-Dive Analysis Tools

The dashboard is designed for granular troubleshooting beyond the surface-level score.

Understanding the Score

Most metrics report a score from 0.00 to 1.00, where 1.00 represents perfect performance or success.

Click the caret (>) next to any metric to reveal the Explanation block, a natural-language description of why the metric received its score.

  • Example: If Faithfulness fails, the explanation might note: "The output appears to be a fragment... indicating a truncated error or fundamental mismatch with the context".

Metadata and Trace IDs

Expanding a metric also reveals raw metadata in JSON format. This includes:

  • Trace ID/Task ID: Unique identifiers for the specific execution step.
  • Input Prompts: The exact text sent to the LLM that resulted in the scored output.
  • Detailed Explanations: The judge's full reasoning for the success or failure status.
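
If you copy the raw metadata out of the dashboard, a short script can pull out the fields listed above for triage. The following is a minimal sketch only: the key names (trace_id, task_id, metric, score, input_prompt, explanation) and the sample values are assumptions for illustration, not the dashboard's documented schema, so adjust them to match the JSON you actually see.

```python
import json

# Hypothetical metadata payload. Every key name and value below is an
# assumption for illustration, not the dashboard's documented schema.
RAW_METADATA = """
{
  "trace_id": "trace-123",
  "task_id": "task-456",
  "metric": "Faithfulness",
  "score": 0.0,
  "input_prompt": "Summarize the context in two sentences.",
  "explanation": "The output appears to be a fragment."
}
"""


def summarize_metric(raw: str) -> dict:
    """Parse one metric's metadata and return the fields useful for triage."""
    data = json.loads(raw)
    score = float(data["score"])
    # Scores are described as ranging from 0.00 to 1.00; reject anything else.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside the 0.00 to 1.00 range")
    return {
        "trace_id": data.get("trace_id"),
        "task_id": data.get("task_id"),
        "metric": data.get("metric"),
        "score": score,
        "prompt": data.get("input_prompt"),
        "explanation": data.get("explanation"),
    }


summary = summarize_metric(RAW_METADATA)
print(f"{summary['metric']} scored {summary['score']:.2f} (trace {summary['trace_id']})")
```

Keeping the trace ID next to the score and explanation makes it easier to match a failing metric to the execution step that produced it.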