Metrics Reference Glossary
| Metric Category | Metric Name | Detailed Definition |
|---|---|---|
| Automatic | Latency | Total time in seconds from workflow start to finish. Useful for identifying performance bottlenecks. |
| Automatic | Token Usage | Total prompt and completion tokens consumed across all execution spans, reported separately. Used for cost analysis. |
| Automatic | Error Rate | The percentage of spans that returned an error status rather than "Success". |
| Automatic | Loop Detection | Flags repetitive LLM calls that suggest an agent is stuck in a recursive loop. |
| Automatic | Task Completion | Binary check (Success/Fail) based strictly on the execution status code of the final task. |
| Automatic | Tool Call Count | Sum total of all tool or sub-agent calls made during the workflow trace. |
| LLM Judge | Faithfulness | Evaluates whether the answer is factually grounded in the provided context. Often used to detect hallucinations. |
| LLM Judge | Reasoning Quality | Assesses if the agent followed a logical chain of thought and appropriate specialist routing. |
| LLM Judge | Toxicity | Safety check for harmful, offensive, hateful, or inappropriate language in the agent's response. |
| LLM Judge | Manager Delegation | Measures if the Manager agent chose the correct specialist agent based on the user's specific query. |
| LLM Judge | Tool Calling Accuracy | Measures if the correct tool was selected and if the generated parameters were valid for that tool. |
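
The Automatic metrics in the table can be computed directly from trace data, without an LLM. The following is a minimal sketch under stated assumptions: the `Span` schema, its field names, and the `threshold` of three repeated calls are hypothetical and not taken from any particular tracing library. Loop detection is approximated here by counting identical LLM calls (same span name and prompt), which is one possible heuristic among several.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Span:
    """One step in a workflow trace (hypothetical schema for illustration)."""
    name: str
    kind: str                 # "llm", "tool", or "agent"
    start: float              # seconds
    end: float                # seconds
    status: str = "Success"
    prompt: str = ""
    prompt_tokens: int = 0
    completion_tokens: int = 0


# All functions below assume a non-empty list of spans.

def latency_seconds(spans):
    """Latency: time from the earliest span start to the latest span end."""
    return max(s.end for s in spans) - min(s.start for s in spans)


def token_usage(spans):
    """Token Usage: prompt and completion tokens summed over all spans."""
    return {
        "prompt": sum(s.prompt_tokens for s in spans),
        "completion": sum(s.completion_tokens for s in spans),
    }


def error_rate(spans):
    """Error Rate: percentage of spans whose status is not "Success"."""
    errors = sum(1 for s in spans if s.status != "Success")
    return 100.0 * errors / len(spans)


def loop_detected(spans, threshold=3):
    """Loop Detection: flag any identical LLM call seen `threshold` or more times."""
    calls = Counter((s.name, s.prompt) for s in spans if s.kind == "llm")
    return any(count >= threshold for count in calls.values())


def task_completed(spans):
    """Task Completion: status of the span that finishes last."""
    final = max(spans, key=lambda s: s.end)
    return final.status == "Success"


def tool_call_count(spans):
    """Tool Call Count: number of tool or sub-agent spans."""
    return sum(1 for s in spans if s.kind in ("tool", "agent"))


# Example trace: one failed tool call that is retried successfully.
trace = [
    Span("plan", "llm", 0.0, 1.0, prompt="route query",
         prompt_tokens=100, completion_tokens=20),
    Span("search", "tool", 1.0, 1.5, status="Error"),
    Span("search", "tool", 1.5, 2.0),
    Span("answer", "llm", 2.0, 4.0, prompt="write answer",
         prompt_tokens=300, completion_tokens=80),
]

report = {
    "latency_s": latency_seconds(trace),
    "tokens": token_usage(trace),
    "error_rate_pct": error_rate(trace),
    "loop": loop_detected(trace),
    "completed": task_completed(trace),
    "tool_calls": tool_call_count(trace),
}
print(report)
```

The LLM Judge metrics are not covered by this sketch, since they depend on a judge model and prompt that the glossary does not specify.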
