Accuracy Metrics
These metrics evaluate whether your agent provides correct and consistent information.
Expected Outcome
Set expected_outcome_prompt in your evaluator configuration to describe what success looks like.
Interpretation:
- Pass: Main Agent achieved the expected outcome
- Review Required: Outcome unclear, manual review recommended
- Failed: Main Agent did not achieve the expected outcome
Hallucination
- True: No hallucinations detected (Main Agent stayed factual)
- False: Main Agent provided unsupported or contradictory information
Relevancy
- True: Responses were relevant and on-topic
- False: Main Agent gave off-topic or inappropriate responses
Response Consistency
- Testing Agent provides information (e.g., their name) and the Main Agent repeats it back incorrectly
- Main Agent makes contradictory statements (e.g., says one thing early in the call, then contradicts it later)
- True: Main Agent maintained consistent information throughout
- False: Inconsistencies or contradictions detected
Tool Call Success
- True: All tool calls succeeded
- False: One or more tool calls returned an error
Transcription Accuracy
Voicemail Detection (Beta)
- True: Call reached voicemail
- False: Call connected to a live person
Conversation Quality Metrics
These metrics evaluate the flow and dynamics of the conversation.
AI Interrupting User
Stop Time After User Interruption (ms)
User Interrupting AI
Latency (in ms)
Unnecessary Repetition Count
Verbosity
- 5: Consistently concise and well-calibrated to user intent
- 4: Mostly concise; a few overlong turns
- 3: Mixed; several turns are too long
- 2: Verbosity is the dominant issue across most turns
- 1: Agent is bloated throughout
Detect Silence in Conversation
Set silence_duration in the metric configuration (default: 10 seconds).
Interpretation:
- True: No problematic silence detected
- False: Extended mutual silence exceeding the threshold was detected
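The threshold check can be pictured with a minimal sketch. It scans speech segments from both parties and reports any gap in which nobody speaks for longer than silence_duration. The segment layout and the function name find_long_silences are assumptions made for this illustration, not part of the product's API; only the 10-second default comes from the text above.

```python
from typing import List, Tuple

# Hypothetical layout: (start_seconds, end_seconds) of speech from either party.
Segment = Tuple[float, float]


def find_long_silences(
    segments: List[Segment], silence_duration: float = 10.0
) -> List[Segment]:
    """Return (gap_start, gap_end) for each gap longer than silence_duration."""
    gaps = []
    latest_end = None
    for start, end in sorted(segments):
        if latest_end is not None and start - latest_end > silence_duration:
            gaps.append((latest_end, start))
        # Track the furthest point at which anyone was still speaking.
        latest_end = end if latest_end is None else max(latest_end, end)
    return gaps


# Illustrative input only: a 12.5-second mutual silence between 9.0 s and 21.5 s.
segments = [(0.0, 4.0), (5.0, 9.0), (21.5, 25.0)]
gaps = find_long_silences(segments)
metric_value = len(gaps) == 0  # True means no problematic silence
```

Under these assumptions the example call yields one gap, so the metric would be reported as False.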
Infrastructure Issues
Set infra_issues_timeout in the metric configuration (default: 10 seconds).
Interpretation:
- True: No infrastructure issues detected
- False: Main Agent failed to respond within the timeout after the Testing Agent finished speaking
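One way to read this rule is sketched below: for each Testing Agent turn, measure how long the Main Agent takes to start speaking, and flag the call if that wait exceeds infra_issues_timeout. The Turn class, the speaker labels "testing" and "main", and the call_end parameter are hypothetical names chosen for this example; the actual evaluator may work differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    speaker: str  # "testing" or "main" (hypothetical labels)
    start: float  # seconds from call start
    end: float


def has_infra_issue(
    turns: List[Turn], call_end: float, infra_issues_timeout: float = 10.0
) -> bool:
    """Return True if the Main Agent ever took longer than the timeout to reply."""
    for i, turn in enumerate(turns):
        if turn.speaker != "testing":
            continue
        if i + 1 < len(turns):
            nxt = turns[i + 1]
            if nxt.speaker != "main":
                continue  # Testing Agent kept talking; judge its later turn instead
            waited = nxt.start - turn.end
        else:
            waited = call_end - turn.end  # no reply before the call ended
        if waited > infra_issues_timeout:
            return True
    return False


# Illustrative input only: the Main Agent replies 12 seconds after the Testing Agent stops.
turns = [Turn("main", 0.0, 3.0), Turn("testing", 3.5, 6.0), Turn("main", 18.0, 20.0)]
metric_value = not has_infra_issue(turns, call_end=20.0)  # True means no issue
```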
Appropriate Termination by Main Agent
- True: Call was ended appropriately by the Main Agent
- False: Main Agent ended call abruptly or inappropriately
Appropriate Termination by Testing Agent
- True: Call ended at a natural conclusion point
- False: Testing Agent ended call early, suggesting dissatisfaction
Customer Experience Metrics
These metrics evaluate the Testing Agent’s experience and satisfaction with the conversation.
CSAT
- Positive: Clear expressions of gratitude like “Thank you so much for your help” = 5 points
- Neutral: Simple thanks, cooperative tone, matter-of-fact responses = 5 points
- Negative: Explicit frustration, harsh language, complaints = 1 point
- Fully cooperative / No issues = 5 points
- Somewhat uncooperative = 3 points
- Refused to help / Obstructed = 1 point
Dropoff Node
Set dropoff_nodes on your agent with the conversation stages you want to track (e.g., “greeting”, “information_gathering”, “resolution”, “closing”).
Interpretation: Helps identify where in your conversation flow Testing Agents are dropping off, enabling targeted improvements.
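To show how per-call dropoff results might be aggregated, the sketch below tallies them by stage. The observed list is made-up sample data, and the variable names are assumptions for this example; only the stage names come from the text above.

```python
from collections import Counter

# Stage names taken from the example in the text.
dropoff_nodes = ["greeting", "information_gathering", "resolution", "closing"]

# Hypothetical per-call results: the stage at which each call dropped off.
observed = ["information_gathering", "greeting", "information_gathering", "closing"]

counts = Counter(observed)

# Report every configured stage, including those with no dropoffs.
report = [(node, counts.get(node, 0)) for node in dropoff_nodes]

# Stage with the most dropoffs, a candidate for targeted improvement.
worst_stage, worst_count = max(report, key=lambda item: item[1])
```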
Sentiment
- Positive: Only when the Testing Agent is clearly very grateful with phrases like “Thank you so much for your help”, “I really appreciate this”, “You’ve been so helpful”
- Negative: Explicit frustration, harsh language, complaints, or aggressive tone
- Neutral: Simple “thanks”, cooperative tone, matter-of-fact responses, or when sentiment is unclear
- Positive: Testing Agent seemed very satisfied or grateful
- Neutral: Testing Agent showed no strong emotion
- Negative: Testing Agent seemed frustrated or dissatisfied
Topic of Call
Set topic_nodes on your agent with the topics you want to track (e.g., “billing”, “technical_support”, “sales”, “general_inquiry”).
Interpretation: Helps understand call volume distribution across different topics for resource planning and analysis.
Speech Quality Metrics
These metrics evaluate the audio and speech characteristics of the Main Agent.
Average Pitch (in Hz)
Gibberish Detection (Beta)
- True: Speech was clear and intelligible
- False: Gibberish or garbled speech detected
Letterwise Pronunciation
Set spelling_word_types on your agent to specify which types of words should be spelled out (e.g., “name”, “email”, “confirmation_code”).
Interpretation:
- True: Every instance of every word in the selected categories was correctly spelled out in the audio
- False: Spelling errors detected or words not spelled out when required
Pronunciation Check (Beta)
Set pronunciation_words on your agent as a list of word-phoneme pairs (e.g., [["Cekura", "suh-KYUR-uh"]]).
Interpretation: Higher scores indicate better pronunciation accuracy. Useful for brand names or technical terms.
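Since the value is a list of two-element pairs, it can help to check its shape before submitting it. The helper below is a hypothetical local check, not a product function; it only assumes the word-phoneme pair layout shown in the example above.

```python
from typing import List


def validate_pronunciation_words(pairs: List[List[str]]) -> List[str]:
    """Return a list of problems found; an empty list means the shape looks valid."""
    errors = []
    for i, pair in enumerate(pairs):
        if not isinstance(pair, (list, tuple)) or len(pair) != 2:
            errors.append(f"entry {i}: expected [word, phoneme]")
            continue
        word, phoneme = pair
        if not all(isinstance(x, str) and x.strip() for x in (word, phoneme)):
            errors.append(f"entry {i}: word and phoneme must be non-empty strings")
    return errors


# The pair from the text above passes the check.
pronunciation_words = [["Cekura", "suh-KYUR-uh"]]
problems = validate_pronunciation_words(pronunciation_words)
```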
Speaking Rate (Beta)
- True: Speaking rate was consistent and natural
- False: Unnatural speaking rate changes detected
Talk Ratio
Voice Change Detection (Beta)
- True: Consistent speaker throughout Main Agent turns
- False: Unexpected voice change detected (may indicate system issues)
Voice Tone + Clarity
Words Per Minute (WPM)