What are Evaluators?
Evaluators are like test cases for your AI voice agents. Each evaluator simulates a conversation with your agent to systematically test its performance and behavior.

Evaluator Components
An evaluator is composed of five key components:

- Instructions: define how the evaluator behaves during conversations
- Expected Outcome: the desired result that indicates a successful conversation
- Metrics: measurements like latency, relevancy, and consistency
- Personality: language, tone, and behavioral characteristics
- Test Profile: identity information like name, date of birth, and address (optional)
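To make the structure concrete, the sketch below models an evaluator as a plain Python object. This is a hypothetical illustration: the class and field names are invented and do not reflect Cekura's actual API or schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical model of an evaluator's five components.
# Names are illustrative, not Cekura's actual schema.

@dataclass
class TestProfile:
    name: str
    date_of_birth: str  # e.g., "2000-01-01"
    address: str
    phone_number: str

@dataclass
class Evaluator:
    instructions: str        # how the evaluator behaves during the call
    expected_outcome: str    # what a successful conversation looks like
    metrics: list[str] = field(default_factory=lambda: ["latency", "relevancy"])
    personality: dict[str, str] = field(default_factory=dict)  # tone, pace, noise
    test_profile: Optional[TestProfile] = None  # optional identity information

cancel_appointment = Evaluator(
    instructions="Call to cancel an appointment scheduled for next Tuesday",
    expected_outcome="The agent cancels the appointment and provides a confirmation number",
)
```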
How Evaluators Work
Instructions
Each evaluator has a set of instructions that define how it should behave during a simulation run. These instructions guide the evaluator’s conversation flow, what information to provide, and how to respond to your agent. During a simulation run, evaluators follow their instructions to engage in realistic conversations with your agent. In the transcript of a simulation run, dialogue spoken by evaluators is labeled as Testing Agents.

Example Instructions:
- “Call to cancel an appointment scheduled for next Tuesday”
- “Inquire about store hours and ask about product availability”
- “Request a refund for order #12345 and escalate if initially denied”
Expected Outcome
Each evaluator defines an expected outcome, which represents what should happen in a successful conversation. Once your evaluators complete a conversation with your agent, we evaluate those conversations and give you a report of how your agent performed. The expected outcome is one key metric we evaluate the conversation on: it tells you whether your AI agent (called the Main Agent on Cekura) did what it was supposed to do in the conversation.

Expected Outcome Examples
| Scenario | Expected Outcome |
|---|---|
| Appointment Cancellation | The agent successfully cancels the appointment and provides a confirmation number |
| Product Inquiry | The agent provides accurate product information and store hours |
| Refund Request | The agent processes the refund request and provides a timeline for processing |
| Account Verification | The agent verifies the customer’s identity and provides account information |
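As a toy illustration of what a pass/fail outcome check means, the snippet below scans an agent transcript for the appointment-cancellation scenario: did the agent confirm the cancellation and provide something that looks like a confirmation number? This is a deliberately crude stand-in, not how Cekura actually evaluates expected outcomes.

```python
import re

# Toy stand-in for an expected-outcome check on the appointment-cancellation
# scenario. Illustrative only; not Cekura's evaluation logic.
def meets_expected_outcome(agent_transcript: str) -> bool:
    cancelled = "cancel" in agent_transcript.lower()
    # Look for something resembling a confirmation number, e.g. "CONF-48213".
    has_confirmation = re.search(r"\b[A-Z]{2,}-?\d{4,}\b", agent_transcript) is not None
    return cancelled and has_confirmation

transcript = "Your appointment is cancelled. Your confirmation number is CONF-48213."
print(meets_expected_outcome(transcript))  # True
```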
Metrics
Each evaluator also has a set of metrics attached to it. These metrics are computed alongside the expected outcome and provide deeper insights into conversation quality, agent behavior, and user experience. You can find these metrics in the metrics section of your simulation results. Read more about metrics here.

Common metrics include:
- Latency
- Infrastructure Issues
- Relevancy
- Consistency
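To ground one of these, here is a minimal sketch of a latency computation: the gap between the end of each testing-agent turn and the start of the main agent's reply. The turn format here is invented for illustration; real values come from the metrics section of your simulation results.

```python
# Illustrative latency computation from turn timestamps (seconds).
# The transcript structure is invented, not Cekura's data format.
turns = [
    {"speaker": "testing_agent", "start": 0.0, "end": 2.4},
    {"speaker": "main_agent",    "start": 3.1, "end": 6.0},
    {"speaker": "testing_agent", "start": 6.5, "end": 8.2},
    {"speaker": "main_agent",    "start": 9.4, "end": 12.0},
]

# Latency = gap between the end of a testing-agent turn and the
# start of the main agent's reply.
latencies = [
    nxt["start"] - cur["end"]
    for cur, nxt in zip(turns, turns[1:])
    if cur["speaker"] == "testing_agent" and nxt["speaker"] == "main_agent"
]
print(f"average response latency: {sum(latencies) / len(latencies):.2f}s")  # 0.95s
```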
Personality
An evaluator has a personality attached to it, which determines the language of the evaluator and other behavioral characteristics, such as:
- Whether the conversation will have background noise
- Interruption patterns
- Speaking pace
- Emotional tone
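A personality can be pictured as a small configuration object. The sketch below is hypothetical; the keys and values are illustrative, not Cekura's actual personality settings.

```python
# Hypothetical personality configuration; keys and values are
# illustrative, not Cekura's actual settings.
impatient_caller = {
    "language": "en-US",
    "background_noise": True,      # simulate a noisy environment
    "interruptions": "frequent",   # talks over the agent
    "speaking_pace": "fast",
    "emotional_tone": "frustrated",
}
```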
Test Profiles
An evaluator can have a test profile attached to it. A test profile gives the evaluator an identity, including information like:
- Name
- Date of birth
- Address
- Phone number
- Other relevant personal information
When to Use Test Profiles
Use a test profile when your AI agent needs the counterparty to provide specific information, such as name, date of birth, or address, to complete its task.

Example: Clinic Receptionist
If you have a clinic receptionist agent that can help with cancelling appointments, it will likely need to verify the counterparty before proceeding with the cancellation.

Setup Process:
- Create mock appointments in your system with specific test data (e.g., date of birth: January 1, 2000)
- Create a test profile with the same information (date of birth: January 1, 2000)
- Attach the test profile to your evaluator
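Putting that together, the sketch below captures the one invariant that matters in this setup: the test profile must mirror the mock appointment's data, or identity verification (and therefore the cancellation) will fail. All names and structures here are hypothetical.

```python
# Hypothetical setup for the clinic receptionist example.
# Key requirement: the test profile must mirror the mock
# appointment's data so identity verification can succeed.
mock_appointment = {
    "patient_name": "Jane Doe",
    "date_of_birth": "2000-01-01",  # January 1, 2000
    "appointment_date": "next Tuesday",
}

test_profile = {
    "name": "Jane Doe",
    "date_of_birth": "2000-01-01",  # must match the mock appointment
    "phone_number": "+1-555-0100",
}

assert test_profile["date_of_birth"] == mock_appointment["date_of_birth"], \
    "Profile and mock data must match, or verification will fail"
```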