What are Evaluators?

Evaluators are like test cases for your AI voice agents. Each evaluator simulates a conversation with your agent to systematically test its performance and behavior.

Evaluator Components

An evaluator is composed of five key components:
  • Instructions
  • Expected Outcome
  • Metrics
  • Personality
  • Test Profiles

How Evaluators Work

Instructions

Each evaluator has a set of instructions that define how it should behave during a simulation run. These instructions guide the evaluator’s conversation flow, the information it provides, and how it responds to your agent. During a simulation run, evaluators follow their instructions to hold realistic conversations with your agent; in the transcript of a simulation run, the evaluator’s dialogue is labeled as Testing Agent.

Example Instructions:
  • “Call to cancel an appointment scheduled for next Tuesday”
  • “Inquire about store hours and ask about product availability”
  • “Request a refund for order #12345 and escalate if initially denied”
See more instruction examples.
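Conceptually, an evaluator’s instructions are a short natural-language brief attached to the evaluator. As a minimal sketch of what such a definition might look like as data — the field names here are illustrative assumptions, not Cekura’s actual schema:

```python
# Illustrative sketch only: these field names are assumptions,
# not Cekura's actual evaluator schema.
evaluator = {
    "name": "cancel-appointment",
    "instructions": "Call to cancel an appointment scheduled for next Tuesday",
    "expected_outcome": "The agent cancels the appointment and confirms it",
}

# The simulator would hand these instructions to the Testing Agent,
# which then drives the conversation with your Main Agent.
print(evaluator["instructions"])
```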

Expected Outcome

Each evaluator defines an expected outcome, which represents what should happen in a successful conversation. Once your evaluators complete their conversations with your agent, we evaluate those conversations and report how your agent performed. The expected outcome is one of the key metrics in that evaluation: it tells you whether your AI agent (called the Main Agent on Cekura) did what it was supposed to do in the conversation.

Expected Outcome Examples

  • Appointment Cancellation: The agent successfully cancels the appointment and provides a confirmation number
  • Product Inquiry: The agent provides accurate product information and store hours
  • Refund Request: The agent processes the refund request and provides a timeline for processing
  • Account Verification: The agent verifies the customer’s identity and provides account information
The expected outcome is evaluated as either met or not met, giving you clear visibility into whether your agent is performing as intended. See more expected outcome examples.
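Because the expected outcome is graded as simply met or not met, one way to picture a report entry is a boolean judgment attached to the outcome text. A minimal sketch, with hypothetical names that are not Cekura’s actual API:

```python
from dataclasses import dataclass

# Hypothetical result record: one expected-outcome judgment per conversation.
@dataclass
class OutcomeResult:
    expected_outcome: str  # what should have happened
    met: bool              # did the Main Agent actually do it?

def summarize(result: OutcomeResult) -> str:
    """Render a met / not met verdict as a one-line report entry."""
    status = "met" if result.met else "not met"
    return f"Expected outcome {status}: {result.expected_outcome}"

print(summarize(OutcomeResult("The agent cancels the appointment", True)))
# → Expected outcome met: The agent cancels the appointment
```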

Metrics

Each evaluator also has a set of metrics attached to it. These metrics are computed alongside the expected outcome and provide deeper insights into conversation quality, agent behavior, and user experience. You can find these metrics in the metrics section of your simulation results. Read more about metrics here. Common metrics include:
  • Latency
  • Infrastructure Issues
  • Relevancy
  • Consistency

Personality

An evaluator has a personality attached to it, which determines the language of the evaluator and other behavioral characteristics such as:
  • Whether the conversation will have background noise
  • Interruption patterns
  • Speaking pace
  • Emotional tone
Personalities help you test your agent against different types of users and real-world conditions. Read more about personalities here.
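One way to picture a personality is as a small bundle of behavioral settings applied to the evaluator. The keys and values below are hypothetical illustrations of the characteristics listed above, not Cekura’s actual configuration:

```python
# Hypothetical personality settings; key names are illustrative only.
personality = {
    "language": "en-US",          # language the evaluator speaks
    "background_noise": True,     # simulate a noisy real-world environment
    "interruption_rate": 0.2,     # assumed: fraction of turns interrupted
    "speaking_pace": "fast",
    "emotional_tone": "frustrated",
}
print(sorted(personality))
```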

Test Profiles

An evaluator can have a test profile attached to it. A test profile gives the evaluator an identity including information like:
  • Name
  • Date of birth
  • Address
  • Phone number
  • Other relevant personal information

When to Use Test Profiles

Use a test profile if your AI agent needs the counterparty to provide specific information, such as a name, date of birth, or address, to complete its task.

Example: Clinic Receptionist

If you have a clinic receptionist agent that can help with cancelling appointments, it will likely need to verify the counterparty before proceeding with the cancellation.

Setup Process:
  1. Create mock appointments in your system with specific test data (e.g., date of birth: January 1, 2000)
  2. Create a test profile with the same information (date of birth: January 1, 2000)
  3. Attach the test profile to your evaluator
How It Works:
When the evaluator (acting as the Testing Agent) holds a conversation with your agent, it provides the date of birth from its test profile. Your AI agent can then use this information to look up the desired appointment and proceed with the cancellation. This ensures consistent, reliable testing of verification flows and identity-dependent features in your agent. Read more about test profiles here.
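The clinic receptionist setup above can be sketched as data: a mock appointment seeded with test values, and a test profile carrying the same identity. Everything below (names, fields, IDs) is a hypothetical illustration, not Cekura’s actual schema:

```python
from datetime import date

# Step 1: mock appointment seeded in the system under test,
# keyed by the identity data the agent uses for verification.
appointments = {
    ("Jane Doe", date(2000, 1, 1)): {"id": "APT-001", "when": "next Tuesday"},
}

# Steps 2-3: test profile attached to the evaluator, carrying
# the SAME identity data as the mock appointment.
test_profile = {"name": "Jane Doe", "date_of_birth": date(2000, 1, 1)}

# During the call, the Main Agent uses the details the Testing Agent
# provides to look up the appointment before cancelling it.
key = (test_profile["name"], test_profile["date_of_birth"])
appointment = appointments.get(key)
print(appointment["id"] if appointment else "verification failed")
# → APT-001
```

Because the profile and the mock data match exactly, every simulation run exercises the verification flow the same way.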