What are Evaluators?

Evaluators are like test cases for your AI voice agents. Each evaluator simulates a conversation with your agent to systematically test its performance and behavior.

Evaluator Components

An evaluator is composed of five key components:
  • Instructions
  • Expected Outcome
  • Metrics
  • Personality
  • Test Profiles

How Evaluators Work

Instructions

Each evaluator has a set of instructions that define how it should behave during a simulation run. These instructions guide the evaluator’s conversation flow, the information it provides, and how it responds to your agent. During a simulation run, evaluators follow their instructions to hold realistic conversations with your agent; in the transcript of a simulation run, the evaluator’s dialogue is labeled as Testing Agent.

Example Instructions:
  • “Call to cancel an appointment scheduled for next Tuesday”
  • “Inquire about store hours and ask about product availability”
  • “Request a refund for order #12345 and escalate if initially denied”
See more instruction examples.
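Conceptually, an evaluator’s instructions are a short natural-language brief attached to the evaluator. As a minimal sketch of what such a definition might look like as data — the field names here are illustrative assumptions, not Cekura’s actual schema:

```python
# Illustrative sketch only: these field names are assumptions,
# not Cekura's actual evaluator schema.
evaluator = {
    "name": "cancel-appointment",
    "instructions": "Call to cancel an appointment scheduled for next Tuesday",
    "expected_outcome": "The agent cancels the appointment and confirms it",
}

# The simulator would hand these instructions to the Testing Agent,
# which then drives the conversation with your Main Agent.
print(evaluator["instructions"])
```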

Expected Outcome

Each evaluator defines an expected outcome, which represents what should happen in a successful conversation. Once your evaluators complete their conversations with your agent, we evaluate those conversations and report how your agent performed. The expected outcome is one of the key metrics in that evaluation: it tells you whether your AI agent (called the Main Agent on Cekura) did what it was supposed to do in the conversation.

Expected Outcome Examples

  • Appointment Cancellation: The agent successfully cancels the appointment and provides a confirmation number
  • Product Inquiry: The agent provides accurate product information and store hours
  • Refund Request: The agent processes the refund request and provides a timeline for processing
  • Account Verification: The agent verifies the customer’s identity and provides account information
The expected outcome is evaluated as either met or not met, giving you clear visibility into whether your agent is performing as intended. See more expected outcome examples.
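Because the expected outcome is graded as simply met or not met, one way to picture a report entry is a boolean judgment attached to the outcome text. A minimal sketch, with hypothetical names that are not Cekura’s actual API:

```python
from dataclasses import dataclass

# Hypothetical result record: one expected-outcome judgment per conversation.
@dataclass
class OutcomeResult:
    expected_outcome: str  # what should have happened
    met: bool              # did the Main Agent actually do it?

def summarize(result: OutcomeResult) -> str:
    """Render a met / not met verdict as a one-line report entry."""
    status = "met" if result.met else "not met"
    return f"Expected outcome {status}: {result.expected_outcome}"

print(summarize(OutcomeResult("The agent cancels the appointment", True)))
# → Expected outcome met: The agent cancels the appointment
```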

Metrics

Each evaluator also has a set of metrics attached to it. These metrics are computed alongside the expected outcome and provide deeper insights into conversation quality, agent behavior, and user experience. You can find these metrics in the metrics section of your simulation results. Read more about metrics here. Common metrics include:
  • Latency
  • Infrastructure Issues
  • Relevancy
  • Consistency

Personality

An evaluator has a personality attached to it, which determines the language of the evaluator and other behavioral characteristics such as:
  • Whether the conversation will have background noise
  • Interruption patterns
  • Speaking pace
  • Emotional tone
Personalities help you test your agent against different types of users and real-world conditions. Read more about personalities here.
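One way to picture a personality is as a small bundle of behavioral settings applied to the evaluator. The keys and values below are hypothetical illustrations of the characteristics listed above, not Cekura’s actual configuration:

```python
# Hypothetical personality settings; key names are illustrative only.
personality = {
    "language": "en-US",          # language the evaluator speaks
    "background_noise": True,     # simulate a noisy real-world environment
    "interruption_rate": 0.2,     # assumed: fraction of turns interrupted
    "speaking_pace": "fast",
    "emotional_tone": "frustrated",
}
print(sorted(personality))
```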

Test Profiles

An evaluator can have a test profile attached to it. A test profile gives the evaluator an identity including information like:
  • Name
  • Date of birth
  • Address
  • Phone number
  • Other relevant personal information

When to Use Test Profiles

Use a test profile if your AI agent needs the counterparty to provide specific information, such as a name, date of birth, or address, to complete its task.

Example: Clinic Receptionist

If you have a clinic receptionist agent that can help with cancelling appointments, it will likely need to verify the counterparty before proceeding with the cancellation.

Setup Process:
  1. Create mock appointments in your system with specific test data (e.g., date of birth: January 1, 2000)
  2. Create a test profile with the same information (date of birth: January 1, 2000)
  3. Attach the test profile to your evaluator
How It Works:
When the evaluator (acting as the Testing Agent) holds a conversation with your agent, it provides the date of birth from its test profile. Your AI agent can then use this information to look up the desired appointment and proceed with the cancellation. This ensures consistent, reliable testing of verification flows and identity-dependent features in your agent. Read more about test profiles here.
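The clinic receptionist setup above can be sketched as data: a mock appointment seeded with test values, and a test profile carrying the same identity. Everything below (names, fields, IDs) is a hypothetical illustration, not Cekura’s actual schema:

```python
from datetime import date

# Step 1: mock appointment seeded in the system under test,
# keyed by the identity data the agent uses for verification.
appointments = {
    ("Jane Doe", date(2000, 1, 1)): {"id": "APT-001", "when": "next Tuesday"},
}

# Steps 2-3: test profile attached to the evaluator, carrying
# the SAME identity data as the mock appointment.
test_profile = {"name": "Jane Doe", "date_of_birth": date(2000, 1, 1)}

# During the call, the Main Agent uses the details the Testing Agent
# provides to look up the appointment before cancelling it.
key = (test_profile["name"], test_profile["date_of_birth"])
appointment = appointments.get(key)
print(appointment["id"] if appointment else "verification failed")
# → APT-001
```

Because the profile and the mock data match exactly, every simulation run exercises the verification flow the same way.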