
Overview

Building a robust testing suite for your AI voice agent doesn’t have to be overwhelming. We’ve found that many successful users follow a simple, iterative approach that yields high-quality test cases in just one cycle. Here is the workflow that helps many of our users derive significant value:
Step 1: Generate 10 test cases

Start by creating 10 diverse test cases that cover different scenarios your agent might encounter. What to include:
  • Common user requests (happy path scenarios)
  • Edge cases (unusual but valid requests)
  • Error conditions (invalid inputs, missing information)
  • Different user personalities and communication styles
Use Cekura’s AI-powered scenario generation to quickly create varied test cases based on your agent’s purpose.
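
If you prefer to draft your test cases outside the platform first, a plain data structure is enough to capture this mix. Here is a minimal sketch in Python; the field names (name, instructions, expected_outcome) are illustrative assumptions, not a required schema:

```python
# Illustrative sketch only: the field names are assumptions, not a fixed schema.
# Each test case pairs user-side instructions with a concrete expected outcome.
test_cases = [
    {
        "name": "Happy path: simple booking",
        "instructions": "Ask to book a table for 2 people tonight at 7 PM.",
        "expected_outcome": "Agent confirms a reservation for 2 at 7 PM tonight.",
    },
    {
        "name": "Edge case: large party",
        "instructions": "Ask to reserve a table for 12 people next Saturday.",
        "expected_outcome": "Agent explains the large-party policy or offers to escalate to staff.",
    },
    {
        "name": "Error condition: missing information",
        "instructions": "Ask to cancel a reservation but withhold the confirmation number at first.",
        "expected_outcome": "Agent asks for the missing details and only cancels once they are provided.",
    },
    # ... continue until you have 10 cases covering the categories above.
]

for case in test_cases:
    print(f'{case["name"]}: expects "{case["expected_outcome"]}"')
```
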
Step 2: Run them

Execute all 10 test cases against your agent. During the run:
  • Let each conversation complete naturally
This gives you a baseline understanding of how your agent performs across different scenarios.
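
In code, the run step can be as simple as a loop that executes each case and collects the results for review. A hedged sketch, where run_test_case is a hypothetical stand-in for whatever SDK method or API call your testing platform exposes:

```python
# Sketch of a run loop. run_test_case() is a hypothetical placeholder for the
# call your testing platform provides (SDK method, REST request, etc.).
def run_test_case(case: dict) -> dict:
    # A real implementation would start a simulated call, wait for the
    # conversation to complete naturally, and return the verdict + transcript.
    return {"name": case["name"], "passed": True, "transcript": "..."}

def run_suite(test_cases: list[dict]) -> list[dict]:
    results = []
    for case in test_cases:
        results.append(run_test_case(case))
    return results

# Example with two minimal cases; in practice, reuse your 10 drafted cases.
results = run_suite([{"name": "simple booking"}, {"name": "cancellation"}])
print(f"{sum(r['passed'] for r in results)}/{len(results)} passed")
```
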
Step 3: Review the failed calls

Analyze the conversations where your agent didn’t meet the expected outcome. What to look for:
  • Why did the conversation fail?
  • Did the agent misunderstand the request?
  • Was information missing or incorrect?
  • Did the agent handle edge cases poorly?
  • Were there technical issues (latency, interruptions)?
If a call is marked as a failure but you believe it should have passed, check these two things:
  1. Is the expected outcome prompt correct and clear?
    • If not: Edit the expected outcome prompt directly from inside the run
    • Re-evaluate the call until it passes
    • Hit Save to update the evaluator with the corrected expected outcome
  2. Did the testing agent follow the instructions provided?
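To make the first pass of this review faster, you can script the triage. The small sketch below assumes results shaped like the run-loop example above; the failure_reason field is an illustrative assumption, not a platform schema:

```python
# Sketch of a failure triage pass over run results.
results = [
    {"name": "Happy path: simple booking", "passed": True, "failure_reason": None},
    {"name": "Edge case: large party", "passed": False,
     "failure_reason": "Agent never mentioned the large-party policy."},
]

for r in (r for r in results if not r["passed"]):
    print(f"FAILED: {r['name']}")
    print(f"  reason: {r['failure_reason']}")
    # While reading the transcript, ask:
    # - Did the agent misunderstand the request?
    # - Is the expected-outcome prompt itself unclear? If so, edit it inside
    #   the run, re-evaluate until it passes, then save the evaluator.
```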

Why This Works

After just one iteration of this exercise, you will have 10 very good test cases you can always rely on. Here’s what makes this approach effective:

1. Real-World Validation

Your test cases are validated against actual agent behavior, not theoretical scenarios. You know exactly how your agent responds.

2. Failure-Driven Refinement

Failed calls help you:
  • Refine your agent’s prompts and logic
  • Identify missing features or capabilities
  • Improve error handling
  • Adjust expected outcomes to be more realistic

3. Regression Testing Foundation

Once refined, these 10 test cases become your regression test suite. Run them after every agent update to ensure you haven’t broken existing functionality.

4. Iterative Improvement

Each cycle of this workflow compounds your testing quality:
  • Cycle 1: Establish baseline, fix obvious issues
  • Cycle 2: Handle edge cases better
  • Cycle 3: Optimize performance and user experience

Expanding Your Test Suite

After your initial 10 test cases are solid, you can expand strategically:

Add Personality Variations

Test the same scenarios with different caller personalities and conditions (patient, frustrated, background noise)
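
As a rough illustration, the same base scenario can be fanned out across personas before loading it into your platform; the persona field below is an assumed label, not a required schema:

```python
# Sketch: reuse one base scenario across several caller personas.
base_scenario = {
    "instructions": "Cancel tomorrow's 7 PM reservation.",
    "expected_outcome": "Agent cancels the reservation and confirms the cancellation.",
}

personas = ["patient", "frustrated", "speaking over background noise"]

variants = [
    {**base_scenario, "name": f"Cancellation ({p} caller)", "persona": p}
    for p in personas
]

for v in variants:
    print(v["name"])
```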

Cover More Scenarios

Generate additional test cases for less common but important use cases

Test Profile Variations

Use different test profiles to validate identity verification flows

Stress Testing

Add load testing to ensure your agent performs under high traffic
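
If you want a rough local approximation before setting up formal load tests, you can fire several simulated calls concurrently. A sketch, where run_test_case is again a hypothetical placeholder for your platform’s call trigger:

```python
# Sketch of a simple concurrency check: launch several simulated calls at once.
from concurrent.futures import ThreadPoolExecutor

def run_test_case(case_name: str) -> bool:
    # Placeholder for starting a call and returning its pass/fail verdict.
    return True

case_names = [f"concurrent booking #{i}" for i in range(20)]

# Run up to 10 calls in parallel to see how the agent behaves under load.
with ThreadPoolExecutor(max_workers=10) as pool:
    outcomes = list(pool.map(run_test_case, case_names))

print(f"{sum(outcomes)}/{len(outcomes)} calls passed under load")
```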

Best Practices

Start Simple

Don’t try to cover every possible scenario on day one. Start with 10 good test cases and build from there.

Be Specific with Expected Outcomes

Vague expected outcomes make it hard to evaluate success. Instead of “Agent handles the request well,” use “Agent cancels the appointment and provides a confirmation number.”

Use Realistic Instructions

Your evaluator instructions should mimic how real users would interact with your agent. Avoid overly scripted or robotic instructions.

Review Passed Calls Too

Don’t only focus on failures. Review successful calls to understand what your agent does well and ensure the success wasn’t accidental.

Maintain Your Test Suite

As your agent evolves, update your test cases and expected outcomes to reflect new capabilities and requirements.

Example: Building Your First 10 Test Cases

Let’s say you’re testing a restaurant reservation AI agent. Here’s a balanced set of 10 test cases:
#    Scenario Type          Description
1    Happy Path             Make a reservation for 2 people tonight at 7 PM
2    Happy Path             Make a reservation for 4 people next Friday at 6:30 PM
3    Date Clarification     “I want to book a table for Saturday” (this Saturday or next?)
4    Time Unavailable       Request a time slot that’s fully booked
5    Modification           Change an existing reservation time
6    Cancellation           Cancel an existing reservation
7    Information Request    Ask about menu options or special dietary accommodations
8    Large Party            Request a reservation for 10+ people
9    Interrupted User       User with background noise and interruptions
10   Non-Native Speaker     User with a slower pace and an accent
This mix covers:
  • 40% standard scenarios (1, 2, 5, 6)
  • 30% clarification and error handling (3, 4, 7)
  • 10% edge cases (8)
  • 20% challenging conditions (9, 10)
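
If you keep your scenarios as data, a quick tally like the sketch below (the category labels are illustrative) confirms the mix and helps keep it balanced as the suite grows:

```python
# Sketch: tally the scenario mix to sanity-check coverage before running.
from collections import Counter

scenario_types = {
    1: "standard", 2: "standard", 5: "standard", 6: "standard",
    3: "clarification/error handling", 4: "clarification/error handling",
    7: "clarification/error handling",
    8: "edge case",
    9: "challenging conditions", 10: "challenging conditions",
}

mix = Counter(scenario_types.values())
total = len(scenario_types)
for category, count in mix.items():
    print(f"{category}: {count}/{total} ({100 * count // total}%)")
```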

Measuring Success

After running your workflow, you should aim for:
  • 70-80% pass rate on first run (realistic baseline)
  • 90-95% pass rate after refining based on failures
  • 95%+ pass rate as your long-term regression suite
Don’t aim for 100%: Real-world conversations are unpredictable. Some variability is normal and healthy. Focus on consistency in core functionality.
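
Here is a small sketch of how you might check a run against these targets; the verdict list is example data, not real results:

```python
# Sketch: compute the pass rate for a run and compare it to the targets above.
verdicts = [True, True, True, False, True, True, True, True, False, True]

pass_rate = 100 * sum(verdicts) / len(verdicts)
print(f"Pass rate: {pass_rate:.0f}%")

if pass_rate < 70:
    print("Below the expected first-run baseline; review failures before expanding.")
elif pass_rate < 95:
    print("Reasonable baseline; keep refining the failed cases.")
else:
    print("Stable enough to serve as the long-term regression suite.")
```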

Next Steps

Once you have your reliable 10 test cases:
  1. Schedule Regular Runs: Set up cron jobs to run your tests automatically
  2. Monitor Metrics: Track performance over time using metrics
  3. Iterate on Failures: Continuously refine your agent based on test results
  4. Expand Coverage: Gradually add more test cases for comprehensive coverage