The Optimise Prompt button hands a set of failing evaluators to the Cekura AI Assistant, which runs a diagnose → propose → apply → re-validate loop against your agent’s prompt and configuration. The loop continues until the agent reaches 100% pass rate on the validation set or the iteration cap is reached.
Optimise Prompt is BETA. The optimiser edits a live agent (VAPI assistants directly via the VAPI API, self-hosted websocket agents via source-file edits). Review proposed diffs before applying for the first few runs.
Where the button lives
Two surfaces in the dashboard expose the button. Both route into the same workflow.
Evaluators list (bulk action)
Open an agent’s Evaluators tab, select one or more evaluators (the scenario rows), and the Optimise Prompt button appears in the bulk-action bar next to Run and the delete control. The optimiser will use every selected evaluator as the validation set.
What the loop does
The AI Assistant runs the self-improving agent workflow. Each iteration walks the same phases:
| Phase | What happens |
|---|---|
| Setup | Resolves the agent’s provider mode (VAPI or self-hosted), loads the system prompt and tool definitions. For self-hosted agents, asks once for the redeploy command so later iterations can restart the live process automatically. |
| Collect | Pulls the runs for the selected evaluators, filters out passing runs, and inspects each failed call’s provider state (transcript, tool calls, end-of-call attribution). |
| Early-End-Call Diagnose | Triages failures where the main agent ended the call before the scenario’s required steps completed — usually a closure-rule problem in the prompt. |
| Diagnose | Classifies remaining failures as Gap (missing instruction), Conflict (contradictory instructions), Ambiguity (under-specified instruction), CodeBug (orchestration code, self-hosted only), or Upstream (data / infra / out-of-scope). Proposes minimal scoped edits per failure and presents the combined diff. |
| Apply | Lands the approved edits. VAPI: PATCH the assistant / tools directly. Self-hosted websocket: edits the system-prompt and orchestration code in your source file, then runs your redeploy command. |
| Sync | Re-fetches the just-edited artifacts and verifies each changed field actually landed (catches VAPI nested-object replacement and ambiguous source-file anchors). |
| Overfitting Gate | Scrubs the just-applied edits for transcript-specific phrasing, scenario IDs, hardcoded test data, hyper-narrow case clauses, and transcript-cloned few-shot examples. Cleans up by revising or stripping the offending edit before validation. |
| Eval | Re-runs the failing evaluators against the new prompt. On 100% pass, runs a regression sweep across the original full set. Decides: hand back to Collect (failures remain), declare success, or surface a stop condition. |
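The Sync row above says the optimiser re-fetches the edited artifacts and verifies that each changed field landed, naming nested-object replacement as one thing it catches. A minimal sketch of that kind of check, assuming the intended edits are expressed as dotted paths into the re-fetched configuration. The helpers `get_path` and `verify_landed`, the `must_survive` list, and the sample payload are all hypothetical illustrations, not part of Cekura's or VAPI's API:

```python
from typing import Any

_MISSING = object()  # sentinel so that a stored None is not confused with "absent"


def get_path(doc: dict, dotted: str) -> Any:
    """Walk a nested dict by a dotted path; return _MISSING if any key is absent."""
    node: Any = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def verify_landed(intended: dict, refetched: dict, must_survive: list) -> list:
    """Return human-readable problems; an empty list means the edit landed cleanly."""
    problems = []
    for path, expected in intended.items():
        actual = get_path(refetched, path)
        if actual is _MISSING:
            problems.append(f"{path}: missing after apply")
        elif actual != expected:
            problems.append(f"{path}: expected {expected!r}, found {actual!r}")
    # Sibling fields that existed before the edit should still be present;
    # if a nested object was replaced wholesale, they vanish.
    for path in must_survive:
        if get_path(refetched, path) is _MISSING:
            problems.append(f"{path}: sibling field lost (nested object replaced?)")
    return problems


# Hypothetical state re-fetched after an update that carried only model.toolIds.
refetched = {"model": {"toolIds": ["t1", "t2"]}}
problems = verify_landed(
    intended={"model.toolIds": ["t1", "t2"]},
    refetched=refetched,
    must_survive=["model.provider"],
)
print(problems)
```

In this example the edited field itself landed, but the check still reports a problem because a sibling field under the same nested object is no longer present.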
The loop stops when any of the following occurs:
- 100% pass on the full evaluator set (after a regression sweep), which counts as success.
- Iteration cap reached (default 10).
- Oscillation (same scenario flipping pass/fail across iterations).
- No change (identical post-edit failures two iterations in a row).
- Three consecutive same-shape failures at the same edit surface — the assistant surfaces larger architectural options (model swap, programmatic guard, flow restructure) instead of producing another similar prompt edit.
- All-Upstream classification — every remaining failure is data / infra and not fixable from the prompt or config.
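The conditions in the list above can be read as a single check run at the end of each iteration. The following is a simplified sketch under stated assumptions, not the assistant's actual logic: `IterationResult` and `stop_reason` are hypothetical names, "identical failures" is reduced to "the same set of failing scenarios", oscillation is detected over the last three iterations only, the order of the checks is a guess, and the "three consecutive same-shape failures" condition is left out because it needs failure-shape data this sketch does not model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IterationResult:
    # scenario id -> True if the scenario passed in this iteration's Eval phase
    outcomes: dict
    # failing scenario id -> diagnosis label such as "Gap" or "Upstream"
    diagnoses: dict = field(default_factory=dict)

    @property
    def failures(self) -> frozenset:
        return frozenset(s for s, ok in self.outcomes.items() if not ok)


def stop_reason(history: list, max_iterations: int = 10) -> Optional[str]:
    """Return a stop reason for the loop, or None to keep iterating."""
    latest = history[-1]
    if not latest.failures:
        return "success"
    if all(latest.diagnoses.get(s) == "Upstream" for s in latest.failures):
        return "all_upstream"
    # No change: the same scenarios failed two iterations in a row.
    if len(history) >= 2 and latest.failures == history[-2].failures:
        return "no_change"
    # Oscillation: a scenario flipped pass/fail twice across the last three iterations.
    if len(history) >= 3:
        a, b, c = history[-3:]
        for s in c.outcomes:
            if s in a.outcomes and s in b.outcomes:
                if a.outcomes[s] != b.outcomes[s] and b.outcomes[s] != c.outcomes[s]:
                    return "oscillation"
    if len(history) >= max_iterations:
        return "iteration_cap"
    return None


history = [
    IterationResult({"s1": False, "s2": True}, {"s1": "Gap"}),
    IterationResult({"s1": True, "s2": False}, {"s2": "Gap"}),
    IterationResult({"s1": False, "s2": True}, {"s1": "Gap"}),
]
print(stop_reason(history))  # -> oscillation
```

Here `outcomes` is assumed to cover the full evaluator set, so an empty failure set stands in for "100% pass after the regression sweep".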
Supported agent modes
| Mode | Editable surfaces | How edits land |
|---|---|---|
| VAPI | System prompts on every squad member, tool function declarations, tool spoken messages (request-start / request-complete / request-failed), squad model.toolIds, handoff destinations. | VAPI PATCH — the live assistant picks up edits immediately. |
| Self-hosted (websocket) | The system prompt string constant in your source file, tool schemas, and orchestration code (conversation-history management, message wiring, state preservation, keepalive / retry plumbing). Business logic, auth code, and dependencies stay out of scope. | Direct edit on your source file, followed by your redeploy command. |
Retell is not currently supported by Optimise Prompt and will be re-enabled in a future revision. ElevenLabs, LiveKit, and Pipecat are routed through the self-hosted path when applicable.
What you need before clicking
The agent must be set up on Cekura
The optimiser reads the agent’s provider configuration to know how to fetch and apply edits. See Cekura Agent.
At least one failing evaluator
The loop’s input is failing runs. If every selected evaluator passes, the optimiser exits immediately.
For self-hosted agents — the live source file open in the IDE
The optimiser locates the system prompt by first checking the file open in your editor, then grepping the workspace for the prompt string constant. Files like `original_*.py`, `*.bak`, and anything under `archive/` are skipped; the assistant pauses to confirm if there is ambiguity.
Iteration controls
The assistant accepts a few optional knobs when you mention them in the chat:
| Setting | Default | Effect |
|---|---|---|
| `max_iterations` | 10 | Caps the loop. Each Eval → Optimise hand-back counts as one iteration. |
| `auto_mode` | true | When true, skips per-iteration diff approval and routine restart pauses (the one-time setup question for `redeploy_command` still fires). Set to false to gate every iteration’s diff on explicit approval. |
| `mode` | auto-detected | Explicit override (`vapi` / `self_hosted`) when detection is ambiguous. |
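As a mental model only, the three settings and their defaults could be pictured as a small configuration object. This is a hypothetical sketch: the settings are set by mentioning them in the chat, not through a Python API, and the `LoopSettings` class, the use of `None` for "auto-detected", and the lower bound on `max_iterations` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

VALID_MODES = {"vapi", "self_hosted"}


@dataclass(frozen=True)
class LoopSettings:
    max_iterations: int = 10     # default from the table above
    auto_mode: bool = True       # True skips per-iteration diff approval
    mode: Optional[str] = None   # None stands in for "auto-detected"

    def __post_init__(self) -> None:
        # Assumed validation: the docs do not state a minimum for max_iterations.
        if self.max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")
        if self.mode is not None and self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")

    def needs_approval(self) -> bool:
        """Whether each iteration's combined diff waits for explicit approval."""
        return not self.auto_mode


defaults = LoopSettings()
gated = LoopSettings(max_iterations=5, auto_mode=False, mode="self_hosted")
print(defaults.needs_approval(), gated.needs_approval())  # -> False True
```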
Reading the assistant’s output
Each iteration emits the same structure in the chat:
- Phase header: e.g. `Iteration 3 · Diagnose`, which names what is happening so you can follow along.
- Failure summary: the kept failures from this iteration’s runs, grouped by failure shape, with end-of-call attribution for each.
- Proposed combined diff: every edit the optimiser wants to apply, before / after, scoped to the smallest viable surface. In `auto_mode: false` this is the approval gate.
- Apply + sync confirmation: which fields landed, plus the redeploy step’s exit code for self-hosted agents.
- Overfitting gate verdict: which edits passed the gate, which were revised, which were stripped, and why.
- Eval decision: loop, declare success, or surface a stop condition.
Limits and current behavior
- One agent per session. The loop operates on one `agent_id` at a time. Open separate chats for separate agents.
- Validation set is the evaluators you selected. The optimiser does not silently widen the set mid-loop; if widening is needed, it asks first.
- Edits to dynamic-variable placeholders (`{{...}}`) are skipped. Those are owned by your calling system, not the agent prompt.
- Tool `messages`, `destinations`, and squad `model.toolIds` only apply to VAPI. In self-hosted mode, findings against those surfaces are surfaced as hand-offs rather than direct edits.
- The Cekura agent record’s `description` field is treated as a mirror, not the source of truth, for self-hosted agents. The optimiser edits the prompt in your source file directly; updating the description without updating the file does not change live behavior.
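The placeholder rule in the list above can be illustrated with a small guard that compares the `{{...}}` tokens before and after a proposed edit. This is a hypothetical sketch of the idea, not the optimiser's implementation; the function name and the regular expression are assumptions.

```python
import re

# Non-greedy match so that two placeholders on one line are found separately.
PLACEHOLDER = re.compile(r"\{\{.*?\}\}")


def touches_placeholders(before: str, after: str) -> bool:
    """True if an edit adds, removes, reorders, or rewrites any {{...}} placeholder."""
    return PLACEHOLDER.findall(before) != PLACEHOLDER.findall(after)


safe = touches_placeholders(
    "Hi {{name}}, how can I help?",
    "Hello {{name}}, how can I help today?",
)
unsafe = touches_placeholders("Hi {{name}}", "Hi {{first_name}}")
print(safe, unsafe)  # -> False True
```

An edit for which this returns `True` would be one to skip or hand back to whoever owns the calling system.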
How it maps to the AI Assistant chat
| Dashboard action | Equivalent in the AI Assistant chat |
|---|---|
| Click Optimise Prompt on the evaluators list with N evaluators selected | New chat opens with /self-improve via these evaluators - <N evaluator IDs> |
| Click Optimise Prompt on a run result page | New chat opens with /self-improve via these evaluators - <IDs> (result ids: <result_id>) |
| Approving the iteration’s diff in `auto_mode: false` | Replying “approve” / “apply” to the combined-diff message |
| Stopping the loop early | Sending stop (or using the Stop control on the chat) |
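For reference, the opening chat message in the table above follows a fixed template. The sketch below assembles it from a list of evaluator IDs; the function name, the integer ID type, and the comma separator between IDs are assumptions, since the table shows only `<N evaluator IDs>` and `<result_id>` placeholders.

```python
from typing import Optional, Sequence


def self_improve_message(evaluator_ids: Sequence[int], result_id: Optional[int] = None) -> str:
    """Build the chat message the dashboard button is described as opening with."""
    if not evaluator_ids:
        raise ValueError("select at least one evaluator")
    ids = ", ".join(str(i) for i in evaluator_ids)  # separator is an assumption
    message = f"/self-improve via these evaluators - {ids}"
    if result_id is not None:
        # Run result page variant from the second table row.
        message += f" (result ids: {result_id})"
    return message


print(self_improve_message([101, 102]))
print(self_improve_message([101], result_id=7))
```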
Troubleshooting
The button is disabled. No evaluators are selected, or every selected evaluator is mid-run. Wait for runs to finish, or select at least one row.
The chat opens but immediately exits with “all failures classify as Upstream”. Every kept failure is rooted in data or infrastructure, not the prompt. The optimiser surfaces the upstream hand-offs (e.g. KB content, dynamic-variable wiring) instead of producing phantom prompt edits.
Iter 1 lands edits but the iter-2 transcripts look identical. The most likely cause is the deploy path: the file was edited but the live process is still running the old code. For self-hosted agents, verify the `redeploy_command` actually ran and exited 0. The optimiser’s no-change detector catches this after the fact and pauses, but verifying first saves an iteration.
The optimiser asks me to disambiguate which file is the live source. The IDE-opened file did not match the prompt-string constant uniquely (multiple matches in the workspace, or strong “not the live source” filename signals like *.bak). Confirm which file the running server actually reads.
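To see why this prompt appears, it can help to reproduce the filename filter locally. The sketch below applies the skip rules described under "What you need before clicking" (`original_*.py`, `*.bak`, anything under `archive/`) to a list of workspace paths. It is an approximation: the function names are hypothetical, and treating any directory named `archive` at any depth as skipped is an assumption.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

SKIP_NAME_PATTERNS = ("original_*.py", "*.bak")
SKIP_DIRS = ("archive",)


def is_skipped(path: str) -> bool:
    """True if the path carries a 'not the live source' signal."""
    p = PurePosixPath(path)
    # Any parent directory named "archive" rules the file out.
    if any(part in SKIP_DIRS for part in p.parts[:-1]):
        return True
    return any(fnmatchcase(p.name, pattern) for pattern in SKIP_NAME_PATTERNS)


def live_candidates(paths: list) -> list:
    """Keep only the paths that could plausibly be the live source file."""
    return [p for p in paths if not is_skipped(p)]


paths = [
    "agent/server.py",
    "agent/original_server.py",
    "agent/server.py.bak",
    "archive/server.py",
    "agent/v2/server.py",
]
candidates = live_candidates(paths)
print(candidates)  # -> ['agent/server.py', 'agent/v2/server.py']
```

Two candidates survive the filter in this example, which is the situation where the assistant would need you to confirm which file the running server reads.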
The loop hits the iteration cap without converging. The same failure shape has persisted across multiple iterations of prompt-layer edits — the prompt layer is demonstrably not where the fix lives. The assistant surfaces architectural alternatives (stronger model, programmatic guard in code, flow restructure, evaluator hand-off) for you to choose.
Most failures cluster on one metric whose explanations look subjective. The metric is probably miscalibrated. Hand off to the Metric Lab improvement flow before iterating further on the agent.
Related
- Cekura Agent — set up an agent before optimising it.
- Metric Lab — improve evaluator quality when failures cluster on a noisy metric.
- Auto-optimise Metrics — the equivalent recurring workflow for metric prompts, run via Claude Code on a schedule.