Failure-Mode Insights

What it does

For every project and eligible metric, Cekura categorizes failing calls continuously as they come in:

When a call fails a metric, its root cause is classified into that metric’s failure-mode set — reusing an existing mode when a single fix would resolve it, or opening a new mode when nothing fits. A call with more than one distinct root cause lands in multiple modes.
Each failure mode carries a running count and its example calls; a periodic pass merges near-duplicate modes that share a single fix.
The metric’s Insights card shows its failure modes sorted by frequency, defaulting to the last 24 hours — adjust the date range to widen or narrow the window.

These insights surface the dominant patterns behind a metric’s failures, so you can fix the agent instead of re-reading every flagged call.
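
The bookkeeping described above can be illustrated with a minimal sketch. It is not Cekura's implementation: the `FailureMode` class and the `record_failure` and `top_modes` helpers are hypothetical names, and root causes are assumed to arrive as plain strings that already identify the mode (the real classification and the periodic merge pass are not modeled here).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class FailureMode:
    """Hypothetical record: one failure mode with its example calls."""
    title: str
    # (call_id, timestamp) pairs for every failing call assigned to this mode
    examples: list = field(default_factory=list)


def record_failure(modes, call_id, when, root_causes):
    """Assign a failing call to one mode per distinct root cause.

    An existing mode is reused when the cause matches; otherwise a new one opens.
    """
    for cause in set(root_causes):
        mode = modes.setdefault(cause, FailureMode(title=cause))
        mode.examples.append((call_id, when))


def top_modes(modes, now, window=timedelta(hours=24)):
    """Return (title, count) pairs inside the window, most frequent first."""
    cutoff = now - window
    counts = (
        (m.title, sum(1 for _, t in m.examples if t >= cutoff))
        for m in modes.values()
    )
    # Ties are broken alphabetically so the order is deterministic.
    return sorted((p for p in counts if p[1] > 0), key=lambda p: (-p[1], p[0]))
```

Passing a wider `window` to `top_modes` mirrors adjusting the date range on the Insights card: older failures re-enter the counts and the ordering may change.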

Viewing and generating insights in the dashboard

Navigate to Observability → Insights in the sidebar. The page displays a card for each LLM Judge metric enabled on your project. Use the View dropdown at the top of the page to filter by view. When a view is selected, the page shows only metrics and insights for agents in that view. Each card shows one of the following:

The latest failure-mode audit with identified themes
A message indicating not enough failures were found in the analyzed window
A Generate button if no audit has run yet

Click Generate (or Regenerate for an existing audit) to run a new analysis on demand. The page polls automatically until the audit completes. Each failure theme includes a title, a brief description, and clickable call IDs that link directly to the corresponding call logs.
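
The polling behavior can be pictured as a simple loop. The sketch below is an assumption for illustration only: `wait_for_audit`, the `fetch_audit` callable, and the `"status": "completed"` field are hypothetical and do not describe a documented Cekura API.

```python
import time


def wait_for_audit(fetch_audit, interval_s=2.0, timeout_s=120.0, sleep=time.sleep):
    """Call fetch_audit() repeatedly until it reports completion.

    fetch_audit is any caller-supplied function returning a dict; the
    "status" key and its "completed" value are assumed for this sketch.
    """
    waited = 0.0
    while True:
        audit = fetch_audit()
        if audit.get("status") == "completed":
            return audit
        if waited >= timeout_s:
            raise TimeoutError("audit did not complete in time")
        sleep(interval_s)
        waited += interval_s
```

Injecting `sleep` keeps the loop easy to exercise without real delays.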

Which metrics are eligible

Failure-mode insights are currently supported only for LLM Judge metrics. Code-based metrics such as Latency, WPM, and Talk Ratio are excluded. The eligible metrics are:

Custom metrics with type: llm_judge.
The following predefined metrics: CSAT, Critical Deviations Continuous, Critical Info Check, Critical Info Check bool, Expected Outcome, Gibberish Detection, Hallucination, Letterwise Pronunciation Detection, Main Agent Early End Call, Not Early Termination, Pronunciation Analysis, Pronunciation test, Relevancy, Response Consistency, STT Errors, Tool call Accuracy, Unnecessary Repetition Count, Unnecessary Repetition Score.
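
The eligibility rule above can be expressed as a small check. This is a sketch under assumptions: the metric is represented as a dict with hypothetical `name`, `type`, and `predefined` keys; only the `llm_judge` type value and the metric names come from the lists above.

```python
# Predefined metric names listed as eligible above.
ELIGIBLE_PREDEFINED = frozenset({
    "CSAT",
    "Critical Deviations Continuous",
    "Critical Info Check",
    "Critical Info Check bool",
    "Expected Outcome",
    "Gibberish Detection",
    "Hallucination",
    "Letterwise Pronunciation Detection",
    "Main Agent Early End Call",
    "Not Early Termination",
    "Pronunciation Analysis",
    "Pronunciation test",
    "Relevancy",
    "Response Consistency",
    "STT Errors",
    "Tool call Accuracy",
    "Unnecessary Repetition Count",
    "Unnecessary Repetition Score",
})


def is_eligible(metric: dict) -> bool:
    """Return True if the metric would receive failure-mode insights."""
    if metric.get("predefined"):  # hypothetical flag for built-in metrics
        return metric.get("name") in ELIGIBLE_PREDEFINED
    # Custom metrics qualify only when their type is llm_judge.
    return metric.get("type") == "llm_judge"
```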
