What it does

For every project and every LLM Judge metric, Cekura runs a daily audit that:
  1. Pulls all failing calls for that (project, metric) pair from the last 24 hours, falling back to the last 7 days if the 24-hour window contains too few failing calls.
  2. Hands the transcripts, the metric card, and an audit prompt to an LLM agent.
  3. Has the agent read a sample of the calls, cluster them by root cause, and write a breakdown of one to six failure themes.
  4. Saves the result on the metric’s Insights card in the dashboard.
These insights surface the dominant patterns behind a metric’s failures, so you can fix the agent instead of re-reading every flagged call.
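The window selection in step 1 can be sketched in Python as follows. This is an illustrative model, not Cekura's implementation: the function names are hypothetical, and so is the MIN_FAILING_CALLS value of 10, because the actual cutoff for a "small" pool is not published.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Hypothetical cutoff: the real minimum pool size is not published.
MIN_FAILING_CALLS = 10


def select_audit_pool(
    failed_at: list[datetime], now: datetime, min_calls: int = MIN_FAILING_CALLS
) -> tuple[int, list[datetime]]:
    """Pick the failing calls to audit for one (project, metric) pair.

    failed_at: timestamps of that pair's failing calls, comparable with now.
    Returns (window_days, pool): the last 24 hours if that window holds at
    least min_calls failures, otherwise the last 7 days.
    """
    day_pool = [t for t in failed_at if t >= now - timedelta(days=1)]
    if len(day_pool) >= min_calls:
        return 1, day_pool
    # Too few failures today: widen the window to 7 days.
    week_pool = [t for t in failed_at if t >= now - timedelta(days=7)]
    return 7, week_pool


def has_enough_failures(pool: list[datetime], min_calls: int = MIN_FAILING_CALLS) -> bool:
    """False corresponds to the card's "not enough failures" message."""
    return len(pool) >= min_calls
```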

Viewing and generating insights in the dashboard

Navigate to Observability → Insights in the sidebar. The page displays a card for each LLM Judge metric enabled on your project. Use the View dropdown at the top of the page to filter by view: when a view is selected, the page shows only the metrics and insights for agents in that view. Each card shows one of the following:
  • The latest failure-mode audit with identified themes
  • A message indicating not enough failures were found in the analyzed window
  • A Generate button if no audit has run yet
Click Generate (or Regenerate for an existing audit) to run a new analysis on demand. The page polls automatically until the audit completes. Each failure theme includes a title, a brief description, and clickable call IDs that link directly to the corresponding call logs.

Which metrics are eligible

Insights are currently supported only for LLM Judge metrics. Code-based metrics such as Latency, WPM, and Talk Ratio are excluded. The eligible metrics are:
  • Custom metrics with type: llm_judge.
  • Predefined metrics: CSAT, Critical Deviations Continuous, Critical Info Check, Critical Info Check bool, Expected Outcome, Gibberish Detection, Hallucination, Letterwise Pronunciation Detection, Main Agent Early End Call, Not Early Termination, Pronunciation Analysis, Pronunciation test, Relevancy, Response Consistency, STT Errors, Tool call Accuracy, Unnecessary Repetition Count, Unnecessary Repetition Score.
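As a rough illustration, the eligibility rule reduces to a check like the one below. The is_eligible helper and its custom and metric_type parameters are hypothetical; only the llm_judge type value and the predefined metric names come from this page.

```python
# Predefined LLM Judge metrics that this page lists as eligible.
PREDEFINED_ELIGIBLE = frozenset({
    "CSAT",
    "Critical Deviations Continuous",
    "Critical Info Check",
    "Critical Info Check bool",
    "Expected Outcome",
    "Gibberish Detection",
    "Hallucination",
    "Letterwise Pronunciation Detection",
    "Main Agent Early End Call",
    "Not Early Termination",
    "Pronunciation Analysis",
    "Pronunciation test",
    "Relevancy",
    "Response Consistency",
    "STT Errors",
    "Tool call Accuracy",
    "Unnecessary Repetition Count",
    "Unnecessary Repetition Score",
})


def is_eligible(name, *, custom, metric_type=None):
    """Hypothetical check: can this metric get failure-mode insights?

    Custom metrics qualify only when their type is llm_judge; predefined
    metrics qualify only when they appear in the set above. Code-based
    metrics such as Latency, WPM and Talk Ratio therefore return False.
    """
    if custom:
        return metric_type == "llm_judge"
    return name in PREDEFINED_ELIGIBLE
```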

How the daily run works

A Celery Beat schedule triggers the audit daily at 06:00 UTC. For each (project, eligible metric) pair, the audit agent analyzes the failing calls and writes the resulting themes to the metric's Insights card. If a metric has fewer failing calls than the minimum threshold even after widening the window to 7 days, the card shows "Not enough metric failure instances in last 7 days."
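When deciding whether to wait for the scheduled audit or trigger one on demand, it can help to know when the next 06:00 UTC run falls. The sketch below computes that with Python's standard datetime module; it does not query Cekura's scheduler, and next_daily_run is a hypothetical helper name.

```python
from datetime import datetime, time, timedelta, timezone

# Daily audit time stated on this page.
DAILY_RUN_UTC = time(6, 0, tzinfo=timezone.utc)


def next_daily_run(now):
    """Return the first 06:00 UTC instant strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC first so that
    local times are compared against the UTC schedule correctly.
    """
    now_utc = now.astimezone(timezone.utc)
    # combine() takes its tzinfo from DAILY_RUN_UTC, so the result is in UTC.
    candidate = datetime.combine(now_utc.date(), DAILY_RUN_UTC)
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate
```

Subtracting the current time from the result gives how long you would wait for the next scheduled audit.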

On-demand generation via API

For programmatic access, you can trigger an audit without waiting for the next daily run by calling the generate endpoint:
curl -X POST https://api.cekura.ai/api/projects/<project_id>/metric_failure_mode_insights/ \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "metric_name": "Hallucination",
    "window_days": 7,
    "max_calls": 100
  }'
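The same request can be issued from Python using only the standard library. This is a minimal sketch: build_generate_request and trigger_audit are hypothetical helper names, the default window_days and max_calls values are copied from the curl example rather than documented server defaults, and the response is assumed to be the JSON row that the status check polls.

```python
import json
import urllib.request

API_BASE = "https://api.cekura.ai/api"


def build_generate_request(project_id, api_key, metric_name,
                           window_days=7, max_calls=100):
    """Build the POST request that starts an on-demand failure-mode audit.

    window_days and max_calls default to the values in the curl example;
    they are not documented server-side defaults.
    """
    url = f"{API_BASE}/projects/{project_id}/metric_failure_mode_insights/"
    payload = {
        "metric_name": metric_name,
        "window_days": window_days,
        "max_calls": max_calls,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def trigger_audit(project_id, api_key, metric_name, **options):
    """Send the request and return the decoded JSON row (assumed response shape)."""
    request = build_generate_request(project_id, api_key, metric_name, **options)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```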
Poll the returned row's status field until its value is succeeded or failed:
curl https://api.cekura.ai/api/projects/<project_id>/metric_failure_mode_insights/<id>/ \
  -H "Authorization: Bearer $API_KEY"
When status is succeeded, the failure_modes array contains the themes.
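A polling loop for the status check might look like this. It is a sketch under stated assumptions: any status other than succeeded or failed is treated as still in progress, the 10-second interval and 10-minute timeout are arbitrary choices, and fetch_insight and wait_for_insight are hypothetical helper names. The fetch, sleep, and clock parameters are injectable so the loop can be exercised without network access.

```python
import json
import time
import urllib.request


def fetch_insight(project_id, insight_id, api_key):
    """GET one insight row; mirrors the curl polling command."""
    url = (f"https://api.cekura.ai/api/projects/{project_id}"
           f"/metric_failure_mode_insights/{insight_id}/")
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_insight(fetch, interval_s=10.0, timeout_s=600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until the row's status is succeeded or failed.

    Any other status is treated as still in progress (an assumption; the
    intermediate status values are not documented here).
    Returns the failure_modes list on success.
    """
    deadline = clock() + timeout_s
    while True:
        row = fetch()
        status = row.get("status")
        if status == "succeeded":
            return row.get("failure_modes", [])
        if status == "failed":
            raise RuntimeError(f"audit failed: {row}")
        if clock() >= deadline:
            raise TimeoutError(f"audit still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

For example, wait_for_insight(lambda: fetch_insight(project_id, insight_id, api_key)) blocks until the audit finishes and returns its failure_modes list.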