Metrics

Setup steps and authentication are in the Overview. This page covers metric authoring and management.

A metric scores a simulation run or a production call. Cekura supports three flavors:

Predefined — platform-managed metrics like sentiment, interruption count, latency.
LLM judge — you write a prompt; an LLM scores the transcript.
Custom code — Python that runs against the call payload.

Browse predefined metrics

These are read-only and shared across the platform.

cekura predefined-metrics list

from cekura import Cekura

client = Cekura()
catalog = client.predefined_metrics.list()
for m in catalog:
    print(m["slug"], m["name"], m["eval_type"])
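To scan the catalog by scoring type, you could bucket the entries locally. This is a minimal sketch that assumes each catalog entry is a dict carrying the `slug` and `eval_type` keys used in the listing loop. `group_by_eval_type` is a hypothetical helper and is not part of the SDK.

```python
from collections import defaultdict


def group_by_eval_type(catalog):
    """Bucket predefined-metric slugs by eval_type (hypothetical helper).

    Assumes each entry is a dict with "slug" and "eval_type" keys, as in
    the listing loop for client.predefined_metrics.list().
    """
    groups = defaultdict(list)
    for entry in catalog:
        # Entries without an eval_type go under "unknown" instead of raising.
        groups[entry.get("eval_type", "unknown")].append(entry["slug"])
    # Sort slugs so the output is stable from one run to the next.
    return {eval_type: sorted(slugs) for eval_type, slugs in groups.items()}
```

Usage: `group_by_eval_type(client.predefined_metrics.list())`. If the SDK returns a paginated object rather than a plain list, you may need to adapt the loop.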

Create a metric

Prepare metric.json:

{
  "agents": [123],
  "name": "Booking confirmed",
  "type": "llm_judge",
  "eval_type": "boolean",
  "prompt": "Did the agent confirm a booking and read back the date and time?"
}
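If you generate metric files from a script, a small guard can catch missing fields before you call the CLI. This sketch assumes the five keys in the LLM-judge example above are the ones you want present. `LLM_JUDGE_KEYS` and `write_metric_file` are illustrative names, and the check does not reproduce the API's own validation rules.

```python
import json
from pathlib import Path

# Keys taken from the LLM-judge metric.json example; this is an illustrative
# local check, not the server-side schema.
LLM_JUDGE_KEYS = {"agents", "name", "type", "eval_type", "prompt"}


def write_metric_file(metric, path="metric.json"):
    """Write an LLM-judge metric definition to disk after a basic check."""
    missing = LLM_JUDGE_KEYS - set(metric)
    if missing:
        raise ValueError(f"metric is missing keys: {sorted(missing)}")
    if not isinstance(metric["agents"], list):
        raise TypeError("'agents' must be a list of agent IDs")
    out = Path(path)
    out.write_text(json.dumps(metric, indent=2) + "\n", encoding="utf-8")
    return out
```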

Create the metric from the file:

cekura metrics create --from-file metric.json

from cekura import Cekura

client = Cekura()

# LLM-judge metric
metric = client.metrics.create(
    agents=[123],
    name="Booking confirmed",
    type="llm_judge",
    eval_type="boolean",
    prompt="Did the agent confirm a booking and read back the date and time?",
)

# Python (custom code) metric
client.metrics.create(
    agents=[123],
    name="Hold duration > 30s",
    type="python",
    eval_type="boolean",
    code="""
def evaluate(call):
    return any(seg['duration'] > 30 for seg in call.get('hold_segments', []))
""",
)
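Before you upload custom code, you can run `evaluate` locally against a hand-built payload. This sketch assumes the payload shape implied by the example: a `hold_segments` list whose items carry a `duration` in seconds. `load_evaluate` is a hypothetical helper, and the sample payloads are invented for illustration. The sketch only checks your logic. It does not reproduce Cekura's execution environment.

```python
HOLD_METRIC_CODE = """
def evaluate(call):
    return any(seg['duration'] > 30 for seg in call.get('hold_segments', []))
"""


def load_evaluate(code):
    """Execute a metric code string and return its evaluate() function."""
    namespace = {}
    exec(compile(code, "<metric>", "exec"), namespace)
    evaluate = namespace.get("evaluate")
    if not callable(evaluate):
        raise ValueError("code must define a callable named 'evaluate'")
    return evaluate


evaluate = load_evaluate(HOLD_METRIC_CODE)

# Hypothetical payloads: one long hold, one short hold, no holds at all.
print(evaluate({"hold_segments": [{"duration": 45}]}))  # True
print(evaluate({"hold_segments": [{"duration": 12}]}))  # False
print(evaluate({}))                                     # False
```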

List, update, delete

cekura metrics list --agent-id 123
cekura metrics get 55
cekura metrics update 55 --from-file patch.json
cekura metrics delete 55

client.metrics.list(agent_id=123)
client.metrics.get(metric_id=55)
client.metrics.update(metric_id=55, prompt="Refined judge prompt")
client.metrics.delete(metric_id=55)
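The CLI `update` command reads a patch file. The SDK call passes only `prompt`, which suggests that a partial object is accepted. That is an assumption, so confirm it in the API Reference. This sketch writes only the fields you set. `write_patch` is a hypothetical helper.

```python
import json
from pathlib import Path


def write_patch(path="patch.json", **fields):
    """Write a partial metric update, omitting fields left as None.

    Assumes the update endpoint accepts a subset of metric fields,
    as the SDK example that passes only `prompt` suggests.
    """
    patch = {key: value for key, value in fields.items() if value is not None}
    if not patch:
        raise ValueError("nothing to update: pass at least one field")
    Path(path).write_text(json.dumps(patch, indent=2) + "\n", encoding="utf-8")
    return patch
```

For example, `write_patch(prompt="Refined judge prompt")` produces a `patch.json` containing only `prompt`. You would then pass that file to `cekura metrics update 55 --from-file patch.json`.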

Bulk operations

Manage many metrics in one call.

# Create several at once
cekura metrics bulk-create --from-file metrics.json

# Bulk-attach metrics across agents
cekura metrics bulk-manage-agents \
  --metric-ids 55,56 \
  --agents-to-add 123,124

# Toggle active state on many metrics
cekura metrics bulk-toggle-settings --metric-ids 55,56 --is-active true

# Create several at once
client.metrics.bulk_create(metrics=[
    {"name": "Greeting present", "type": "llm_judge", "prompt": "...", "agents": [123]},
    {"name": "Closing present",  "type": "llm_judge", "prompt": "...", "agents": [123]},
])

# Attach / detach metrics across agents
client.metrics.bulk_manage_agents(
    metric_ids=[55, 56],
    agents_to_add=[123, 124],
    agents_to_remove=[],
)

# Toggle settings on many metrics at once
client.metrics.bulk_toggle_settings(
    metric_ids=[55, 56],
    is_active=True,
)
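The CLI `bulk-create` command reads `metrics.json`. Based on the SDK's `bulk_create(metrics=[...])` argument, a JSON array of metric objects is a plausible file layout, but that is an assumption. This sketch builds such a file from (name, prompt) pairs that share one agent list. It rejects duplicate names locally, because a repeated name in one batch is probably a mistake. `build_bulk_file` is a hypothetical helper.

```python
import json
from pathlib import Path


def build_bulk_file(pairs, agents, path="metrics.json", eval_type="boolean"):
    """Write a list of LLM-judge metric definitions for bulk creation.

    `pairs` is an iterable of (name, prompt) tuples. The file layout (a
    JSON array of metric objects) is assumed, mirroring the SDK's
    bulk_create(metrics=[...]) argument.
    """
    metrics = []
    seen = set()
    for name, prompt in pairs:
        if name in seen:
            raise ValueError(f"duplicate metric name: {name!r}")
        seen.add(name)
        metrics.append({
            "name": name,
            "type": "llm_judge",
            "eval_type": eval_type,
            "prompt": prompt,
            "agents": list(agents),
        })
    Path(path).write_text(json.dumps(metrics, indent=2) + "\n", encoding="utf-8")
    return metrics
```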

Generate metrics with AI

Stuck on what to measure? Cekura can propose metrics from your agent’s scenarios.

cekura metrics generate --agent-id 123 --count 10
cekura metrics generate-progress --progress-id <id>

job = client.metrics.generate(agent_id=123, count=10)
client.metrics.generate_progress(progress_id=job["progress_id"])
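Generation returns a progress ID, so you will usually poll until the job finishes. The following is a generic polling sketch. `fetch` would wrap `client.metrics.generate_progress(progress_id=...)`, and `is_done` inspects whatever that call returns. The response fields are not documented on this page, so any status check you write is your own assumption. `poll_until` is a hypothetical helper.

```python
import time


def poll_until(fetch, is_done, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch() until is_done(result) is true or the timeout elapses.

    Returns the last result and raises TimeoutError if the job never
    finishes. `sleep` is injectable so the loop can be tested without waiting.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        sleep(interval)
```

For example, `poll_until(lambda: client.metrics.generate_progress(progress_id=job["progress_id"]), lambda r: r.get("status") == "completed")`. Here the `status` key and the `"completed"` value are hypothetical placeholders.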

Critical metric scenarios & reviews

Two advanced workflows:

Critical metric scenarios — scenarios flagged because their metric output drifted. Useful for triage.
Metric reviews (Labs pipeline) — kick off feedback processing on metric judgments and watch progress.

# Critical metric scenarios
cekura critical-metric-scenarios list --metric-id 55
cekura critical-metric-scenarios update 42 --json '{"is_resolved": true}'

# Metric reviews
cekura metric-reviews process-feedbacks --metric-id 55
cekura metric-reviews progress --job-id <id>

# Critical metric scenarios
client.critical_metric_scenarios.list(metric_id=55)
client.critical_metric_scenarios.update(id=42, is_resolved=True)

# Metric reviews
job = client.metric_reviews.process_feedbacks(metric_id=55)
client.metric_reviews.process_feedbacks_progress(job_id=job["job_id"])
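After triage, you may want to mark every open critical scenario for a metric as resolved. This sketch assumes that `list` yields dicts with `id` and `is_resolved` keys, which you should confirm in the API Reference. The sketch takes the update function as a parameter, so you would pass `client.critical_metric_scenarios.update`. It returns the IDs it touched. `resolve_open` is a hypothetical helper.

```python
def resolve_open(scenarios, update, dry_run=False):
    """Mark unresolved critical metric scenarios as resolved.

    `scenarios` is an iterable of dicts assumed to carry "id" and
    "is_resolved"; `update` is called as update(id=..., is_resolved=True),
    matching the SDK call for critical_metric_scenarios.update.
    """
    resolved_ids = []
    for scenario in scenarios:
        if scenario.get("is_resolved"):
            continue  # already resolved; nothing to do
        if not dry_run:
            update(id=scenario["id"], is_resolved=True)
        resolved_ids.append(scenario["id"])
    return resolved_ids
```

Run it first with `dry_run=True` to see which IDs would change: `resolve_open(client.critical_metric_scenarios.list(metric_id=55), client.critical_metric_scenarios.update, dry_run=True)`.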

See also

Runs & Results

Where metric scores show up after a run.

Calls

Run metrics against production calls, not just simulations.

Metric concepts

LLM judge, Python, rubric, sampling — when to use what.

API Reference

Full field reference for metric payloads.
