Setup steps and authentication are in the Overview. This page covers production-call observability.
List calls
- CLI
- SDK
Supported filters include project_id, agent_id, from_date, to_date, success, and topic. See the API Reference for the full list.
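As an illustration of combining these filters, the sketch below builds a URL query string from whichever filters are set. Only the six filter names come from this page. The helper name `build_call_filters`, the ISO date format, and the lowercase `true`/`false` encoding are assumptions, so check the API Reference for the formats the service actually expects.

```python
from datetime import date
from urllib.parse import urlencode


def build_call_filters(project_id=None, agent_id=None, from_date=None,
                       to_date=None, success=None, topic=None):
    """Return a URL-encoded query string with only the filters that were set.

    Hypothetical helper: the value formats below are assumptions, not
    something this page documents.
    """
    raw = {
        "project_id": project_id,
        "agent_id": agent_id,
        "from_date": from_date,
        "to_date": to_date,
        "success": success,
        "topic": topic,
    }
    params = {}
    for key, value in raw.items():
        if value is None:
            continue  # unset filters are left out entirely
        if isinstance(value, bool):
            # Assumption: booleans are sent as lowercase strings.
            params[key] = "true" if value else "false"
        elif isinstance(value, date):
            # Assumption: dates are sent in ISO 8601 form (YYYY-MM-DD).
            params[key] = value.isoformat()
        else:
            params[key] = str(value)
    return urlencode(params)


# Failed calls for one agent since the start of 2024.
query = build_call_filters(agent_id="agent_123",
                           from_date=date(2024, 1, 1),
                           success=False)
```

Dropping unset filters keeps the query minimal and leaves defaults to the server.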
Inspect a call
- CLI
- SDK
Send a call for evaluation
If your provider isn’t on the auto-ingestion list, or you want to ship a call from your own backend, send it explicitly.
- CLI
- SDK
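A payload for the explicit-send path might be assembled as sketched here. This page says only that call.json carries the agent ID, transcript, duration, and metadata. Every key name, the per-turn transcript shape, and the seconds unit for duration are hypothetical placeholders, and the API Reference defines the real schema.

```python
import json


def build_call_payload(agent_id, turns, duration_seconds, metadata=None):
    """Assemble a call payload as a plain dict.

    Hypothetical schema: the key names and the transcript shape are
    illustrative assumptions, not the documented format.
    """
    transcript = [{"role": role, "text": text} for role, text in turns]
    return {
        "agent_id": agent_id,
        "transcript": transcript,
        "duration": duration_seconds,  # assumed to be in seconds
        "metadata": metadata or {},
    }


payload = build_call_payload(
    agent_id="agent_123",
    turns=[("agent", "Hello, how can I help?"),
           ("user", "I need to reset my password.")],
    duration_seconds=42,
    metadata={"source": "own-backend"},
)

# Serialized form; this string could be saved as call.json.
serialized = json.dumps(payload, indent=2)
```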
call.json contains the agent ID, transcript, duration, and metadata.
Run metrics on a call
Score an already-ingested call against one or more metrics.
- CLI
- SDK
Generate scenarios from real calls
Found an interesting production call? Turn it into regression scenarios.
- CLI
- SDK
Promote a call into a test set
- CLI
- SDK
Vote on a metric result
Capture thumbs-up or thumbs-down feedback on a specific metric evaluation for a call, and optionally attach the expected value and free-text feedback. The call is marked as reviewed and the metric evaluation is updated. This feeds the labs / metric-review workflow.
- CLI
- SDK
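The CLI's --expected-value flag is parsed as JSON when possible. A minimal sketch of that parse-with-fallback behaviour is given here. It is not the tool's actual implementation: the function name `parse_expected_value` is hypothetical, and falling back to the raw string when the input is not valid JSON is an assumption about what "when possible" means.

```python
import json


def parse_expected_value(raw):
    """Parse raw as JSON; if that fails, return the original string.

    Hypothetical helper illustrating parse-as-JSON-when-possible.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Not valid JSON (for example an unquoted word): keep it as text.
        return raw


number = parse_expected_value("5")        # int 5
flag = parse_expected_value("true")       # bool True
quoted = parse_expected_value('"foo"')    # str "foo", quotes removed
bare = parse_expected_value("foo")        # invalid JSON, stays "foo"
```

Parsing first means a numeric or boolean metric receives a typed value rather than the string "5" or "true".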
--expected-value is parsed as JSON when possible (5, true, "foo"), so numeric and boolean metrics get the right type.
Flag a critical-scenario verdict as wrong
If a critical-scenario evaluation looks incorrect, flag it. You can unflag it later if the flag was a misclick or the verdict has since been corrected.
- SDK
Improve the agent’s prompt from real failures
Use the failure pattern across recent calls to suggest a prompt improvement.
- CLI
- SDK
See also
Metrics
Define what you score production calls on.
Runs & Results
The same scoring engine, applied to simulated rather than production traffic.
Dashboards
Visualize call quality and metric trends over time.
API Reference
Full field reference for call logs and observation payloads.