Create a quality metric for evaluating agent conversations.
API key authentication. Include the API key in the header of each request.
Name of the metric
Description of what this metric evaluates
Whether this metric evaluates audio content
The evaluation prompt used for this metric
ID of the project this metric belongs to.
External identifier for the assistant
Type of metric (llm_judge recommended; basic and custom_prompt are deprecated)
basic - Basic (deprecated in favor of LLM Judge)
custom_prompt - Custom Prompt (deprecated in favor of LLM Judge)
custom_code - Custom Code
llm_judge - LLM Judge
Allowed values: basic, custom_prompt, custom_code, llm_judge
Output shape of the evaluation score
binary_workflow_adherence - Binary Workflow Adherence
binary_qualitative - Binary Qualitative
continuous_qualitative - Continuous Qualitative
numeric - Numeric
enum - Enum
Allowed values: binary_workflow_adherence, binary_qualitative, continuous_qualitative, numeric, enum
Order in which to display this metric in the UI
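The metric type and output shape values listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the allowed values come from this page, while the function name check_metric_kind and the parameter name eval_type are hypothetical and used only for illustration.

```python
import warnings

# Values copied from the lists on this page.
METRIC_TYPES = {"basic", "custom_prompt", "custom_code", "llm_judge"}
DEPRECATED_TYPES = {"basic", "custom_prompt"}
EVAL_TYPES = {
    "binary_workflow_adherence",
    "binary_qualitative",
    "continuous_qualitative",
    "numeric",
    "enum",
}


def check_metric_kind(metric_type: str, eval_type: str) -> None:
    """Hypothetical helper: reject unknown values, warn on deprecated types."""
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"unknown metric type: {metric_type!r}")
    if eval_type not in EVAL_TYPES:
        raise ValueError(f"unknown output shape: {eval_type!r}")
    if metric_type in DEPRECATED_TYPES:
        warnings.warn(
            f"{metric_type} is deprecated; prefer llm_judge",
            DeprecationWarning,
        )


check_metric_kind("llm_judge", "binary_qualitative")  # passes silently
```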
Custom configuration parameters for specific metrics.
For the pronunciation metric, set words as a list of 2-tuples (word, phonemes).
Example:
{
"words": [["hello", "hɛl.loʊ"], ["world", "wɝɚɚɚld"]]
}
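The configuration above can also be assembled programmatically. This is a minimal sketch that builds the same "words" structure from (word, phonemes) pairs; the helper name build_pronunciation_config is hypothetical, and the validation rules are assumptions rather than documented server behavior.

```python
import json


def build_pronunciation_config(pairs):
    """Hypothetical helper: return {"words": [[word, phonemes], ...]}."""
    words = []
    for pair in pairs:
        if len(pair) != 2:
            raise ValueError(f"expected (word, phonemes), got {pair!r}")
        word, phonemes = pair
        if not (isinstance(word, str) and isinstance(phonemes, str)):
            raise TypeError("word and phonemes must both be strings")
        # JSON has no tuple type, so each pair is stored as a 2-item list.
        words.append([word, phonemes])
    return {"words": words}


config = build_pronunciation_config([("hello", "hɛl.loʊ")])
# ensure_ascii=False keeps the IPA characters readable in the output.
print(json.dumps(config, ensure_ascii=False))
# prints {"words": [["hello", "hɛl.loʊ"]]}
```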
List of agent IDs to enable this project-level metric for. Only applicable when project is set.
Possible values for enum-type metrics (list of strings, e.g. ["resolved", "escalated", "abandoned"])
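As an illustration of how the possible values of an enum-type metric might be used, the sketch below checks a result against the declared list. The function name check_enum_result and both parameter names are hypothetical; whether the service performs an equivalent check is not stated on this page.

```python
def check_enum_result(result: str, enum_values: list[str]) -> str:
    """Hypothetical helper: confirm a result is one of the declared values."""
    if not enum_values or not all(isinstance(v, str) for v in enum_values):
        raise ValueError("enum values must be a non-empty list of strings")
    if result not in enum_values:
        raise ValueError(f"{result!r} is not one of {enum_values}")
    return result


outcomes = ["resolved", "escalated", "abandoned"]
check_enum_result("escalated", outcomes)  # returns "escalated"
```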
When enabled, this metric is automatically assigned to new agents created in the project.
Enable this metric for simulations.
Example: true or false
Enable this metric for observability.
Example: true or false
Enable sampling for this metric using the project-level sample rate.
When to run this metric.
always — evaluate every call (default)
automatic — system decides based on call content
custom — only evaluate when evaluation_trigger_prompt condition is met
always - Always
automatic - Automatic
custom - Custom
Allowed values: always, automatic, custom
LLM prompt that decides whether to evaluate this call. Only used when evaluation_trigger=custom and trigger_type=llm_judge.
Example: "Did the agent offer a refund?"
How to evaluate the trigger condition. Only relevant when evaluation_trigger=custom.
llm_judge — use evaluation_trigger_prompt (default)
custom_code — use evaluation_trigger_custom_code
llm_judge - LLM Judge
custom_code - Custom Code
Allowed values: llm_judge, custom_code
Python code to evaluate the trigger condition. Only used when evaluation_trigger=custom and trigger_type=custom_code.
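The trigger fields described above depend on one another, so a payload can be checked for consistency before it is sent. This is a minimal sketch: the field names and defaults are taken from this page, but the function check_trigger is hypothetical, and treating a missing prompt or code as an error is an assumption about how the service behaves.

```python
def check_trigger(payload: dict) -> None:
    """Hypothetical helper: check that the trigger fields agree."""
    # "always" is documented as the default trigger.
    trigger = payload.get("evaluation_trigger", "always")
    if trigger not in {"always", "automatic", "custom"}:
        raise ValueError(f"unknown evaluation_trigger: {trigger!r}")
    if trigger != "custom":
        return  # the remaining fields only matter for custom triggers

    # "llm_judge" is documented as the default trigger type.
    trigger_type = payload.get("trigger_type", "llm_judge")
    if trigger_type == "llm_judge":
        required = "evaluation_trigger_prompt"
    elif trigger_type == "custom_code":
        required = "evaluation_trigger_custom_code"
    else:
        raise ValueError(f"unknown trigger_type: {trigger_type!r}")

    # Assumption: an empty or missing value is treated as a client error.
    if not payload.get(required):
        raise ValueError(f"{required} is needed for trigger_type={trigger_type}")


check_trigger({
    "evaluation_trigger": "custom",
    "trigger_type": "llm_judge",
    "evaluation_trigger_prompt": "Did the agent offer a refund?",
})
```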
Python code that implements the metric evaluation. Required when type=custom_code. Must define a function evaluate(transcript, ...) -> bool | float | str.
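A custom_code metric might look like the sketch below. Only the function name evaluate, its first parameter, and the allowed return types come from this page. The shape of transcript is not documented here, so this example assumes it can be treated as plain text, and it accepts any further arguments without using them.

```python
def evaluate(transcript, *args, **kwargs) -> bool:
    """Return True if a refund is mentioned anywhere in the conversation."""
    # ASSUMPTION: transcript converts to readable text via str();
    # the actual structure passed by the platform may differ.
    text = str(transcript).lower()
    return "refund" in text
```

Returning a bool suits a binary metric; a numeric or enum metric would return a float or a str instead, in line with the signature above.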