Python Code Metrics
Python Code Metrics let you write custom evaluation logic in Python to assess your AI agent’s performance. This gives you complete control over the evaluation process and enables complex analysis that goes beyond simple prompt-based metrics.

Overview
Custom code metrics are executed in a secure Python environment with access to call data, including transcripts, metadata, and dynamic variables. Your code must set specific output variables to provide the evaluation result and explanation.

Available Data Variables
When writing your custom code, you have access to the following data variables (a short usage sketch follows the list):

Fields
- transcript: Full conversation transcript as a formatted string with timestamps
- transcript_json: Transcript as a structured list with detailed timing and speaker information
- call_duration: Call duration in seconds as a float
- call_end_reason: Reason why the call ended
- voice_recording: URL to the voice recording file
- agent_description: Description of the AI agent used in the call
- metadata: Additional context metadata as a dictionary
- dynamic_variables: Dynamic variables configured for the agent as a dictionary
- tags: Tags associated with the call
- topic: Call topic/subject
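As a quick sketch of how these variables can be read, assuming they are exposed as top-level names as listed above; the "speaker" key on transcript_json entries and the "customer_tier" metadata key are illustrative assumptions, not part of the documented schema:

```python
# Sketch only: the "speaker" key on transcript_json entries and the
# "customer_tier" metadata key are assumptions for illustration.
agent_turns = [turn for turn in transcript_json if turn.get("speaker") == "agent"]

# call_duration is documented as a float in seconds.
long_call = call_duration > 120.0

# metadata and dynamic_variables are plain dictionaries.
customer_tier = metadata.get("customer_tier", "unknown")
```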
Metric Results Access
- Individual Metric Results: Access any evaluated metric result directly by name
- Metric Explanations: List of explanation strings for each metric
- Expected Outcome Explanation: List of explanation strings for the expected outcome
- Latency Metrics: Latency metrics for performance analysis
Context-Specific Fields
- Call Log Context: Available when evaluating call logs; these fields are specific to real customer calls
- Run/Simulation Context: Available when evaluating runs/simulations; these fields are specific to test scenarios
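Because the context-specific keys aren't listed in this section, the sketch below branches on a hypothetical simulation_id key purely for illustration; the real field names are listed in the Complete Data Reference:

```python
# "simulation_id" is a hypothetical key used only for illustration; the
# actual context-specific field names are in the Complete Data Reference.
if data.get("simulation_id") is not None:
    context = "run/simulation"
else:
    context = "call log"
```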
Required Output Variables
Your Python code must set these two variables:

- _result: The evaluation outcome (can be boolean, numeric, string, etc.)
- _explanation: A string explaining the reasoning behind the result
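At its simplest, a metric that unconditionally passes would end with:

```python
_result = True
_explanation = "Trivial pass: no conditions were evaluated."
```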
Example Code
Here’s a simple example that checks if the agent mentioned a specific product.
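A minimal sketch, assuming a placeholder product name and a simple case-insensitive substring match on the transcript:

```python
# Placeholder product name; substitute the product you care about.
PRODUCT_NAME = "Acme Widget"

if PRODUCT_NAME.lower() in transcript.lower():
    _result = True
    _explanation = f"The agent mentioned '{PRODUCT_NAME}' during the call."
else:
    _result = False
    _explanation = f"'{PRODUCT_NAME}' was never mentioned in the transcript."
```

Note that matching against the full transcript also matches the caller's turns; a stricter version could filter for agent turns in transcript_json first.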
Complete Data Reference

Here’s the complete structure of data available to your custom Python code.
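As a rough sketch assembled from the fields documented above (the nested value shapes are assumptions, not the authoritative reference):

```python
# Rough shape only; nested value shapes are assumptions.
data = {
    "transcript": "...",               # formatted string with timestamps
    "transcript_json": [],             # structured turns with timing/speaker info
    "call_duration": 0.0,              # float, in seconds
    "call_end_reason": "...",
    "voice_recording": "https://...",  # URL to the recording file
    "agent_description": "...",
    "metadata": {},                    # additional context dictionary
    "dynamic_variables": {},           # agent's dynamic variables
    "tags": [],
    "topic": "...",
    # Previously evaluated metric results, keyed by metric name:
    "Some Metric Name": True,
    "explanation": {"Some Metric Name": "..."},
}
```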
Data Flow and Execution Order

Important: Custom Python code metrics execute after all other metrics (Basic, Advanced, and pre-defined metrics). This means:

- Non-custom metrics evaluate first
- Results are structured and merged into the data dictionary
- Custom code receives ALL previous results via direct dictionary access
- Custom code can build upon or combine existing metric results
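For instance, a composite metric could average two previously evaluated numeric metrics; the metric names below are placeholders for your own configured metrics:

```python
# Placeholder metric names; both are assumed to produce numeric scores.
# .get() with a default guards against a metric that wasn't configured.
scores = [data.get("Clarity Score", 0.0), data.get("Empathy Score", 0.0)]

_result = sum(scores) / len(scores)
_explanation = f"Composite score averaged from {len(scores)} metrics: {_result:.2f}"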
Using Metric Results
You can access the results of other metrics that were evaluated for the same call directly by metric name using data["Metric Name"]. You can also access their explanations using data["explanation"]["Metric Name"].
Example usage, with a hypothetical metric name standing in for one of your configured metrics:
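```python
# "Appointment Booked" is a placeholder; use one of your own metric names.
booked = data["Appointment Booked"]
booked_explanation = data["explanation"]["Appointment Booked"]

_result = booked
_explanation = f"Based on the Appointment Booked metric: {booked_explanation}"
```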
Example for Calling New Binary and Advanced Metrics
To call new Binary and Advanced metrics from your Python code, you can use the evaluate_advance_metric function. This function requires your API key, a description or prompt for the metric, and the metric name.
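A sketch of such a call, assuming keyword parameters named api_key, prompt, and metric_name; the actual parameter names and return shape may differ, so check the evaluate_advance_metric reference:

```python
# Parameter names and the return shape are assumptions; consult the
# evaluate_advance_metric reference for the exact signature.
result = evaluate_advance_metric(
    api_key="YOUR_API_KEY",  # your platform API key
    prompt=(
        "Did the agent confirm the caller's identity before "
        "sharing any account details?"
    ),
    metric_name="Identity Verification",  # placeholder metric name
)

_result = result
_explanation = f"Identity Verification evaluated to: {result}"
```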