
Python Code Metrics

Python Code Metrics allow you to write custom evaluation logic in Python to evaluate your AI agent’s performance. This gives you complete control over the evaluation process and enables complex analysis that goes beyond simple prompt-based metrics.

Overview

Custom code metrics are executed in a secure Python environment with access to call data including transcripts, metadata, and dynamic variables. Your code must set specific output variables to provide the evaluation result and explanation.

Available Data Variables

When writing your custom code, you have access to the following data variables:

Fields

Full conversation transcript as a formatted string with timestamps
# Access the full transcript
transcript = data["transcript"]
# Output:
# "[00:01] Main Agent: Hello.\n[00:12] Testing Agent: L m z o uh-huh.\n[00:14] Main Agent: Could you clarify your message or let me know how I can assist you?\n[00:22] Testing Agent: Hello? I'm Vicky.\n[00:27] Main Agent: Hi, Vicky. How can I help you today?..."
Transcript as a structured list with detailed timing and speaker information
# Access structured transcript with detailed timing
transcript_json = data["transcript_json"]
# Actual structure:
# [
#   {
#     "role": "Main Agent",
#     "time": "00:01", 
#     "content": "Hello.",
#     "end_time": 1.817,
#     "start_time": 1.317
#   },
#   {
#     "role": "Testing Agent",
#     "time": "00:12",
#     "content": "L m z o uh-huh.", 
#     "end_time": 13.817,
#     "start_time": 12.357
#   }
# ]
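
The per-turn timing fields make it easy to compute simple conversation statistics. A minimal sketch (field names as documented above) that totals speaking time per speaker:
# Sketch: total speaking time per speaker from the structured transcript
transcript_json = data["transcript_json"]

speaking_time = {}
for turn in transcript_json:
    duration = turn["end_time"] - turn["start_time"]
    speaking_time[turn["role"]] = speaking_time.get(turn["role"], 0.0) + duration

main_agent_time = speaking_time.get("Main Agent", 0.0)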

Call duration in seconds as a float
# Access call duration
call_duration = data["call_duration"]
# Output: 125.5 (seconds)

Reason why the call ended
# Access call end reason
end_reason = data["call_end_reason"]
# Example values: "main-agent-ended-call", "testing-agent-ended-call"
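
These two fields can feed directly into an evaluation, for example flagging calls the Main Agent ended very early. A minimal sketch (the substring check and the 30-second threshold are illustrative, not fixed platform values):
# Sketch: flag calls the Main Agent ended in under 30 seconds (threshold is illustrative)
end_reason = data["call_end_reason"]
call_duration = data["call_duration"]

if "main" in end_reason.lower() and call_duration < 30:
    _result = False
    _explanation = f"Main Agent ended the call after only {call_duration:.0f} seconds"
else:
    _result = True
    _explanation = "Call was not ended prematurely by the Main Agent"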

URL to the voice recording file
# Access voice recording URL
recording_url = data["voice_recording"]
# Output: "https://recordings.example.com/call_123.wav"

Description of the AI agent used in the call
# Access agent description
agent_desc = data["agent_description"]
# Output: "Customer service agent with product knowledge and billing expertise"

Additional context metadata as a dictionary
# Access metadata
metadata = data["metadata"]
# Output: {"customer_tier": "premium", "region": "US", "language": "en"}

Dynamic variables configured for the agent as a dictionary
# Access dynamic variables
variables = data["dynamic_variables"]
# Output: {"customer_name": "John", "account_id": "ACC123", "plan_type": "premium"}

Tags associated with the call/scenario as a list
# Access tags
tags = data["tags"]
# Output: ["billing", "priority", "escalation"]

Call topic/subject
# Access call topic
topic = data["topic"]
# Output: "Billing inquiry and payment issues"

Metric Results Access

Access any evaluated metric result directly by name
# Access individual metric results by name
customer_satisfaction = data["Customer Satisfaction"]  # Could be: 4.5, "Excellent", 85
response_time = data["Response Time"]  # Could be: 120 (seconds)
product_knowledge = data["Product Knowledge"]  # Could be: 85, "Good", 4.2
workflow_adherence = data["Workflow Adherence"]  # Could be: "Good", 0.8, 78

Dictionary mapping each metric name to a list of explanation strings
# Access metric explanations
explanations = data["explanation"]

# Get explanations for specific metrics
satisfaction_reasons = explanations["Customer Satisfaction"]
# Example: ["Customer expressed satisfaction", "Positive tone detected", "Issue resolved"]

response_reasons = explanations["Response Time"]
# Example: ["Response was within acceptable range", "No long pauses detected"]

List of explanation strings for expected outcome
# Access expected outcome explanations
expected_explanations = data["expected_outcome_explanation"]
# Example: ["Expected positive customer outcome", "Billing issue should be resolved"]

Expected outcome value
# Access expected outcome
expected_outcome = data["expected_outcome"]  # Example: 4.2
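
You can compare an evaluated metric against this expected value. A minimal sketch (assumes the expected outcome and the "Customer Satisfaction" metric from the examples above are both numeric):
# Sketch: compare an evaluated metric against the expected outcome
expected = data["expected_outcome"]
actual = data["Customer Satisfaction"]

if isinstance(actual, (int, float)) and actual >= expected:
    _result = True
    _explanation = f"Satisfaction score {actual} met the expected outcome of {expected}"
else:
    _result = False
    _explanation = f"Satisfaction score {actual} fell short of the expected outcome of {expected}"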

Latency metrics for performance analysis
# Access latency metrics
avg_latency = data["Average Latency (in ms)"]  # Example: 1607.5
latency_data = data["latency_data"]
# Actual structure:
# [
#   {"latency": 1680.0, "speaker": "Main Agent", "start_time": 14.07},
#   {"latency": 1240.0, "speaker": "Main Agent", "start_time": 26.51},
#   {"latency": 1970.0, "speaker": "Main Agent", "start_time": 38.33},
#   {"latency": 1540.0, "speaker": "Main Agent", "start_time": 51.68}
# ]
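
You can scan latency_data for slow turns. A minimal sketch (the 2000 ms threshold is illustrative):
# Sketch: flag any turn whose latency exceeds 2000 ms (threshold is illustrative)
latency_data = data["latency_data"]

slow_turns = [entry for entry in latency_data if entry["latency"] > 2000]
if slow_turns:
    _result = False
    _explanation = f"{len(slow_turns)} turn(s) exceeded 2000 ms latency"
else:
    _result = True
    _explanation = "All turns responded within 2000 ms"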

Context-Specific Fields

Available when evaluating call logs - fields specific to real customer calls
# Call log specific fields
call_log_id = data["call_log_id"]  # Example: 12345
topic = data["topic"]  # Example: "Billing inquiry and payment issues"
call_duration = data["call_duration"]  # Example: 245.5 (seconds)

Available when evaluating runs/simulations - fields specific to test scenarios
# Run/simulation specific fields
run_id = data["run_id"]  # Example: 456
test_profile = data["test_profile"]  # Example: {"company": "Cekura", "customer_type": "frustrated", "issue": "billing"}
call_duration = data["call_duration"]  # Example: 180.5 (seconds)

# Common usage: Evaluate test scenario performance against profile
company = test_profile.get("company", "Unknown")
customer_type = test_profile.get("customer_type", "")
issue = test_profile.get("issue", "")
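
Building on that, the profile can steer what the metric checks. A hedged sketch (the customer_type value and the keyword list are illustrative assumptions, not platform-defined):
# Sketch: adapt the check to the simulated customer type (keys and keywords are illustrative)
test_profile = data["test_profile"]
transcript = data["transcript"].lower()

if test_profile.get("customer_type") == "frustrated":
    # Expect the agent to acknowledge the customer's frustration
    acknowledged = any(word in transcript for word in ("sorry", "apologize", "understand"))
    _result = acknowledged
    _explanation = "Checked whether the agent acknowledged the customer's frustration"
else:
    _result = True
    _explanation = "No customer-type-specific check applied for this profile"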

Required Output Variables

Your Python code must set these two variables:
  • _result - The evaluation outcome (can be boolean, numeric, string, etc.)
  • _explanation - A string explaining the reasoning behind the result

Example Code

Here’s a simple example that checks if the agent mentioned a specific product:
# Check if the agent mentioned "Premium Plan" in the conversation
transcript = data["transcript"].lower()
if "premium plan" in transcript:
    _result = True
    _explanation = "Agent successfully mentioned the Premium Plan during the conversation"
else:
    _result = False
    _explanation = "Agent did not mention the Premium Plan in the conversation"

Complete Data Reference

Here’s the complete structure of data available to your custom Python code:
# Available Data 
{
  "transcript": "[00:01] Main Agent: Hello.\n[00:12] Testing Agent: L m z o uh-huh.\n[00:14] Main Agent: Could you clarify your message or let me know how I can assist you?...",

  "transcript_json": [
    {
      "role": "Main Agent",
      "time": "00:01",
      "content": "Hello.",
      "end_time": 1.817,
      "start_time": 1.317
    },
    {
      "role": "Testing Agent",
      "time": "00:12",
      "content": "L m z o uh-huh.",
      "end_time": 13.817,
      "start_time": 12.357
    }
  ],

  // ---------- Context Fields ----------
  "call_duration": 180.5,
  "call_end_reason": "customer_satisfaction",
  "voice_recording": "https://recordings.example.com/call123.wav",
  "agent_description": "Customer service agent with product knowledge",
  "metadata": {
    "key": "value"
  },
  "dynamic_variables": {
    "customer_name": "John"
  },
  "tags": ["priority", "billing_inquiry"],

  // ---------- Metric Results ----------
  "Customer Satisfaction": 4.5,
  "Response Time": 120,
  "Product Knowledge": 85,

  "explanation": {
    "Customer Satisfaction": [
      "Customer expressed satisfaction",
      "Positive tone"
    ],
    "Response Time": [
      "Response was within acceptable range"
    ]
  },

  // ---------- Latency Data ----------
  "Average Latency (in ms)": 1607.5,

  "latency_data": [
    {
      "latency": 1680.0,
      "speaker": "Main Agent",
      "start_time": 14.07
    },
    {
      "latency": 1240.0,
      "speaker": "Main Agent",
      "start_time": 26.51
    },
    {
      "latency": 1970.0,
      "speaker": "Main Agent",
      "start_time": 38.33
    },
    {
      "latency": 1540.0,
      "speaker": "Main Agent",
      "start_time": 51.68
    }
  ],

  // ---------- Expected Outcome ----------
  "expected_outcome": 4.2,
  "expected_outcome_explanation": [
    "Expected positive outcome"
  ],

  // ---------- Call Log Context ----------
  "call_log_id": 123,
  "topic": "Billing inquiry",

  // ---------- Run / Simulation Context ----------
  "run_id": 456,
  "test_profile": {
    "company": "Cekura"
  }
}

Data Flow and Execution Order

Important: Custom Python code metrics execute after all other metrics (Basic, Advanced, and pre-defined metrics). This means:
  1. Non-custom metrics evaluate first
  2. Results are structured and merged into the data dictionary
  3. Custom code receives ALL previous results via direct dictionary access
  4. Custom code can build upon or combine existing metric results

Using Metric Results

You can access the results of other metrics that were evaluated for the same call directly by metric name using data["Metric Name"]. You can also access their explanations using data["explanation"]["Metric Name"]. Example usage:
# Access metric results directly by name
customer_satisfaction = data["Customer Satisfaction"]
response_time = data["Response Time"]
product_knowledge = data["Product Knowledge"]

# Access metric explanations
satisfaction_reasons = data["explanation"]["Customer Satisfaction"]
response_reasons = data["explanation"]["Response Time"]

# Each metric result contains the evaluation outcome
if isinstance(customer_satisfaction, (int, float)) and customer_satisfaction > 4.0 and response_time < 60:
    _result = "Excellent"
    _explanation = f"Customer was satisfied ({satisfaction_reasons[0]}) and response time was fast ({response_time}s)"

Example for Calling New Binary and Advanced Metrics

To call new Binary and Advanced metrics from your Python code, use the evaluate_advance_metric function. It takes the call data dictionary, your Cekura API key, a description or prompt for the metric, and the metric name.

Example for Calling Basic Metrics

key = "<your_cekura_api_key"

def get_not_early_end_call_description():
    return f"""You are an AI quality assurance analyst tasked with evaluating customer service call transcripts. Your primary objective is to determine if a Main Agent terminated a call prematurely without valid reason. This analysis is crucial for maintaining high standards in customer service interactions."""

call_end_reason = data["call_end_reason"]
transcript_json = data["transcript_json"]

if "main" not in call_end_reason.lower():
    _score = 5
    _explanation = "The call was ended by the Testing Agent or due to error."

description = get_not_early_end_call_description()

response = evaluate_advance_metric(data, key, description, "binary_workflow_adherence")
_result = response.get("result")
_explanation = response.get("explanation")

Example for Calling Advanced Metrics

key = "<your_cekura_api_key>"

def get_not_early_end_call_prompt(transcript, call_end_reason):
    return f"""You are an AI quality assurance analyst tasked with evaluating customer service call transcripts. Your primary objective is to determine if a Main Agent terminated a call prematurely without valid reason. This analysis is crucial for maintaining high standards in customer service interactions.

Please review the following call transcript:

<call_transcript>
{transcript}
</call_transcript>

Now, consider the reason provided for why the call ended:

<call_end_reason>
{call_end_reason}
</call_end_reason>

Your task is to analyze the transcript and call end reason to determine if the Main Agent terminated the call early without justification.
"""

if "transcript_json" not in data or not data["transcript_json"]:
    _result = None
    _explanation = "No transcript available"
elif "call_end_reason" not in data or not data["call_end_reason"]:
    _result = None
    _explanation = "No call end reason available"
elif "main" not in data["call_end_reason"].lower():
    # The Main Agent did not end the call, so it cannot have ended it early
    _result = 5
    _explanation = "The call was ended by the Testing Agent or due to error."
else:
    prompt = get_not_early_end_call_prompt(data["transcript"], data["call_end_reason"])
    response = evaluate_advance_metric(data, key, prompt, "binary_workflow_adherence")
    _result = response.get("result")
    _explanation = response.get("explanation")

Advanced Example

Here’s a more complex example that analyzes how detailed the agent’s responses are:
# Get transcript data
transcript = data["transcript"]

# Collect the Main Agent's responses from the formatted transcript
# (lines look like "[00:01] Main Agent: Hello.")
agent_responses = []
lines = transcript.split('\n')

for line in lines:
    if 'Main Agent:' in line:
        response = line.split('Main Agent:', 1)[1].strip()
        agent_responses.append(response)

# Calculate average response length
if agent_responses:
    avg_response_length = sum(len(response) for response in agent_responses) / len(agent_responses)

    # Check if responses are detailed enough (more than 50 characters average)
    if avg_response_length > 50:
        _result = True
        _explanation = f"Agent provided detailed responses with average length of {avg_response_length:.1f} characters"
    else:
        _result = False
        _explanation = f"Agent responses were too brief with average length of {avg_response_length:.1f} characters"
else:
    _result = False
    _explanation = "No agent responses found in transcript"

Example Using Multiple Data Sources

Here’s an example that combines multiple metric results with call metadata and tags:
# Access metric results directly by name
try:
    satisfaction = data["Customer Satisfaction"]
    response_time = data["Response Time"]

    # Access additional call data
    call_duration = data["call_duration"]
    call_end_reason = data["call_end_reason"]
    tags = data["tags"]

    # Check if this was a priority call based on tags
    is_priority = "priority" in tags or "vip" in tags

    # Evaluate based on multiple factors
    if call_end_reason == "hangup" and isinstance(satisfaction, (int, float)) and satisfaction > 3.0 and response_time < 60:
        if is_priority:
            _result = "Excellent"
            _explanation = f"Priority customer was satisfied ({satisfaction}) with fast response time ({response_time}s) and completed the call normally"
        else:
            _result = "Good"
            _explanation = f"Customer was satisfied ({satisfaction}) with fast response time ({response_time}s) and completed the call normally"
    elif call_end_reason in ["timeout", "error"]:
        _result = "Poor"
        _explanation = f"Call ended unexpectedly due to {call_end_reason}, indicating technical issues"
    else:
        _result = "Needs Improvement"
        _explanation = f"Call performance needs improvement - satisfaction: {satisfaction}, response time: {response_time}s, ended reason: {call_end_reason}"

except KeyError as e:
    _result = "Incomplete"
    _explanation = f"Required data not found: {str(e)}"