Orchestration Frameworks
OpenAI Agents
OpenAI Agents (the `openai-agents` SDK) is OpenAI's framework for building agentic LLM applications, with built-in support for tools, handoffs between agents, and tracing.
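For context, here is a minimal sketch of the plain SDK on its own, before any deepeval integration:

```python
from agents import Agent, Runner

# A bare agent defined with the plain openai-agents SDK
agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
)

# Run the agent synchronously and print its final answer
result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")
print(result.final_output)
```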
End-to-End Evals
deepeval lets you evaluate OpenAI Agents end-to-end in under a minute.
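Assuming a standard Python environment, install both packages from PyPI first:

```bash
pip install deepeval openai-agents
```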
Configure OpenAI Agents
```python
from agents import Runner, add_trace_processor
from deepeval.openai_agents import Agent, DeepEvalTracingProcessor
from deepeval.metrics import AnswerRelevancyMetric

# Register deepeval's trace processor so every agent run is captured
add_trace_processor(DeepEvalTracingProcessor())

# deepeval's Agent subclass accepts agent_metrics, which are applied
# to the agent's spans in each trace
weather_agent = Agent(
    name="Weather Agent",
    instructions="You are a weather agent. You are given a question about the weather and you need to answer it.",
    agent_metrics=[AnswerRelevancyMetric()],
)
```
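`agent_metrics` accepts any list of deepeval metrics. A sketch with a second metric and an illustrative threshold (both values are assumptions, not defaults):

```python
from deepeval.metrics import AnswerRelevancyMetric, TaskCompletionMetric

weather_agent = Agent(
    name="Weather Agent",
    instructions="You are a weather agent. You are given a question about the weather and you need to answer it.",
    agent_metrics=[
        AnswerRelevancyMetric(threshold=0.8),  # illustrative threshold, not a default
        TaskCompletionMetric(),                # scores whether the agent completed its task
    ],
)
```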
Run evaluations
Create an `EvaluationDataset` and invoke your OpenAI Agent for each golden within the `evals_iterator()` loop to run end-to-end evaluations.
```python
from deepeval.dataset import EvaluationDataset, Golden

dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in UK?"),
        Golden(input="What's the weather in France?"),
    ]
)

# Invoke the agent synchronously for each golden
for golden in dataset.evals_iterator():
    Runner.run_sync(weather_agent, golden.input)
```

To run the agent asynchronously, wrap `Runner.run` in an asyncio task and pass it to `dataset.evaluate`:

```python
import asyncio

from deepeval.dataset import EvaluationDataset, Golden

dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in UK?"),
        Golden(input="What's the weather in France?"),
    ]
)

# dataset.evaluate awaits the task before metrics are computed
for golden in dataset.evals_iterator():
    task = asyncio.create_task(Runner.run(weather_agent, golden.input))
    dataset.evaluate(task)
```

✅ Done. The `evals_iterator` will automatically generate a test run with individual evaluation traces for each golden.
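The weather agent above answers from its instructions alone. In practice you would give it a tool; a sketch using the SDK's `function_tool` decorator, where the `get_weather` body is a hypothetical stand-in for a real weather API:

```python
from agents import function_tool
from deepeval.openai_agents import Agent
from deepeval.metrics import AnswerRelevancyMetric

@function_tool
def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    # Hypothetical stand-in for a real weather API call
    return f"The weather in {city} is sunny."

weather_agent = Agent(
    name="Weather Agent",
    instructions="Use the get_weather tool to answer weather questions.",
    tools=[get_weather],
    agent_metrics=[AnswerRelevancyMetric()],
)
```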