Orchestration Frameworks
Pydantic AI
Pydantic AI is a Python framework for building reliable, production-grade applications with Generative AI, providing type safety and validation for agent outputs and LLM interactions.
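For example, a Pydantic model can be passed as the agent's output type so that every LLM reply is parsed and validated against a schema. A minimal sketch (the CityInfo model and prompt are illustrative; output_type and result.output assume a recent Pydantic AI release, where older releases used result_type and result.data):
from pydantic import BaseModel
from pydantic_ai import Agent

class CityInfo(BaseModel):
    # Illustrative schema; any Pydantic model can be used here.
    name: str
    country: str

# output_type tells Pydantic AI to validate the model's reply against CityInfo.
agent = Agent("openai:gpt-4o-mini", output_type=CityInfo)
result = agent.run_sync("Tell me about Paris.")
print(result.output)  # a validated CityInfo instance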
End-to-End Evals
deepeval lets you evaluate Pydantic AI agents in under a minute.
Configure Pydantic AI
Pass agent_metrics to the ConfidentInstrumentationSettings constructor.
from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai.instrumentator import (
    ConfidentInstrumentationSettings,
)
from deepeval.metrics import AnswerRelevancyMetric

agent = Agent(
    "openai:gpt-5",
    instructions="You are a helpful assistant.",
    instrument=ConfidentInstrumentationSettings(
        is_test_mode=True,
        agent_metrics=[AnswerRelevancyMetric()],
    ),
)
Run evaluations
Create an EvaluationDataset and invoke your Pydantic AI application for each golden within the evals_iterator() loop to run end-to-end evaluations.
import asyncio

from deepeval.dataset import EvaluationDataset, Golden

# Assumes `agent` is the instrumented Agent defined above; `run_agent` is a
# small async wrapper around it (illustrative, not part of the deepeval API).
async def run_agent(input: str):
    return await agent.run(input)

dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in Paris?"),
        Golden(input="What's the weather in London?"),
    ]
)

for golden in dataset.evals_iterator():
    task = asyncio.create_task(run_agent(golden.input))
    dataset.evaluate(task)
Done. The evals_iterator will automatically generate a test run with individual evaluation traces for each golden.
Evals in Production
To run online evaluations in production, replace agent_metrics with agent_metric_collection, the name of a metric collection from Confident AI, and push your Pydantic AI agent to production.
from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
    instrument=ConfidentInstrumentationSettings(
        agent_metric_collection="test_collection_1",
    ),
)

result = agent.run_sync("What are LLMs?")
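With agent_metric_collection set, each production run is traced and scored online against the metrics in that collection on Confident AI. To read the agent's reply from the run result (assuming a recent Pydantic AI release; older releases exposed result.data instead of result.output):
# Print the agent's reply; `.output` assumes a recent Pydantic AI release
# (older releases used `.data`).
print(result.output)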