
Pydantic AI

Pydantic AI is a Python framework for building reliable, production-grade applications with Generative AI, providing type safety and validation for agent outputs and LLM interactions.

End-to-End Evals

deepeval allows you to evaluate Pydantic AI agents in under a minute.
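
Install both packages from PyPI first if you haven't already (assuming pip as your package manager):

pip install -U deepeval pydantic-ai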

Configure Pydantic AI

Pass agent_metrics to the ConfidentInstrumentationSettings constructor.

main.py
from pydantic_ai import Agent

from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings
from deepeval.metrics import AnswerRelevancyMetric

agent = Agent(
    "openai:gpt-5",
    instructions="You are a helpful assistant.",
    # is_test_mode=True is used for end-to-end evals; agent_metrics lists the
    # deepeval metrics each evaluation trace is scored against.
    instrument=ConfidentInstrumentationSettings(
        is_test_mode=True,
        agent_metrics=[AnswerRelevancyMetric()],
    ),
)

Run Evaluations

Create an EvaluationDataset and invoke your Pydantic AI application for each golden within the evals_iterator() loop to run end-to-end evaluations.

main.py
import asyncio

from deepeval.dataset import EvaluationDataset, Golden


# Async entrypoint that runs the instrumented agent defined above.
# result.output assumes a recent Pydantic AI release (older versions expose result.data).
async def run_agent(input: str) -> str:
    result = await agent.run(input)
    return result.output


dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in Paris?"),
        Golden(input="What's the weather in London?"),
    ]
)

for golden in dataset.evals_iterator():
    task = asyncio.create_task(run_agent(golden.input))
    dataset.evaluate(task)

βœ… Done. The evals_iterator will automatically generate a test run with individual evaluation traces for each golden.

Evals in Production

To run online evaluations in production, replace agent_metrics with agent_metric_collection, the name of a metric collection you've created on Confident AI, and push your Pydantic AI agent to production.

from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
    instrument=ConfidentInstrumentationSettings(
        agent_metric_collection="test_collection_1",
    )
)

result = agent.run_sync(
    "What are LLMs?"
)
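
Each call to the instrumented agent now produces a trace that is evaluated online against your metric collection on Confident AI. For reference, a minimal sketch of the same agent invoked asynchronously (result.output assumes a recent Pydantic AI release; older versions expose result.data):

import asyncio


async def main():
    # Async variant of the same call; traces are produced the same way
    result = await agent.run("What are LLMs?")
    print(result.output)


asyncio.run(main())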
