LiteLLM

DeepEval allows you to use any model supported by LiteLLM to run evals, either through the CLI or directly in Python.

Command Line

To configure your LiteLLM model through the CLI, run the following command. You must specify the provider in the model name:

# OpenAI
deepeval set-litellm --model=openai/gpt-3.5-turbo

# Anthropic
deepeval set-litellm --model=anthropic/claude-3-opus

# Google
deepeval set-litellm --model=google/gemini-pro

You can also specify additional parameters:

# With API key (read from the environment, e.g. LITELLM_API_KEY)
deepeval set-litellm --model=openai/gpt-3.5-turbo

# With custom API base
deepeval set-litellm --model=openai/gpt-3.5-turbo --base-url="https://your-custom-endpoint.com"

# With both API key (from the environment) and custom base
deepeval set-litellm \
    --model=openai/gpt-3.5-turbo \
    --base-url="https://your-custom-endpoint.com"

The CLI command above sets LiteLLM as the default provider for all metrics, unless overridden in Python code. To use a different default model provider, you must first unset LiteLLM:

deepeval unset-litellm

Python

When using LiteLLM in Python, you must always specify the provider in the model name. Here's how to use LiteLLMModel from DeepEval's model collection:

from deepeval.models import LiteLLMModel
from deepeval.metrics import AnswerRelevancyMetric

# OpenAI model
model = LiteLLMModel(
    model="openai/gpt-3.5-turbo",  # Provider must be specified
    api_key="your-api-key",  # optional, can be set via environment variable
    base_url="your-api-base",  # optional, for custom endpoints
    temperature=0
)

answer_relevancy = AnswerRelevancyMetric(model=model)
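
Because the provider prefix is mandatory, it can help to check model names before passing them to LiteLLMModel. The helper below is a minimal sketch: `split_provider` is a hypothetical name, not part of DeepEval or LiteLLM, and it only checks the "provider/model" shape, not whether the provider actually exists.

```python
from typing import Tuple


def split_provider(model_name: str) -> Tuple[str, str]:
    """Split "provider/model" into (provider, model), or raise ValueError.

    Hypothetical helper for illustration; only the first "/" is treated
    as the separator, so model names may themselves contain slashes.
    """
    provider, sep, model = model_name.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"Expected 'provider/model', got {model_name!r}")
    return provider, model


provider, model = split_provider("openai/gpt-3.5-turbo")
```

A name without a prefix, such as "gpt-4", raises ValueError here instead of failing later inside the model call.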

To use a LiteLLM model without constructing a LiteLLMModel yourself, set USE_LITELLM=1 in your environment and pass the model name as a string when initializing the metric:

from deepeval.metrics import AnswerRelevancyMetric

answer_relevancy = AnswerRelevancyMetric(
    model="openai/gpt-3.5-turbo",
)

For the example above to work, you must also set the other required environment variables, such as LITELLM_API_KEY.
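
If you prefer to set these variables from Python rather than the shell, something like the following sketch could run before any metric is created. `enable_litellm` is a hypothetical helper, and the sketch assumes DeepEval reads USE_LITELLM and LITELLM_API_KEY when the metric is initialized, so the variables need to be in place first.

```python
import os
from typing import MutableMapping, Optional


def enable_litellm(
    api_key: Optional[str] = None,
    env: Optional[MutableMapping[str, str]] = None,
) -> None:
    """Set USE_LITELLM=1 and, if given, LITELLM_API_KEY.

    Hypothetical helper; `env` defaults to os.environ and exists mainly
    so the function can be exercised against a plain dict.
    """
    env = os.environ if env is None else env
    env["USE_LITELLM"] = "1"
    if api_key is not None:
        # setdefault keeps a key that was already exported in the shell.
        env.setdefault("LITELLM_API_KEY", api_key)
```

In real code the key would come from a secrets manager or an already-exported variable rather than a string literal.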

There are ZERO mandatory and FIVE optional parameters when creating a LiteLLMModel:

[Optional] model: A string specifying the provider and model name (e.g., "openai/gpt-3.5-turbo", "anthropic/claude-3-opus"). Defaults to LITELLM_MODEL_NAME if not passed; an error is raised at runtime if neither is set.
[Optional] api_key: A string specifying the API key for the model. If not passed, DeepEval tries, in order, LITELLM_API_KEY, LITELLM_PROXY_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY, then GOOGLE_API_KEY. If none are set, the key is left unset and the underlying LiteLLM/provider behavior applies.
[Optional] base_url: A string specifying the base URL for the model API. If not passed, defaults to LITELLM_API_BASE, then LITELLM_PROXY_API_BASE.
[Optional] temperature: A float specifying the model temperature. Defaults to TEMPERATURE if not passed; falls back to 0.0 if that is unset too.
[Optional] generation_kwargs: A dictionary of additional generation parameters forwarded to LiteLLM’s completion(...) / acompletion(...) call.
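
The fallback order described above can be easier to follow as code. The sketch below mirrors the documented lookup order in plain Python; it is an illustration, not DeepEval's actual implementation, and `first_set` and `resolve_settings` are hypothetical names.

```python
import os
from typing import Dict, Mapping, Optional, Sequence

# Lookup orders copied from the parameter descriptions above.
API_KEY_ENV_ORDER = (
    "LITELLM_API_KEY",
    "LITELLM_PROXY_API_KEY",
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GOOGLE_API_KEY",
)
BASE_URL_ENV_ORDER = ("LITELLM_API_BASE", "LITELLM_PROXY_API_BASE")


def first_set(names: Sequence[str], env: Mapping[str, str]) -> Optional[str]:
    """Return the value of the first non-empty variable in `names`."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def resolve_settings(
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    temperature: Optional[float] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, object]:
    """Explicit arguments win; otherwise fall back to the environment."""
    env = os.environ if env is None else env
    if temperature is None:
        raw = env.get("TEMPERATURE")
        temperature = float(raw) if raw else 0.0
    return {
        "api_key": api_key or first_set(API_KEY_ENV_ORDER, env),
        "base_url": base_url or first_set(BASE_URL_ENV_ORDER, env),
        "temperature": temperature,
    }
```

With only OPENAI_API_KEY and LITELLM_PROXY_API_KEY set, this sketch picks the proxy key, because it appears earlier in the documented order.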

Environment Variables

You can also configure LiteLLM using environment variables:

# OpenAI
export OPENAI_API_KEY="your-api-key"

# Anthropic
export ANTHROPIC_API_KEY="your-api-key"

# Google
export GOOGLE_API_KEY="your-api-key"

# Custom endpoint
export LITELLM_API_BASE="https://your-custom-endpoint.com"

Available Models

OpenAI Models

openai/gpt-3.5-turbo
openai/gpt-4
openai/gpt-4-turbo-preview

Anthropic Models

anthropic/claude-3-opus
anthropic/claude-3-sonnet
anthropic/claude-3-haiku

Google Models

google/gemini-pro
google/gemini-ultra

Mistral Models

mistral/mistral-small
mistral/mistral-medium
mistral/mistral-large

LM Studio Models

lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF
lm-studio/Mistral-7B-Instruct-v0.2-GGUF
lm-studio/Phi-2-GGUF

Ollama Models

ollama/llama2
ollama/mistral
ollama/codellama
ollama/neural-chat
ollama/starling-lm

Examples

Basic Usage with Different Providers

from deepeval.models import LiteLLMModel
from deepeval.metrics import AnswerRelevancyMetric

# OpenAI
model = LiteLLMModel(model="openai/gpt-3.5-turbo")
metric = AnswerRelevancyMetric(model=model)

# Anthropic
model = LiteLLMModel(model="anthropic/claude-3-opus")
metric = AnswerRelevancyMetric(model=model)

# Google
model = LiteLLMModel(model="google/gemini-pro")
metric = AnswerRelevancyMetric(model=model)

# LM Studio
model = LiteLLMModel(
    model="lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF",
    base_url="http://localhost:1234/v1",  # LM Studio default URL
    api_key="lm-studio"  # LM Studio uses a fixed API key
)
metric = AnswerRelevancyMetric(model=model)

# Ollama
model = LiteLLMModel(
    model="ollama/llama2",
    base_url="http://localhost:11434/v1",  # Ollama default URL
    api_key="ollama"  # Ollama uses a fixed API key
)
metric = AnswerRelevancyMetric(model=model)

Using Custom Endpoint

model = LiteLLMModel(
    model="custom/your-model-name",  # Provider must be specified
    base_url="https://your-custom-endpoint.com",
    api_key="your-api-key"
)

Using with Schema Validation

from pydantic import BaseModel

class ResponseSchema(BaseModel):
    score: float
    reason: str

# OpenAI
model = LiteLLMModel(model="openai/gpt-3.5-turbo")
response, cost = model.generate(
    "Rate this answer: 'The capital of France is Paris'",
    schema=ResponseSchema
)

# LM Studio
model = LiteLLMModel(
    model="lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF",
    base_url="http://localhost:1234/v1",
    api_key="lm-studio"
)
response, cost = model.generate(
    "Rate this answer: 'The capital of France is Paris'",
    schema=ResponseSchema
)

# Ollama
model = LiteLLMModel(
    model="ollama/llama2",
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)
response, cost = model.generate(
    "Rate this answer: 'The capital of France is Paris'",
    schema=ResponseSchema
)
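
In the calls above, generate returns a (response, cost) pair, which makes it straightforward to keep a running total across calls. The following is a minimal sketch under stated assumptions: `CostTracker` is a hypothetical class, cost is treated as a number that may be None, and `fake_generate` stands in for model.generate so the snippet runs without network access.

```python
from typing import Any, Callable, Optional, Tuple


class CostTracker:
    """Accumulate the cost element of (response, cost) pairs."""

    def __init__(self) -> None:
        self.total = 0.0
        self.calls = 0

    def track(
        self,
        generate: Callable[..., Tuple[Any, Optional[float]]],
        *args: Any,
        **kwargs: Any,
    ) -> Any:
        response, cost = generate(*args, **kwargs)
        self.calls += 1
        # Assumption: a missing cost is reported as None; count it as zero.
        self.total += cost or 0.0
        return response


def fake_generate(prompt: str, schema: Any = None) -> Tuple[str, float]:
    """Stand-in for model.generate with a made-up, fixed cost."""
    return f"echo: {prompt}", 0.002


tracker = CostTracker()
tracker.track(fake_generate, "first prompt")
tracker.track(fake_generate, "second prompt")
```

With a real model, `tracker.track(model.generate, prompt, schema=ResponseSchema)` would be the equivalent call.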

Best Practices

Provider Specification: Always specify the provider in the model name (e.g., "openai/gpt-3.5-turbo", "anthropic/claude-3-opus", "lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF", "ollama/llama2").
API Key Security: Store your API keys in environment variables rather than hardcoding them in your scripts.
Model Selection: Choose the appropriate model based on your needs:
- For simple tasks: Use smaller models like openai/gpt-3.5-turbo
- For complex reasoning: Use larger models like openai/gpt-4 or anthropic/claude-3-opus
- For cost-sensitive applications: Use models like mistral/mistral-small or anthropic/claude-3-haiku
- For local development:
  - Use LM Studio models like lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF
  - Use Ollama models like ollama/llama2 or ollama/mistral
Error Handling: Implement proper error handling for API rate limits and connection issues.
Cost Management: Monitor your usage and costs, especially when using larger models.
Local Model Setup:
- LM Studio:
  - Make sure LM Studio is running and the model is loaded
  - Use the correct API base URL (default: http://localhost:1234/v1)
  - Use the fixed API key "lm-studio"
  - Ensure the model is properly downloaded and loaded in LM Studio
- Ollama:
  - Make sure Ollama is running and the model is pulled
  - Use the correct API base URL (default: http://localhost:11434/v1)
  - Use the fixed API key "ollama"
  - Pull the model first using ollama pull llama2 (or your chosen model)
  - Ensure you have enough system resources for the model
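
For the error-handling recommendation above, one common pattern is to retry transient failures with exponential backoff. The sketch below uses only the standard library; `call_with_retries` is a hypothetical helper, and the default exception types are placeholders that you would replace with whatever rate-limit and connection errors your LiteLLM version raises.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")  # Keeps type checkers satisfied.
```

A call such as `call_with_retries(lambda: model.generate(prompt))` would then wrap a single generation; errors not listed in `retry_on` propagate immediately.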
