LLM Provider Expansion Guide
Paladin Multi-Provider Support
This document provides a comprehensive comparison of LLM providers supported by Paladin and guidance for configuring and using them effectively.
Table of Contents
- Overview
- Provider Comparison
- Configuration Guide
- Use Case Recommendations
- Migration Guide
- Performance Characteristics
- Best Practices
- Troubleshooting
- Additional Resources
Overview
Paladin supports multiple LLM providers out of the box, allowing you to choose the best provider for your specific needs. All providers implement the same LlmPort trait, making it easy to switch between them without changing your application logic.
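To illustrate why a shared trait makes providers interchangeable, here is a minimal sketch. The names TextModel, EchoProvider, UppercaseProvider, and run_app are hypothetical stand-ins invented for this example. Paladin's real LlmPort trait is not reproduced here, and its methods (such as generate, used later in this guide) are async.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for a provider trait
/// (an assumption for illustration, not Paladin's real LlmPort).
trait TextModel {
    fn name(&self) -> &'static str;
    fn complete(&self, prompt: &str) -> String;
}

struct EchoProvider;
struct UppercaseProvider;

impl TextModel for EchoProvider {
    fn name(&self) -> &'static str {
        "echo"
    }
    fn complete(&self, prompt: &str) -> String {
        prompt.to_string()
    }
}

impl TextModel for UppercaseProvider {
    fn name(&self) -> &'static str {
        "uppercase"
    }
    fn complete(&self, prompt: &str) -> String {
        prompt.to_uppercase()
    }
}

/// Application logic depends only on the trait, so any provider can be passed in.
fn run_app(model: Arc<dyn TextModel>, prompt: &str) -> String {
    format!("[{}] {}", model.name(), model.complete(prompt))
}
```

Swapping providers then only changes the value handed to run_app, which is the same property the shared LlmPort trait is meant to give Paladin applications.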
Supported Providers
- OpenAI (GPT-4, GPT-3.5-turbo, GPT-4 Vision)
- DeepSeek (DeepSeek-Chat, DeepSeek-Coder)
- Anthropic (Claude 3.5 Sonnet, Claude 3 Opus/Haiku)
Provider Comparison
| Feature | OpenAI | DeepSeek | Anthropic |
|---|---|---|---|
| Streaming | ✅ Yes | ✅ Yes | ✅ Yes |
| Tool Calling | ✅ Yes | ✅ Yes | ✅ Yes |
| Function Calling | ✅ Yes | ✅ Yes | ✅ Yes |
| Vision/Images | ✅ GPT-4V | ❌ No | ✅ Claude 3+ |
| Max Context | 128K (GPT-4 Turbo) | 64K | 200K (Claude 3) |
| Best For | General purpose, production | Cost-effective, reasoning | Safety-critical, analysis |
| Pricing | $$ | $ | $$$ |
| Latency | Low | Low | Low-Medium |
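The table can also be read as a simple decision rule. The sketch below encodes the vision and context-window rows and picks the cheapest pricing tier that satisfies a request. The Provider enum and pick_provider function are hypothetical helpers for this example, not part of Paladin, and the figures are the approximate ones from the table.

```rust
/// Providers covered by this guide (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Provider {
    DeepSeek,
    OpenAi,
    Anthropic,
}

/// Vision support as listed in the comparison table.
fn supports_vision(p: Provider) -> bool {
    !matches!(p, Provider::DeepSeek)
}

/// Maximum context sizes as listed in the comparison table.
fn max_context_tokens(p: Provider) -> u32 {
    match p {
        Provider::DeepSeek => 64_000,
        Provider::OpenAi => 128_000,
        Provider::Anthropic => 200_000,
    }
}

/// Returns the cheapest provider (by the table's $ / $$ / $$$ tiers)
/// that meets the vision and context requirements, if any.
fn pick_provider(needs_vision: bool, context_tokens: u32) -> Option<Provider> {
    // Candidates ordered from the cheapest to the most expensive tier.
    [Provider::DeepSeek, Provider::OpenAi, Provider::Anthropic]
        .iter()
        .copied()
        .find(|&p| {
            (!needs_vision || supports_vision(p)) && context_tokens <= max_context_tokens(p)
        })
}
```

Ordering candidates by price is one possible policy; a latency-first or safety-first ordering would only change the candidate array.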
Detailed Feature Matrix
OpenAI
Strengths:
- Most mature ecosystem with extensive tooling
- Wide range of models (GPT-4, GPT-3.5-turbo, GPT-4 Vision)
- Excellent for general-purpose applications
- Strong vision/multimodal capabilities
- Large community and documentation
Limitations:
- Higher cost compared to alternatives
- Context window smaller than Claude
- Rate limiting on free tier
Ideal Use Cases:
- Production deployments requiring reliability
- Applications needing vision/image analysis
- General-purpose AI assistants
- Well-documented, standard use cases
DeepSeek
Strengths:
- Most cost-effective option
- Strong reasoning and code generation
- High throughput capabilities
- Good for analytical tasks
- Competitive performance at lower cost
Limitations:
- Smaller context window (64K)
- No vision support
- Newer ecosystem, fewer community resources
Ideal Use Cases:
- Cost-sensitive deployments
- Code generation and analysis
- Logical reasoning tasks
- High-volume/batch processing
- Internal tooling and development
Anthropic Claude
Strengths:
- Largest context window (200K tokens)
- Strong safety and ethical guidelines
- Excellent for complex analysis
- Superior long-document processing
- Strong instruction following
Limitations:
- Higher cost
- Claude-specific API differences (the system prompt is sent separately from the message list)
- Requires the max_tokens parameter on every request
Ideal Use Cases:
- Safety-critical applications
- Complex document analysis
- Long-context reasoning
- Compliance and governance
- Medical/legal/financial applications
Configuration Guide
Environment Variables
All providers can be configured via environment variables:
# OpenAI
export OPENAI_API_KEY="sk-..."
# DeepSeek
export DEEPSEEK_API_KEY="..."
export DEEPSEEK_BASE_URL="https://api.deepseek.com/v1" # Optional
export DEEPSEEK_MODEL="deepseek-chat" # Optional
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
export ANTHROPIC_BASE_URL="https://api.anthropic.com/v1" # Optional
export ANTHROPIC_MODEL="claude-3-5-sonnet-20241022" # Optional
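As a rough sketch of how required and optional variables can be resolved, the example below treats the API key as mandatory and falls back to the values shown above for the optional ones. It assumes those values are the defaults, which this guide does not state explicitly. ProviderSettings and both functions are hypothetical and are not Paladin's DeepSeekConfig::from_env.

```rust
use std::env;

/// Hypothetical settings struct for this sketch (not Paladin's DeepSeekConfig).
#[derive(Debug, PartialEq)]
struct ProviderSettings {
    api_key: String,
    base_url: String,
    model: String,
}

/// Builds settings from a lookup function so the logic can be exercised
/// without touching the real process environment.
fn deepseek_settings<F>(lookup: F) -> Result<ProviderSettings, String>
where
    F: Fn(&str) -> Option<String>,
{
    // The API key is mandatory: fail with a clear message if it is absent or empty.
    let api_key = lookup("DEEPSEEK_API_KEY")
        .filter(|key| !key.is_empty())
        .ok_or_else(|| "DEEPSEEK_API_KEY is not set".to_string())?;

    // Optional variables fall back to assumed defaults.
    let base_url = lookup("DEEPSEEK_BASE_URL")
        .unwrap_or_else(|| "https://api.deepseek.com/v1".to_string());
    let model = lookup("DEEPSEEK_MODEL").unwrap_or_else(|| "deepseek-chat".to_string());

    Ok(ProviderSettings {
        api_key,
        base_url,
        model,
    })
}

/// Convenience wrapper that reads from the real environment.
fn deepseek_settings_from_env() -> Result<ProviderSettings, String> {
    deepseek_settings(|name| env::var(name).ok())
}
```

Passing the lookup as a closure keeps the fallback rules testable; the same shape would apply to the ANTHROPIC_* variables.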
Configuration Files
Add provider configurations to config.yml:
llm:
  # Default provider if multiple are configured
  default_provider: "openai"
  openai:
    api_key: "${OPENAI_API_KEY}"
    base_url: "https://api.openai.com/v1"
    model: "gpt-4"
    timeout_seconds: 30
  deepseek:
    api_key: "${DEEPSEEK_API_KEY}"
    base_url: "https://api.deepseek.com/v1"
    model: "deepseek-chat"
    timeout_seconds: 60
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    base_url: "https://api.anthropic.com/v1"
    model: "claude-3-5-sonnet-20241022"
    timeout_seconds: 30
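The ${OPENAI_API_KEY} syntax suggests that values are substituted from the environment when the file is loaded. How Paladin's loader does this is not shown in this guide. As a hedged illustration of the general technique, the hypothetical expand_placeholders function below replaces each ${NAME} placeholder using a caller-supplied lookup.

```rust
/// Replaces every `${NAME}` placeholder in `input` using `lookup`.
/// Returns an error naming the first variable that cannot be resolved.
/// (Hypothetical helper for illustration; not Paladin's config loader.)
fn expand_placeholders<F>(input: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut output = String::with_capacity(input.len());
    let mut rest = input;

    while let Some(start) = rest.find("${") {
        // Copy the literal text that precedes the placeholder.
        output.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];

        match after_open.find('}') {
            Some(end) => {
                let name = &after_open[..end];
                let value = lookup(name)
                    .ok_or_else(|| format!("environment variable {} is not set", name))?;
                output.push_str(&value);
                rest = &after_open[end + 1..];
            }
            None => {
                // Unterminated placeholder: keep the remaining text verbatim.
                output.push_str(&rest[start..]);
                rest = "";
            }
        }
    }

    output.push_str(rest);
    Ok(output)
}
```

Failing on an unset variable, rather than substituting an empty string, surfaces a missing API key at load time instead of as an authentication error later.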
Programmatic Configuration
OpenAI
use paladin::infrastructure::adapters::llm::openai_adapter::OpenAILlmAdapter;
use std::time::Duration;

let adapter = OpenAILlmAdapter::new(
    api_key,
    None, // Use default base URL
    Some(Duration::from_secs(30)),
)?;
DeepSeek
use paladin::infrastructure::adapters::llm::deepseek_adapter::{
    DeepSeekAdapter, DeepSeekConfig,
};

// From environment
let config = DeepSeekConfig::from_env()?;
let adapter = DeepSeekAdapter::new(config)?;

// Or custom
let config = DeepSeekConfig::new(
    api_key,
    "https://api.deepseek.com/v1".to_string(),
    "deepseek-chat".to_string(),
);
let adapter = DeepSeekAdapter::new(config)?;
Anthropic
use paladin::infrastructure::adapters::llm::anthropic_adapter::{
    AnthropicAdapter, AnthropicConfig,
};

// From environment
let config = AnthropicConfig::from_env()?;
let adapter = AnthropicAdapter::new(config)?;

// Or custom
let config = AnthropicConfig::new(
    api_key,
    "https://api.anthropic.com/v1".to_string(),
    "claude-3-5-sonnet-20241022".to_string(),
);
let adapter = AnthropicAdapter::new(config)?;
Use Case Recommendations
When to Use OpenAI
Best for:
- General-purpose AI applications
- Production deployments requiring proven reliability
- Applications needing vision/image analysis
- Multimodal applications
- Projects with complex tooling requirements
Example Use Cases:
- Customer support chatbots
- Content generation systems
- Image analysis and description
- General AI assistants
- Document Q&A systems
When to Use DeepSeek
Best for:
- Cost-sensitive deployments
- Code generation and analysis
- Logical reasoning tasks
- High-volume batch processing
- Internal development tools
Example Use Cases:
- Code review automation
- Test generation
- Documentation generation
- Internal knowledge bases
- Analytical pipelines
When to Use Anthropic Claude
Best for:
- Safety-critical applications
- Long-document analysis
- Complex reasoning tasks
- Compliance-sensitive domains
- High-stakes decision support
Example Use Cases:
- Legal document analysis
- Medical record processing
- Financial compliance checking
- Research paper analysis
- Complex contract review
Migration Guide
From OpenAI to DeepSeek
DeepSeek uses an OpenAI-compatible API, making migration straightforward:
use std::sync::Arc;
use std::time::Duration;

// Before (OpenAI)
let llm_port = Arc::new(OpenAILlmAdapter::new(
    openai_key,
    None,
    Some(Duration::from_secs(30)),
)?);

// After (DeepSeek)
let config = DeepSeekConfig::from_env()?;
let llm_port = Arc::new(DeepSeekAdapter::new(config)?);

// Your Paladin code remains the same
let paladin = PaladinBuilder::new(llm_port)
    .system_prompt("Your prompt")
    .build()?;
Considerations:
- DeepSeek has no vision support
- Context window is 64K, versus 128K for GPT-4 Turbo
- Response style may differ slightly
From OpenAI to Anthropic
Anthropic Claude requires some adjustments due to API differences:
use std::sync::Arc;
use std::time::Duration;

// Before (OpenAI)
let llm_port = Arc::new(OpenAILlmAdapter::new(
    openai_key,
    None,
    Some(Duration::from_secs(30)),
)?);

// After (Anthropic)
let config = AnthropicConfig::from_env()?;
let llm_port = Arc::new(AnthropicAdapter::new(config)?);

// Your Paladin code remains the same
let paladin = PaladinBuilder::new(llm_port)
    .system_prompt("Your prompt")
    .build()?;
Key Differences:
- Claude requires the max_tokens parameter (defaults to 4096)
- System messages are sent separately
- Larger context window (200K tokens)
- Different SSE streaming format
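The first two differences can be pictured as a small request transformation. The sketch below turns an OpenAI-style message list into a Claude-style shape in which the system prompt is a separate field and max_tokens is always set. ChatMessage, ClaudeStyleRequest, and to_claude_style are hypothetical types for illustration. The Anthropic adapter is expected to handle this internally, and its actual implementation is not shown in this guide.

```rust
/// Hypothetical OpenAI-style chat message used only in this sketch.
#[derive(Debug, Clone, PartialEq)]
struct ChatMessage {
    role: String,
    content: String,
}

/// Hypothetical Claude-style request: the system prompt is a separate
/// field and max_tokens is always present.
#[derive(Debug, PartialEq)]
struct ClaudeStyleRequest {
    system: Option<String>,
    messages: Vec<ChatMessage>,
    max_tokens: u32,
}

/// Default taken from the "defaults to 4096" note above.
const DEFAULT_MAX_TOKENS: u32 = 4096;

fn to_claude_style(messages: &[ChatMessage], max_tokens: Option<u32>) -> ClaudeStyleRequest {
    // Collect all system messages into a single prompt.
    let system_parts: Vec<&str> = messages
        .iter()
        .filter(|m| m.role == "system")
        .map(|m| m.content.as_str())
        .collect();
    let system = if system_parts.is_empty() {
        None
    } else {
        Some(system_parts.join("\n\n"))
    };

    // Keep the remaining messages in their original order.
    let conversation = messages
        .iter()
        .filter(|m| m.role != "system")
        .cloned()
        .collect();

    ClaudeStyleRequest {
        system,
        messages: conversation,
        max_tokens: max_tokens.unwrap_or(DEFAULT_MAX_TOKENS),
    }
}
```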
Provider Fallback Pattern
Implement graceful fallback for higher reliability:
use paladin::infrastructure::adapters::llm::anthropic_adapter::{
    AnthropicAdapter, AnthropicConfig,
};
use paladin::infrastructure::adapters::llm::deepseek_adapter::{
    DeepSeekAdapter, DeepSeekConfig,
};
use paladin::infrastructure::adapters::llm::openai_adapter::OpenAILlmAdapter;
use paladin::paladin_ports::output::llm_port::LlmPort;
use std::sync::Arc;
use std::time::Duration;

fn create_llm_provider() -> Result<Arc<dyn LlmPort>, Box<dyn std::error::Error>> {
    // Try DeepSeek first (cost-effective)
    if let Ok(config) = DeepSeekConfig::from_env() {
        if let Ok(adapter) = DeepSeekAdapter::new(config) {
            return Ok(Arc::new(adapter));
        }
    }

    // Fallback to Anthropic (powerful)
    if let Ok(config) = AnthropicConfig::from_env() {
        if let Ok(adapter) = AnthropicAdapter::new(config) {
            return Ok(Arc::new(adapter));
        }
    }

    // Final fallback to OpenAI (default)
    let api_key = std::env::var("OPENAI_API_KEY")?;
    Ok(Arc::new(OpenAILlmAdapter::new(
        api_key,
        None,
        Some(Duration::from_secs(30)),
    )?))
}
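The function above chooses a provider once, at construction time. If you also want to fall back when a request fails at call time, one possible shape is sketched below. It uses a hypothetical synchronous Completer trait and a Stub test double so that it stays self-contained. A real version would wrap the async LlmPort trait instead.

```rust
/// Hypothetical synchronous provider interface for this sketch.
trait Completer {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Tries each provider in order and returns the name of the first provider
/// that succeeds together with its response, or every error if all fail.
fn complete_with_fallback(
    providers: &[Box<dyn Completer>],
    prompt: &str,
) -> Result<(String, String), Vec<String>> {
    let mut errors = Vec::new();
    for provider in providers {
        match provider.complete(prompt) {
            Ok(text) => return Ok((provider.name().to_string(), text)),
            // Record the failure and move on to the next provider.
            Err(e) => errors.push(format!("{}: {}", provider.name(), e)),
        }
    }
    Err(errors)
}

/// Test double that either always fails or always answers.
struct Stub {
    label: &'static str,
    fail: bool,
}

impl Completer for Stub {
    fn name(&self) -> &str {
        self.label
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        if self.fail {
            Err("unavailable".to_string())
        } else {
            Ok(format!("echo: {}", prompt))
        }
    }
}
```

Returning the collected errors when every provider fails keeps the original failure reasons available for logging.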
Performance Characteristics
Latency Comparison (Approximate)
| Provider | First Token (p50) | First Token (p95) | Throughput |
|---|---|---|---|
| OpenAI GPT-4 | 500-800ms | 1-2s | Medium |
| OpenAI GPT-3.5 | 200-400ms | 500ms-1s | High |
| DeepSeek | 300-600ms | 800ms-1.5s | High |
| Anthropic Claude | 400-700ms | 1-2s | Medium |
Note: Actual performance varies with request size, load, and region.
Cost Comparison (Approximate)
Per 1M Tokens (Input/Output):
| Provider | Model | Input | Output |
|---|---|---|---|
| OpenAI | GPT-4 | $10 | $30 |
| OpenAI | GPT-3.5-turbo | $0.50 | $1.50 |
| DeepSeek | deepseek-chat | $0.10 | $0.20 |
| Anthropic | Claude 3.5 Sonnet | $3 | $15 |
Prices are approximate and subject to change.
Scaling Considerations
OpenAI:
- Rate limits: Tier-based (requests/min, tokens/min)
- Horizontal scaling: Good
- Burst capacity: Moderate
DeepSeek:
- Rate limits: Generous
- Horizontal scaling: Excellent (high throughput)
- Burst capacity: High
Anthropic:
- Rate limits: Tier-based
- Horizontal scaling: Good
- Burst capacity: Moderate
Best Practices
1. Use Provider Capabilities
Query provider capabilities before attempting operations:
let caps = provider.get_capabilities();

if caps.supports_vision {
    // Send image-based requests
}

if caps.supports_streaming {
    // Use streaming for better UX
}
2. Set Appropriate Timeouts
Different providers may have different response times:
// Higher timeout for Claude with long contexts
let claude_config = AnthropicConfig::new(/* ... */); // Timeout handled internally

// Standard timeout for others
let openai = OpenAILlmAdapter::new(
    api_key,
    None,
    Some(Duration::from_secs(30)),
)?;
3. Handle Provider-Specific Errors
match provider.generate(&request).await {
    Ok(response) => {
        // Handle response
    }
    Err(LlmError::RateLimitExceeded { retry_after }) => {
        // Wait for the interval the provider requested
        tokio::time::sleep(Duration::from_secs(retry_after)).await;
        // Retry
    }
    Err(LlmError::AuthenticationError(_)) => {
        // Check API keys
    }
    Err(e) => {
        // Handle other errors
    }
}
4. Monitor Usage and Costs
let response = provider.generate(&request).await?;

// Log token usage
println!("Input tokens: {}", response.usage.prompt_tokens);
println!("Output tokens: {}", response.usage.completion_tokens);
// calculate_cost is assumed to be a helper defined by your application
println!("Total cost: ${}", calculate_cost(&response, provider_name));
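The calculate_cost helper is not defined in this guide. One way to write it, using the approximate per-1M-token prices from the cost table earlier, is sketched here. The signature takes raw token counts rather than a response object so the example stays self-contained. The model keys and both function bodies are assumptions for illustration.

```rust
/// Approximate USD prices per 1M tokens as (input, output),
/// copied from the cost table in this guide. Keys are illustrative.
fn price_per_million(model: &str) -> Option<(f64, f64)> {
    match model {
        "gpt-4" => Some((10.0, 30.0)),
        "gpt-3.5-turbo" => Some((0.50, 1.50)),
        "deepseek-chat" => Some((0.10, 0.20)),
        "claude-3-5-sonnet" => Some((3.0, 15.0)),
        _ => None,
    }
}

/// Estimated request cost in USD, or None when the model has no known price.
fn calculate_cost(model: &str, prompt_tokens: u64, completion_tokens: u64) -> Option<f64> {
    let (input_price, output_price) = price_per_million(model)?;
    let total = prompt_tokens as f64 * input_price + completion_tokens as f64 * output_price;
    Some(total / 1_000_000.0)
}
```

For example, a deepseek-chat request with 2,000 input tokens and 500 output tokens works out to (2,000 × 0.10 + 500 × 0.20) / 1,000,000 = $0.0003 at these approximate prices.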
Troubleshooting
Authentication Errors
Issue: LlmError::AuthenticationError
Solutions:
- Verify API key is set correctly
- Check API key has necessary permissions
- Ensure API key hasn't expired
- Verify base URL is correct for your region
Rate Limiting
Issue: LlmError::RateLimitExceeded
Solutions:
- Use exponential backoff (built into the adapters)
- Consider upgrading your API tier
- Implement request queuing
- Switch to a provider with higher limits
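The adapters are described as handling backoff internally. If you add your own retry layer on top, for example around a request queue, the delay schedule might look like the sketch below. The backoff_delay function is a hypothetical helper, not a Paladin API.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // checked_pow and checked_mul guard against overflow for large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}
```

With a 500 ms base and a 30 s cap, the delays run 0.5 s, 1 s, 2 s, 4 s, and so on until they reach the cap. Adding random jitter to each delay is a common refinement when many clients retry at once.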
Timeout Errors
Issue: LlmError::Timeout
Solutions:
- Increase timeout duration
- Reduce request complexity
- Check network connectivity
- Consider switching to streaming mode
Context Length Errors
Issue: LlmError::InvalidRequest (context too long)
Solutions:
- Reduce input size
- Switch to a provider with a larger context window (Claude: 200K)
- Implement context windowing
- Summarize older conversation history
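Context windowing can be as simple as keeping the newest messages that fit a token budget. The sketch below assumes a rough estimate of about four characters per token. This is only a heuristic, and a provider's tokenizer should be used when accuracy matters. Both functions are hypothetical helpers, not part of Paladin.

```rust
/// Very rough token estimate: about 4 characters per token, rounded up.
/// This heuristic is an assumption made for the sketch.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Keeps the most recent messages whose combined estimated size fits `budget`.
/// Messages are returned in their original order.
fn window_messages<'a>(messages: &[&'a str], budget: usize) -> Vec<&'a str> {
    let mut kept = Vec::new();
    let mut used = 0;

    // Walk backwards so the newest messages are considered first.
    for &msg in messages.iter().rev() {
        let cost = estimate_tokens(msg);
        if used + cost > budget {
            break;
        }
        used += cost;
        kept.push(msg);
    }

    kept.reverse();
    kept
}
```

Messages dropped by the window are natural candidates for the summarization step mentioned in the list above.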
Additional Resources
- Paladin Examples - Working code examples
- Contributing Providers Guide - Add new providers
- API Documentation - Full API reference
- GitHub Issues - Report issues
Last Updated: January 2026
Version: 0.1.0