Agent skill: providers
Use when switching between LLM providers, accessing provider-specific features (Anthropic caching, OpenAI logprobs), or using raw SDK clients - covers multi-provider patterns and direct SDK access for OpenAI, Anthropic, Google, and Ollama
Install this agent skill into your project:
```bash
npx add-skill https://github.com/juanre/llmring/tree/main/skills/providers
```
Multi-Provider Patterns and Raw SDK Access
Installation
```bash
# With uv (recommended)
uv add llmring

# With pip
pip install llmring
```
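Provider SDKs (install only the ones you need; quote version specifiers so the shell does not treat `>` as an output redirect):
```bash
uv add "openai>=1.0"      # OpenAI
uv add "anthropic>=0.67"  # Anthropic
uv add google-genai       # Google Gemini
uv add "ollama>=0.4"      # Ollama
```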
API Overview
This skill covers:
- `get_provider()` method for raw SDK access
- Provider initialization and configuration
- Provider-specific features (caching, extra parameters)
- Multi-provider patterns and switching
- Fallback behavior
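Quick Start
The examples in this skill use `async with` and `await`, so run them inside an `async def` function (for example, one started with `asyncio.run()`).
```python
import asyncio

from llmring import LLMRing


async def main() -> None:
    async with LLMRing() as service:
        # Get raw provider clients
        openai_client = service.get_provider("openai").client
        anthropic_client = service.get_provider("anthropic").client

        # Use the provider SDK directly
        response = await openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello"}],
            logprobs=True,  # Provider-specific feature
        )
        print(response.choices[0].message.content)


asyncio.run(main())
```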
Complete API Documentation
LLMRing.get_provider()
Get raw provider client for direct SDK access.
Signature:
```python
def get_provider(provider_type: str) -> BaseLLMProvider
```
Parameters:
- `provider_type` (str): Provider name, one of "openai", "anthropic", "google", or "ollama"

Returns:
- `BaseLLMProvider`: Provider wrapper whose `.client` attribute is the raw SDK client

Raises:
- `ProviderNotFoundError`: If the provider is not configured or its API key is missing
Example:
```python
from llmring import LLMRing

async with LLMRing() as service:
    # Get providers
    openai_provider = service.get_provider("openai")
    anthropic_provider = service.get_provider("anthropic")

    # Access raw clients
    openai_client = openai_provider.client  # openai.AsyncOpenAI
    anthropic_client = anthropic_provider.client  # anthropic.AsyncAnthropic
```
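To make the documented contract concrete (return a wrapper exposing `.client`, raise `ProviderNotFoundError` for unconfigured providers), here is a minimal stand-in written with the standard library only. It is a sketch, not llmring's implementation: `ProviderRegistry`, `ProviderWrapper`, and `available_providers` are hypothetical names, and the local `ProviderNotFoundError` class merely imitates the real one in `llmring.exceptions`.

```python
from dataclasses import dataclass
from typing import Any


class ProviderNotFoundError(LookupError):
    """Hypothetical stand-in for llmring.exceptions.ProviderNotFoundError."""


@dataclass
class ProviderWrapper:
    """Hypothetical stand-in for BaseLLMProvider: holds the raw SDK client."""
    name: str
    client: Any


class ProviderRegistry:
    def __init__(self) -> None:
        self._providers: dict[str, ProviderWrapper] = {}

    def register(self, name: str, client: Any) -> None:
        self._providers[name] = ProviderWrapper(name, client)

    def get_provider(self, provider_type: str) -> ProviderWrapper:
        # Fail loudly for unconfigured providers instead of returning None.
        try:
            return self._providers[provider_type]
        except KeyError:
            raise ProviderNotFoundError(
                f"provider {provider_type!r} is not configured"
            ) from None


def available_providers(registry: ProviderRegistry, names: list[str]) -> list[str]:
    """Probe each name with get_provider() and keep the ones that do not raise."""
    found = []
    for name in names:
        try:
            registry.get_provider(name)
        except ProviderNotFoundError:
            continue
        found.append(name)
    return found


registry = ProviderRegistry()
registry.register("ollama", client=object())  # placeholder for an ollama.AsyncClient
```

Raising instead of returning `None` means a missing API key surfaces at the `get_provider()` call rather than later as an `AttributeError` on `.client`.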
Provider Clients
Each provider exposes its native SDK client:
OpenAI:
```python
provider = service.get_provider("openai")
client = provider.client  # openai.AsyncOpenAI instance
```
Anthropic:
```python
provider = service.get_provider("anthropic")
client = provider.client  # anthropic.AsyncAnthropic instance
```
Google:
```python
provider = service.get_provider("google")
client = provider.client  # google.genai.Client instance
```
Ollama:
```python
provider = service.get_provider("ollama")
client = provider.client  # ollama.AsyncClient instance
```
Provider Initialization
Providers are automatically initialized based on environment variables:
Environment Variables:
```bash
# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...

# Google (any of these)
GOOGLE_GEMINI_API_KEY=AIza...
GEMINI_API_KEY=AIza...
GOOGLE_API_KEY=AIza...

# Ollama (optional, default shown)
OLLAMA_BASE_URL=http://localhost:11434
```
What gets initialized:
- OpenAI: if `OPENAI_API_KEY` is set
- Anthropic: if `ANTHROPIC_API_KEY` is set
- Google: if any Google API key is set
- Ollama: always (local, no key needed)
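The rules above can be mirrored in a few lines, which is handy for a startup check or a unit test of your deployment environment. This is a sketch of the listed rules, not llmring's own detection code; `detect_providers` and `ollama_base_url` are hypothetical helpers, and treating an empty variable as unset is an assumption.

```python
import os
from collections.abc import Mapping

# Any one of these enables the Google provider (per the list above).
GOOGLE_KEY_VARS = ("GOOGLE_GEMINI_API_KEY", "GEMINI_API_KEY", "GOOGLE_API_KEY")


def detect_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the providers the documented rules would initialize."""
    found = []
    if env.get("OPENAI_API_KEY"):  # empty string treated as unset (assumption)
        found.append("openai")
    if env.get("ANTHROPIC_API_KEY"):
        found.append("anthropic")
    if any(env.get(var) for var in GOOGLE_KEY_VARS):
        found.append("google")
    found.append("ollama")  # local: always initialized, no key needed
    return found


def ollama_base_url(env: Mapping[str, str] = os.environ) -> str:
    """OLLAMA_BASE_URL if set, otherwise the documented default."""
    return env.get("OLLAMA_BASE_URL", "http://localhost:11434")
```

Passing an explicit dict instead of `os.environ` keeps the check testable without touching the real environment.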
Provider-Specific Features
OpenAI: Logprobs and Advanced Parameters
```python
import math

from llmring import LLMRing

async with LLMRing() as service:
    openai_client = service.get_provider("openai").client

    # Use OpenAI-specific features
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        logprobs=True,          # Return per-token log probabilities
        top_logprobs=5,         # Top 5 alternatives per position
        seed=12345,             # Best-effort reproducible sampling
        presence_penalty=0.1,   # Penalize tokens that have appeared at all
        frequency_penalty=0.2,  # Penalize tokens in proportion to how often they appeared
        # parallel_tool_calls=False is only accepted when tools=... is also passed
    )

    # Access logprobs (natural-log values; math.exp converts them to probabilities)
    if response.choices[0].logprobs:
        for token_info in response.choices[0].logprobs.content:
            prob = math.exp(token_info.logprob)
            print(f"Token: {token_info.token!r}, prob: {prob:.3f}")
```
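Logprobs are natural logarithms, so `math.exp` turns them into probabilities, and the exponentiated mean negative logprob (perplexity) gives a single confidence score for a whole completion. The sketch below works on plain `(token, logprob)` pairs shaped like `token_info.token` / `token_info.logprob`; the helper names and sample values are illustrative, not part of the OpenAI SDK.

```python
import math


def to_probabilities(token_logprobs: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Convert (token, natural-log probability) pairs into plain probabilities."""
    return [(token, math.exp(lp)) for token, lp in token_logprobs]


def perplexity(logprobs: list[float]) -> float:
    """exp(mean negative logprob): 1.0 means fully confident; higher means less sure."""
    if not logprobs:
        raise ValueError("need at least one logprob")
    return math.exp(-sum(logprobs) / len(logprobs))


# Made-up values in the shape of (token_info.token, token_info.logprob).
sample = [("Hello", 0.0), ("!", math.log(0.5))]
print(to_probabilities(sample))
print(perplexity([lp for _, lp in sample]))
```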
OpenAI: Reasoning Models (o1 series)
```python
from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Using unified API
    request = LLMRequest(
        model="openai:o1",
        messages=[Message(role="user", content="Complex reasoning task")],
        reasoning_tokens=10000,  # Budget for internal reasoning
    )
    response = await service.chat(request)

    # Or use raw SDK
    openai_client = service.get_provider("openai").client
    response = await openai_client.chat.completions.create(
        model="o1",
        messages=[{"role": "user", "content": "Reasoning task"}],
        max_completion_tokens=5000,  # Includes reasoning + output tokens
    )
```
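Because `max_completion_tokens` covers both hidden reasoning and visible output, it helps to split the count after a call. With the raw SDK, OpenAI reports the reasoning share in `response.usage.completion_tokens_details.reasoning_tokens`; if reasoning consumes the whole cap, the reply can come back with little or no visible text. The helpers below are hypothetical and operate on plain integers so they can be tested offline.

```python
def visible_output_tokens(completion_tokens: int, reasoning_tokens: int) -> int:
    """completion_tokens counts reasoning + visible output; subtract to get the visible part."""
    if reasoning_tokens > completion_tokens:
        raise ValueError("reasoning_tokens cannot exceed completion_tokens")
    return completion_tokens - reasoning_tokens


def reasoning_exhausted_budget(
    completion_tokens: int, reasoning_tokens: int, max_completion_tokens: int
) -> bool:
    """True when the cap was reached before any visible text was produced."""
    return (
        completion_tokens >= max_completion_tokens
        and visible_output_tokens(completion_tokens, reasoning_tokens) == 0
    )
```

With a raw SDK response you would pass `response.usage.completion_tokens` and `response.usage.completion_tokens_details.reasoning_tokens`; if `reasoning_exhausted_budget(...)` is true, raise `max_completion_tokens` and retry.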
Anthropic: Prompt Caching
```python
from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Using unified API
    request = LLMRequest(
        model="anthropic:claude-sonnet-4-5-20250929",
        messages=[
            Message(
                role="system",
                content="Very long system prompt with 1024+ tokens...",
                metadata={"cache_control": {"type": "ephemeral"}},
            ),
            Message(role="user", content="Hello"),
        ],
    )
    response = await service.chat(request)

    # Or use raw SDK
    anthropic_client = service.get_provider("anthropic").client
    response = await anthropic_client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=100,
        system=[{
            "type": "text",
            # Prompts shorter than the model's minimum cacheable length
            # (1024 tokens for Sonnet) are processed but not cached
            "text": "Long system prompt...",
            "cache_control": {"type": "ephemeral"},
        }],
        messages=[{"role": "user", "content": "Hello"}],
    )

    # Check cache usage
    print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
    print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
```
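The two usage counters above, together with `input_tokens` (which counts only the uncached input), are enough to estimate what caching saves. The sketch assumes Anthropic's published multipliers for the default 5-minute cache at the time of writing (cache writes at 1.25x and cache reads at 0.1x the base input price); check current pricing before relying on them. `Usage`, `total_input_tokens`, and `effective_input_units` are illustrative names, not SDK types.

```python
from dataclasses import dataclass

# Assumed multipliers relative to the base input-token price (verify against current pricing).
CACHE_WRITE_MULTIPLIER = 1.25
CACHE_READ_MULTIPLIER = 0.10


@dataclass
class Usage:
    """Mirrors the input-side fields of Anthropic's response.usage."""
    input_tokens: int  # input not read from or written to the cache
    cache_creation_input_tokens: int = 0
    cache_read_input_tokens: int = 0


def total_input_tokens(u: Usage) -> int:
    return u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens


def effective_input_units(u: Usage) -> float:
    """Input cost expressed in base-price tokens."""
    return (
        u.input_tokens
        + u.cache_creation_input_tokens * CACHE_WRITE_MULTIPLIER
        + u.cache_read_input_tokens * CACHE_READ_MULTIPLIER
    )


# Hypothetical first and second calls sharing a 2000-token cached system prompt.
first_call = Usage(input_tokens=10, cache_creation_input_tokens=2000)
second_call = Usage(input_tokens=10, cache_read_input_tokens=2000)
```

The first call costs slightly more than an uncached one (the write premium), so caching pays off only when the same prefix is read again within the cache lifetime.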
Anthropic: Extended Thinking
Extended thinking can be enabled via extra_params:
```python
from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Using unified API with extra_params
    request = LLMRequest(
        model="anthropic:claude-sonnet-4-5-20250929",
        messages=[Message(role="user", content="Complex reasoning problem...")],
        max_tokens=16000,
        extra_params={
            "thinking": {
                "type": "enabled",
                "budget_tokens": 10000,
            }
        },
    )
    response = await service.chat(request)
    # Response may contain thinking content (check response structure)

# Or use raw SDK for full control
async with LLMRing() as service:
    anthropic_client = service.get_provider("anthropic").client
    response = await anthropic_client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=16000,
        thinking={
            "type": "enabled",
            "budget_tokens": 10000,
        },
        messages=[{
            "role": "user",
            "content": "Complex reasoning problem...",
        }],
    )

    # Access thinking content
    for block in response.content:
        if block.type == "thinking":
            print(f"Thinking: {block.thinking}")
        elif block.type == "text":
            print(f"Response: {block.text}")
```
Note: The unified API's reasoning_tokens parameter is for OpenAI reasoning models (o1, o3). For Anthropic extended thinking, use extra_params as shown above.
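Anthropic documents a minimum `budget_tokens` of 1,024 and requires the budget to be smaller than `max_tokens`, so validating both before the request avoids a round-trip that ends in a 400 error. `thinking_params` is a hypothetical helper, not part of llmring or the Anthropic SDK; it returns keyword arguments you can splat into `messages.create(...)`, or whose `"thinking"` entry you can put into `extra_params`.

```python
MIN_THINKING_BUDGET = 1024  # Anthropic's documented minimum budget_tokens


def thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Build max_tokens + thinking kwargs, rejecting budgets the API would refuse."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be >= {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }
```

For example, `await anthropic_client.messages.create(model=..., messages=..., **thinking_params(16000, 10000))` reproduces the raw SDK call above.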
Google: Large Context and Multimodal
```python
from PIL import Image

from llmring import LLMRing

async with LLMRing() as service:
    google_client = service.get_provider("google").client  # google.genai.Client

    # Long-context request (Gemini 2.5 Pro accepts about 1M input tokens).
    # client.aio is the async surface, so the event loop is not blocked.
    response = await google_client.aio.models.generate_content(
        model="gemini-2.5-pro",
        contents="Very long document...",
        config={
            "temperature": 0.7,
            "top_p": 0.8,
            "top_k": 40,
            "candidate_count": 1,
            "max_output_tokens": 8192,
        },
    )
    print(response.text)

    # Multimodal (vision): PIL images can be passed directly in contents
    img = Image.open("image.jpg")
    response = await google_client.aio.models.generate_content(
        model="gemini-2.5-flash",
        contents=["What's in this image?", img],
    )
    print(response.text)
```
Ollama: Local Models and Custom Options
```python
from llmring import LLMRing

async with LLMRing() as service:
    ollama_client = service.get_provider("ollama").client  # ollama.AsyncClient

    # Use a local model with custom options
    response = await ollama_client.chat(
        model="llama3",
        messages=[{"role": "user", "content": "Hello"}],
        options={
            "temperature": 0.8,
            "top_k": 40,
            "top_p": 0.9,
            "num_predict": 256,  # Max tokens to generate
            "num_ctx": 4096,     # Context window size
            "repeat_penalty": 1.1,
        },
    )
    print(response.message.content)

    # List locally available models (ollama>=0.4 returns typed objects)
    listing = await ollama_client.list()
    for model in listing.models:
        print(f"Model: {model.model}, Size: {model.size}")
```
Using extra_params
To pass provider-specific parameters through the unified API:
```python
from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Pass provider-specific params
    request = LLMRequest(
        model="openai:gpt-4o",
        messages=[Message(role="user", content="Hello")],
        extra_params={
            "logprobs": True,
            "top_logprobs": 5,
            "seed": 12345,
            "presence_penalty": 0.1,
        },
    )
    response = await service.chat(request)
```
Multi-Provider Patterns
Provider Switching
```python
from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Same request, different providers
    messages = [Message(role="user", content="Hello")]

    # OpenAI
    response = await service.chat(
        LLMRequest(model="openai:gpt-4o", messages=messages)
    )

    # Anthropic
    response = await service.chat(
        LLMRequest(model="anthropic:claude-sonnet-4-5-20250929", messages=messages)
    )

    # Google
    response = await service.chat(
        LLMRequest(model="google:gemini-2.5-pro", messages=messages)
    )

    # Ollama
    response = await service.chat(
        LLMRequest(model="ollama:llama3", messages=messages)
    )
```
Automatic Fallback
Use a lockfile for automatic provider failover:
```toml
# llmring.lock
[[profiles.default.bindings]]
alias = "reliable"
models = [
    "anthropic:claude-sonnet-4-5-20250929",  # Tried first
    "openai:gpt-4o",                         # Used if the first model fails
    "google:gemini-2.5-pro",                 # Used if both fail
    "ollama:llama3"                          # Local last resort
]
```
```python
from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Automatically tries fallbacks on failure
    request = LLMRequest(
        model="reliable",  # Uses the fallback chain
        messages=[Message(role="user", content="Hello")],
    )
    response = await service.chat(request)
    print(f"Used model: {response.model}")
```
Cost Optimization: Try Cheaper First
```python
from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    messages = [Message(role="user", content="Simple task")]

    # Try the cheap model first
    try:
        response = await service.chat(
            LLMRequest(model="openai:gpt-4o-mini", messages=messages)
        )
    except Exception:
        # Any failure escalates to the more capable (and more expensive) model
        response = await service.chat(
            LLMRequest(model="anthropic:claude-sonnet-4-5-20250929", messages=messages)
        )
```
Provider-Specific Error Handling
```python
from llmring import LLMRing, LLMRequest, Message
from llmring.exceptions import (
    ProviderRateLimitError,
    ProviderAuthenticationError,
    ModelNotFoundError,
)

async with LLMRing() as service:
    try:
        request = LLMRequest(
            model="anthropic:claude-sonnet-4-5-20250929",
            messages=[Message(role="user", content="Hello")],
        )
        response = await service.chat(request)
    except ProviderRateLimitError as e:
        print(f"Rate limited, retry after {e.retry_after}s")
        # Try a different provider
        request.model = "openai:gpt-4o"
        response = await service.chat(request)
    except ProviderAuthenticationError:
        print("Invalid API key")
    except ModelNotFoundError:
        print("Model not available")
```
Provider Comparison
| Provider | Strengths | Limitations | Best For |
|---|---|---|---|
| OpenAI | Fast, reliable, reasoning models (o1) | Rate limits, cost | General purpose, reasoning |
| Anthropic | Large context, prompt caching, extended thinking | Availability varies by region | Complex tasks, large docs |
| Google | Very long context (about 1M tokens on Gemini 2.5 Pro), multimodal, fast | Newer, less documentation | Large context, vision |
| Ollama | Local, free, privacy | Requires local setup, slower | Development, privacy |
When to Use Raw SDK Access
Use the unified LLMRing API when you:
- Switch between providers
- Use aliases and profiles
- Need standard chat, streaming, or tools
- Want provider abstraction

Use raw SDK access when you:
- Need provider-specific features not exposed by the unified API
- Are building performance-critical applications
- Need complex provider-specific configuration
- Want vendor-specific optimizations
Common Mistakes
Wrong: Not Checking Provider Availability
```python
# DON'T DO THIS - provider may not be configured
provider = service.get_provider("anthropic")  # Raises ProviderNotFoundError if no API key!
client = provider.client
```
Right: Check Provider Availability
```python
# DO THIS - handle missing providers
from llmring.exceptions import ProviderNotFoundError

try:
    provider = service.get_provider("anthropic")
    client = provider.client
except ProviderNotFoundError:
    print("Anthropic not configured - check ANTHROPIC_API_KEY")
```
Wrong: Hardcoding Provider
```python
# DON'T DO THIS - locked to one provider
request = LLMRequest(
    model="openai:gpt-4o",
    messages=[...],
)
```
Right: Use Alias for Flexibility
```python
# DO THIS - easy to switch providers
request = LLMRequest(
    model="assistant",  # Your semantic alias defined in the lockfile
    messages=[...],
)
```
Wrong: Ignoring Provider-Specific Errors
```python
# DON'T DO THIS - generic error handling
try:
    response = await service.chat(request)
except Exception as e:
    print(f"Error: {e}")
```
Right: Handle Provider-Specific Errors
```python
# DO THIS - specific error types
from llmring.exceptions import (
    ProviderRateLimitError,
    ProviderTimeoutError,
)

try:
    response = await service.chat(request)
except ProviderRateLimitError:
    # Try a different provider
    request.model = "google:gemini-2.5-pro"
    response = await service.chat(request)
except ProviderTimeoutError:
    # Retry or use a different provider here; `response` is not set on this path
    pass
```
Best Practices
- Use aliases for flexibility: Don't hardcode provider:model references
- Configure fallbacks: Multiple providers in lockfile for high availability
- Check provider availability: Handle ProviderNotFoundError
- Use unified API when possible: Only drop to raw SDK when needed
- Handle provider-specific errors: Different providers have different failure modes
- Test with multiple providers: Ensure your code works across providers
- Document provider choices: Explain why you chose specific providers
Checking Available Providers
```python
from llmring import LLMRing
from llmring.exceptions import ProviderNotFoundError

async with LLMRing() as service:
    # Check which providers are configured
    providers = []
    for provider_name in ["openai", "anthropic", "google", "ollama"]:
        try:
            service.get_provider(provider_name)
            providers.append(provider_name)
        except ProviderNotFoundError:
            pass  # Not configured (for example, missing API key)
    print(f"Available providers: {', '.join(providers)}")
```
Related Skills
- llmring-chat: Basic chat with the unified API
- llmring-streaming: Streaming across providers
- llmring-tools: Tools with different providers
- llmring-structured: Structured output across providers
- llmring-lockfile: Configure provider aliases and fallbacks
Summary
Multi-provider patterns enable:
- High availability (automatic failover)
- Cost optimization (try cheaper first)
- Provider diversity (avoid vendor lock-in)
- Feature access (use best provider for each task)
Raw SDK access provides:
- Provider-specific features (logprobs, caching, etc.)
- Performance optimizations
- Advanced configurations
- Direct vendor SDK control
Recommendation: Use unified API with aliases for most work. Drop to raw SDK only when you need provider-specific features.