Agent skill

litellm

When calling LLM APIs from Python code. When connecting to llamafile or local LLM servers. When switching between OpenAI/Anthropic/local providers. When implementing retry/fallback logic for LLM calls. When code imports litellm or uses completion() patterns.

View SKILL.md on GitHub Repository

Stars 33

Forks 4

Install this agent skill to your Project

npx add-skill https://github.com/Jamie-BitFlight/claude_skills/tree/main/plugins/litellm/skills/litellm

SKILL.md

LiteLLM

Unified Python interface for calling 100+ LLM APIs using consistent OpenAI format. Provides standardized exception handling, retry/fallback logic, and cost tracking across multiple providers.

When to Use This Skill

Use this skill when:

Integrating with multiple LLM providers through a single interface
Routing requests to local llamafile servers using OpenAI-compatible endpoints
Implementing retry and fallback logic for LLM calls
Building applications requiring consistent error handling across providers
Tracking LLM usage costs across different providers
Converting between provider-specific APIs and OpenAI format
Deploying LLM proxy servers with unified configuration
Testing applications against both cloud and local LLM endpoints

Core Capabilities

Provider Support

LiteLLM supports 100+ providers through consistent OpenAI-style API:

Cloud Providers: OpenAI, Anthropic, Google, Azure, AWS Bedrock
Local Servers: llamafile, Ollama, LocalAI, vLLM
Unified Format: All requests use OpenAI message format
Exception Mapping: All provider errors map to OpenAI exception types

Key Features

Unified API: Single completion() function for all providers
Exception Handling: All exceptions inherit from OpenAI types
Retry Logic: Built-in retry with configurable attempts
Streaming Support: Sync and async streaming for all providers
Cost Tracking: Automatic usage and cost calculation
Proxy Mode: Deploy centralized LLM gateway

Installation

bash

# Using pip
pip install litellm

# Using uv
uv add litellm

Llamafile Integration

Provider Configuration

All llamafile models MUST use the llamafile/ prefix for routing:

python

model = "llamafile/mistralai/mistral-7b-instruct-v0.2"
model = "llamafile/gemma-3-3b"

API Base URL

The api_base MUST point to llamafile's OpenAI-compatible endpoint:

python

api_base = "http://localhost:8080/v1"

Critical Requirements:

Include /v1 suffix
Do NOT add endpoint paths like /chat/completions (LiteLLM adds these automatically)
Default llamafile port is 8080

Environment Variable Configuration

python

import os

os.environ["LLAMAFILE_API_BASE"] = "http://localhost:8080/v1"

Basic Usage Patterns

Synchronous Completion

python

import litellm

response = litellm.completion(
    model="llamafile/mistralai/mistral-7b-instruct-v0.2",
    messages=[{"role": "user", "content": "Summarize this diff"}],
    api_base="http://localhost:8080/v1",
    temperature=0.2,
    max_tokens=80,
)

print(response.choices[0].message.content)

Asynchronous Completion

python

from litellm import acompletion
import asyncio

async def generate_message():
    response = await acompletion(
        model="llamafile/gemma-3-3b",
        messages=[{"role": "user", "content": "Write a commit message"}],
        api_base="http://localhost:8080/v1",
        temperature=0.3,
        max_tokens=200,
    )
    return response.choices[0].message.content

result = asyncio.run(generate_message())
print(result)

Async Streaming

python

from litellm import acompletion
import asyncio

async def stream_response():
    response = await acompletion(
        model="llamafile/gemma-3-3b",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        api_base="http://localhost:8080/v1",
        stream=True,
    )

    async for chunk in response:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(stream_response())

Embeddings

python

from litellm import embedding
import os

os.environ["LLAMAFILE_API_BASE"] = "http://localhost:8080/v1"

response = embedding(
    model="llamafile/sentence-transformers/all-MiniLM-L6-v2",
    input=["Hello world"],
)

print(response)

Exception Handling

Import Pattern

All exceptions can be imported directly from litellm:

python

from litellm import (
    BadRequestError,           # 400 errors
    AuthenticationError,       # 401 errors
    NotFoundError,             # 404 errors
    Timeout,                   # 408 errors (alias: openai.APITimeoutError)
    RateLimitError,            # 429 errors
    APIConnectionError,        # 500 errors / connection issues (default)
    ServiceUnavailableError,   # 503 errors
)

Exception Types Reference

Status Code	Exception Type	Inherits from	Description
400	`BadRequestError`	openai.BadRequestError	Invalid request
400	`ContextWindowExceededError`	litellm.BadRequestError	Token limit exceeded
400	`ContentPolicyViolationError`	litellm.BadRequestError	Content policy violation
401	`AuthenticationError`	openai.AuthenticationError	Auth failure
403	`PermissionDeniedError`	openai.PermissionDeniedError	Permission denied
404	`NotFoundError`	openai.NotFoundError	Invalid model/endpoint
408	`Timeout`	openai.APITimeoutError	Request timeout
429	`RateLimitError`	openai.RateLimitError	Rate limited
500	`APIConnectionError`	openai.APIConnectionError	Default for unmapped errors
500	`APIError`	openai.APIError	Generic 500 error
503	`ServiceUnavailableError`	openai.APIStatusError	Service unavailable
>=500	`InternalServerError`	openai.InternalServerError	Unmapped 500+ errors

Exception Attributes

All LiteLLM exceptions include:

status_code: HTTP status code
message: Error message
llm_provider: Provider that raised the exception

Exception Handling Example

python

import litellm
import openai

try:
    response = litellm.completion(
        model="llamafile/gemma-3-3b",
        messages=[{"role": "user", "content": "Hello"}],
        api_base="http://localhost:8080/v1",
        timeout=30.0,
    )
except openai.APITimeoutError as e:
    # LiteLLM exceptions inherit from OpenAI types
    print(f"Timeout: {e}")
except litellm.APIConnectionError as e:
    print(f"Connection failed: {e.message}")
    print(f"Provider: {e.llm_provider}")

Alternative Import from litellm.exceptions

python

from litellm.exceptions import BadRequestError, AuthenticationError, APIError

try:
    response = litellm.completion(
        model="llamafile/gemma-3-3b",
        messages=[{"role": "user", "content": "Hello"}],
        api_base="http://localhost:8080/v1",
    )
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except BadRequestError as e:
    print(f"Bad request: {e}")
except APIError as e:
    print(f"API error: {e}")

Checking If Exception Should Retry

python

import litellm

try:
    response = litellm.completion(
        model="llamafile/gemma-3-3b",
        messages=[{"role": "user", "content": "Hello"}],
        api_base="http://localhost:8080/v1",
    )
except Exception as e:
    if hasattr(e, 'status_code'):
        should_retry = litellm._should_retry(e.status_code)
        print(f"Should retry: {should_retry}")

Retry and Fallback Configuration

python

from litellm import completion

response = completion(
    model="llamafile/gemma-3-3b",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="http://localhost:8080/v1",
    num_retries=3,      # Retry 3 times on failure
    timeout=30.0,       # 30 second timeout
)

Proxy Server Configuration

For proxy deployments, use config.yaml:

yaml

model_list:
  - model_name: commit-polish-model
    litellm_params:
      model: llamafile/gemma-3-3b          # add llamafile/ prefix
      api_base: http://localhost:8080/v1   # add api base for OpenAI compatible provider

Application Integration Patterns

Connection Verification Pattern

python

import litellm
from litellm import APIConnectionError

def verify_llamafile_connection(api_base: str = "http://localhost:8080/v1") -> bool:
    """Check if llamafile server is running."""
    try:
        litellm.completion(
            model="llamafile/test",
            messages=[{"role": "user", "content": "test"}],
            api_base=api_base,
            max_tokens=1,
        )
        return True
    except APIConnectionError:
        return False

Async Service Pattern

python

import litellm
from litellm import acompletion, APIConnectionError
import asyncio

class AIService:
    """LiteLLM wrapper with llamafile routing."""

    def __init__(self, model: str, api_base: str, temperature: float = 0.3, max_tokens: int = 200):
        self.model = model
        self.api_base = api_base
        self.temperature = temperature
        self.max_tokens = max_tokens

    async def generate_commit_message(self, diff: str, system_prompt: str) -> str:
        """Generate a commit message using the LLM."""
        try:
            response = await acompletion(
                model=self.model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": f"Generate a commit message for this diff:\n\n{diff}"},
                ],
                api_base=self.api_base,
                temperature=self.temperature,
                max_tokens=self.max_tokens,
            )
            return response.choices[0].message.content.strip()
        except APIConnectionError as e:
            raise RuntimeError(f"Failed to connect to llamafile server at {self.api_base}: {e.message}")

Common Pitfalls to Avoid

Missing llamafile/ prefix: Without prefix, LiteLLM won't route to OpenAI-compatible endpoint
Wrong port: Llamafile uses 8080 by default, not 8000
Missing /v1 suffix: API base must end with /v1
Adding extra path segments: Do NOT use http://localhost:8080/v1/chat/completions - LiteLLM adds the endpoint path automatically
API key requirement: No API key needed for local llamafile (use empty string or any value if required by validation)

Configuration Examples

TOML Configuration

toml

# ~/.config/commit-polish/config.toml
[ai]
model = "llamafile/gemma-3-3b"  # MUST have llamafile/ prefix
temperature = 0.3
max_tokens = 200

Environment Variables

bash

export LLAMAFILE_API_BASE="http://localhost:8080/v1"
export LITELLM_LOG="INFO"  # Enable LiteLLM debug logging

Related Skills

For comprehensive documentation on related tools:

llamafile: Activate the llamafile skill using Skill(command: "llamafile:llamafile") for llamafile server setup, model management, and local LLM deployment patterns
uv: Activate the uv skill using Skill(command: "python3-development:uv") for Python project management, dependency handling, and virtual environment workflows

References

Official Documentation

LiteLLM Documentation - Main documentation portal
Llamafile Provider Docs - Llamafile-specific configuration
Exception Mapping - Complete exception reference
GitHub Repository - Source code and examples

Provider-Specific Documentation

Llamafile API Endpoints - Llamafile OpenAI-compatible API reference
Completion Streaming - Streaming implementation guide

Version Information

Documentation verified against: LiteLLM GitHub repository (main branch, accessed 2025-01-15)
Python: 3.11+
Llamafile: 0.9.3+

Maintainer

Jamie-BitFlight Core maintainer

Source details

Full Name: Jamie-BitFlight/claude_skills
Branch: main
Path in repo: plugins/litellm/skills/litellm

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

Jamie-BitFlight/claude_skills

ccc

This skill should be used when code search is needed (whether explicitly requested or as part of completing a task), when indexing the codebase after changes, or when the user asks about ccc, cocoindex-code, or the codebase index. Trigger phrases include 'search the codebase', 'find code related to', 'update the index', 'ccc', 'cocoindex-code'.

33 4

Explore

Jamie-BitFlight/claude_skills

agent-browser

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.

33 4

Explore

Jamie-BitFlight/claude_skills

delegate

Quick delegation template for sub-agent prompts. Use when assigning work to a sub-agent, before invoking the Agent tool, or when preparing prompts for specialized agents. Provides the WHERE-WHAT-WHY framework. For comprehensive delegation guidance, activate the agent-orchestration how-to-delegate skill.

33 4

Explore

Jamie-BitFlight/claude_skills

swarm-spawning

Spawn agents and teammates in Claude Code swarms. Use when choosing between subagents vs teammates, selecting agent types (Explore, Plan, general-purpose, plugin agents), configuring spawn backends (in-process, tmux, iterm2), or setting environment variables for spawned agents.

33 4

Explore

Jamie-BitFlight/claude_skills

knowledge-explorer

Manage the research/ knowledge base (KB) of tool and library research entries. Use when browsing KB topics, adding new research entries, updating existing entries with dated revisions, fetching GitHub repo metadata into a draft KB entry, or migrating old-format entries to skill-spec frontmatter. Triggers on tasks like "what do we have on X", "add this to the KB", "update the KB entry for Y", "fetch github info for owner/repo", or "migrate old entries".

33 4

Explore

Jamie-BitFlight/claude_skills

design-anti-patterns

Enforce anti-AI UI design rules based on the Uncodixfy methodology. Use when generating HTML, CSS, React, Vue, Svelte, or any frontend UI code. Prevents "Codex UI" — the generic AI aesthetic of soft gradients, floating panels, oversized rounded corners, glassmorphism, hero sections in dashboards, and decorative copy. Applies constraints from Linear/Raycast/Stripe/GitHub design philosophy: functional, honest, human-designed interfaces. Triggers on: UI generation, dashboard building, frontend component creation, CSS styling, landing page design, or any task producing visual interface code.

33 4

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

LiteLLM

When to Use This Skill

Core Capabilities

Provider Support

Key Features

Installation

Llamafile Integration

Provider Configuration

API Base URL

Environment Variable Configuration

Basic Usage Patterns

Synchronous Completion

Asynchronous Completion

Async Streaming

Embeddings

Exception Handling

Import Pattern

Exception Types Reference

Exception Attributes

Exception Handling Example

Alternative Import from litellm.exceptions

Checking If Exception Should Retry

Retry and Fallback Configuration

Proxy Server Configuration

Application Integration Patterns

Connection Verification Pattern

Async Service Pattern

Common Pitfalls to Avoid

Configuration Examples

TOML Configuration

Environment Variables

Related Skills

References

Official Documentation

Provider-Specific Documentation

Version Information

Recommended Agent Skills

ccc

agent-browser

delegate

swarm-spawning

knowledge-explorer

design-anti-patterns