# llm-cost-optimization

Reduce LLM API costs without sacrificing quality. Covers prompt caching (Anthropic), local response caching, prompt compression, debouncing triggers, and cost analysis. Use when building LLM-powered features, analyzing API costs, optimizing prompts, or implementing caching strategies.


Install this agent skill in your project:

```sh
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/llm-cost-optimization
```

## SKILL.md: LLM Cost Optimization

Practical techniques to reduce LLM API costs by 35-65%.

### Quick Reference

| Technique | Savings | When to Use | Reference |
|---|---|---|---|
| Prompt Caching | 25-45% | Same system prompt, frequent calls | caching.md |
| Response Cache | 100% | Repeated identical requests | caching.md |
| Prompt Compression | 10-20% | Long system prompts | prompts.md |
| Debouncing | 50%+ | Duplicate triggers | triggers.md |

### The 80/20 of LLM Costs

For short user inputs, the system prompt dominates cost:

| Text Length | Input Tokens | System Prompt % |
|---|---|---|
| Short (~100 chars) | ~250 | 80-87% |
| Medium (~500 chars) | ~450 | 44% |
| Long (~2000 chars) | ~900 | 22% |
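
For example, a 210-token system prompt plus a 40-token user message is 250 input tokens, 84% of which is the fixed system prompt, paid again on every call.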

Optimization priority:

  1. Cache system prompts (biggest impact)
  2. Cache identical requests (free repeats)
  3. Debounce triggers (prevent waste; see the sketch after this list)
  4. Compress prompts (last resort)
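
A minimal debounce sketch (the `should_fire` helper and the 300 ms threshold are illustrative assumptions; triggers.md covers this in detail):

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Timestamp of the last accepted trigger, shared across calls.
static LAST_FIRED: Mutex<Option<Instant>> = Mutex::new(None);

/// Returns true only if enough time has passed since the last accepted trigger.
fn should_fire(debounce: Duration) -> bool {
    let mut last = LAST_FIRED.lock().unwrap();
    let now = Instant::now();
    match *last {
        Some(t) if now.duration_since(t) < debounce => false, // duplicate: drop it
        _ => {
            *last = Some(now);
            true
        }
    }
}

// Usage at the trigger point:
// if !should_fire(Duration::from_millis(300)) { return; }
```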

### Cost Estimation (Claude 3.5 Haiku)

| Text Length | Est. Cost per Request |
|---|---|
| Short (~100 chars) | ~$0.0004 |
| Medium (~500 chars) | ~$0.0008 |
| Long (~2000 chars) | ~$0.002 |

Benchmark: 1,000 translations ≈ $0.80 (before optimization).
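
As a sanity check on these numbers, a back-of-envelope estimator, assuming Claude 3.5 Haiku list prices of $0.80 per million input tokens and $4 per million output tokens (verify against current pricing; the ~110 output tokens for a medium translation are an assumption):

```rust
// Assumed Claude 3.5 Haiku list prices, USD per million tokens.
const INPUT_USD_PER_MTOK: f64 = 0.80;
const OUTPUT_USD_PER_MTOK: f64 = 4.00;

fn estimate_cost_usd(input_tokens: u64, output_tokens: u64) -> f64 {
    (input_tokens as f64 * INPUT_USD_PER_MTOK
        + output_tokens as f64 * OUTPUT_USD_PER_MTOK)
        / 1_000_000.0
}

fn main() {
    // Medium text (~500 chars): ~450 input tokens, ~110 output tokens (assumed).
    println!("${:.4}", estimate_cost_usd(450, 110)); // ≈ $0.0008, matching the table
}
```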

### Implementation Checklist

#### Before Building

- Add logging to every AI trigger point
- Verify triggers fire exactly once per user action
- Check for Pressed/Released event duplicates

#### Caching Strategy

- Enable Anthropic Prompt Caching for system prompts
- Implement a local response cache (hash-based; see the sketch under Quick Win 3)
- Include the model name in the cache key
- Set a reasonable cache limit (e.g., a 500-entry LRU)

#### Prompt Design

- Measure the current token count
- Identify critical rules (security, output format)
- Test quality after compression
- Document WHY for each rule you keep

Common Mistakes

Mistake Impact Fix
Trigger fires twice 2x cost Check event.state
No prompt caching Full price every call Use cache_control
Aggressive prompt compression Quality drops Keep critical rules
Cache key missing model Wrong results Include model in key

### Quick Wins

#### 1. Check for Duplicate Triggers

```rust
// Before ANY optimization, verify this
log::info!("AI trigger fired: {:?}", event);
if event.state != ShortcutState::Pressed {
    return; // Ignore Released events
}
```

#### 2. Enable Prompt Caching (Anthropic)

```rust
let system = vec![SystemBlock {
    block_type: "text".to_string(),
    text: system_prompt,
    cache_control: CacheControl { cache_type: "ephemeral".to_string() },
}];
```
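
With ephemeral caching, cache writes cost roughly 25% more than regular input tokens, cache reads cost roughly 10% of the regular price, and entries expire after about five minutes of inactivity; Anthropic also enforces a minimum cacheable prompt length (larger for Haiku models than for Sonnet/Opus), so very short system prompts may not be cached at all. Check the current Anthropic docs for exact figures.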

#### 3. Add a Response Cache

```rust
// Check the cache before the API call
if let Some(cached) = get_cached(&text, &model) {
    return Ok(cached); // Free!
}

// ... call the API to produce `result` ...

// Save after the API call
save_to_cache(&text, &result, &model)?;
```
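
`get_cached` and `save_to_cache` are not defined in this skill. A minimal in-memory sketch, assuming the `lru` crate, with the model name hashed into the key per the checklist above (a disk-backed variant would return a `Result`, matching the `?` in the snippet):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::num::NonZeroUsize;
use std::sync::{Mutex, OnceLock};

use lru::LruCache; // assumed dependency: lru = "0.12"

// Hash the model name into the key so Haiku and Sonnet results never mix.
fn cache_key(text: &str, model: &str) -> u64 {
    let mut h = DefaultHasher::new();
    model.hash(&mut h);
    text.hash(&mut h);
    h.finish()
}

// Process-wide cache, capped at 500 entries with LRU eviction.
fn cache() -> &'static Mutex<LruCache<u64, String>> {
    static CACHE: OnceLock<Mutex<LruCache<u64, String>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(LruCache::new(NonZeroUsize::new(500).unwrap())))
}

pub fn get_cached(text: &str, model: &str) -> Option<String> {
    cache().lock().unwrap().get(&cache_key(text, model)).cloned()
}

pub fn save_to_cache(text: &str, result: &str, model: &str) {
    cache().lock().unwrap().put(cache_key(text, model), result.to_string());
}
```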

### Anti-Patterns

- **TOON format for plain text** - only helps with structured data
- **Caching without the model in the key** - Haiku vs Sonnet give different results
- **Prompt compression first** - optimize triggers and caching before touching prompts
