# llm-cost-optimization

Reduce LLM API costs without sacrificing quality. Covers prompt caching (Anthropic), local response caching, prompt compression, debouncing triggers, and cost analysis. Use when building LLM-powered features, analyzing API costs, optimizing prompts, or implementing caching strategies.


Install this agent skill in your project:

```sh
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/llm-cost-optimization
```

## SKILL.md: LLM Cost Optimization

Practical techniques to reduce LLM API costs by 35-65%.

### Quick Reference

| Technique | Savings | When to Use | Reference |
|---|---|---|---|
| Prompt Caching | 25-45% | Same system prompt, frequent calls | caching.md |
| Response Cache | 100% | Repeated identical requests | caching.md |
| Prompt Compression | 10-20% | Long system prompts | prompts.md |
| Debouncing | 50%+ | Duplicate triggers | triggers.md |

### The 80/20 of LLM Costs

For short user inputs, the system prompt dominates cost:

| Text Length | Input Tokens | System Prompt % |
|---|---|---|
| Short (~100 chars) | ~250 | 80-87% |
| Medium (~500 chars) | ~450 | 44% |
| Long (~2000 chars) | ~900 | 22% |
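
For example, a 210-token system prompt plus a 40-token user message is 250 input tokens, 84% of which is the fixed system prompt, paid again on every call.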

Optimization priority:

  1. Cache system prompts (biggest impact)
  2. Cache identical requests (free repeats)
  3. Debounce triggers (prevent waste; see the sketch after this list)
  4. Compress prompts (last resort)
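
A minimal debounce sketch (the `should_fire` helper and the 300 ms threshold are illustrative assumptions; triggers.md covers this in detail):

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Timestamp of the last accepted trigger, shared across calls.
static LAST_FIRED: Mutex<Option<Instant>> = Mutex::new(None);

/// Returns true only if enough time has passed since the last accepted trigger.
fn should_fire(debounce: Duration) -> bool {
    let mut last = LAST_FIRED.lock().unwrap();
    let now = Instant::now();
    match *last {
        Some(t) if now.duration_since(t) < debounce => false, // duplicate: drop it
        _ => {
            *last = Some(now);
            true
        }
    }
}

// Usage at the trigger point:
// if !should_fire(Duration::from_millis(300)) { return; }
```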

### Cost Estimation (Claude 3.5 Haiku)

| Text Length | Est. Cost per Request |
|---|---|
| Short (~100 chars) | ~$0.0004 |
| Medium (~500 chars) | ~$0.0008 |
| Long (~2000 chars) | ~$0.002 |

Benchmark: 1,000 translations ≈ $0.80 (before optimization).
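
As a sanity check on these numbers, a back-of-envelope estimator, assuming Claude 3.5 Haiku list prices of $0.80 per million input tokens and $4 per million output tokens (verify against current pricing; the ~110 output tokens for a medium translation are an assumption):

```rust
// Assumed Claude 3.5 Haiku list prices, USD per million tokens.
const INPUT_USD_PER_MTOK: f64 = 0.80;
const OUTPUT_USD_PER_MTOK: f64 = 4.00;

fn estimate_cost_usd(input_tokens: u64, output_tokens: u64) -> f64 {
    (input_tokens as f64 * INPUT_USD_PER_MTOK
        + output_tokens as f64 * OUTPUT_USD_PER_MTOK)
        / 1_000_000.0
}

fn main() {
    // Medium text (~500 chars): ~450 input tokens, ~110 output tokens (assumed).
    println!("${:.4}", estimate_cost_usd(450, 110)); // ≈ $0.0008, matching the table
}
```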

### Implementation Checklist

#### Before Building

- Add logging to every AI trigger point
- Verify triggers fire exactly once per user action
- Check for Pressed/Released event duplicates

#### Caching Strategy

- Enable Anthropic Prompt Caching for system prompts
- Implement a local response cache (hash-based; see the sketch under Quick Win 3)
- Include the model name in the cache key
- Set a reasonable cache limit (e.g., a 500-entry LRU)

#### Prompt Design

- Measure the current token count
- Identify critical rules (security, output format)
- Test quality after compression
- Document WHY for each rule you keep

Common Mistakes

Mistake Impact Fix
Trigger fires twice 2x cost Check event.state
No prompt caching Full price every call Use cache_control
Aggressive prompt compression Quality drops Keep critical rules
Cache key missing model Wrong results Include model in key

### Quick Wins

#### 1. Check for Duplicate Triggers

```rust
// Before ANY optimization, verify this
log::info!("AI trigger fired: {:?}", event);
if event.state != ShortcutState::Pressed {
    return; // Ignore Released events
}
```

#### 2. Enable Prompt Caching (Anthropic)

```rust
let system = vec![SystemBlock {
    block_type: "text".to_string(),
    text: system_prompt,
    cache_control: CacheControl { cache_type: "ephemeral".to_string() },
}];
```
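
With ephemeral caching, cache writes cost roughly 25% more than regular input tokens, cache reads cost roughly 10% of the regular price, and entries expire after about five minutes of inactivity; Anthropic also enforces a minimum cacheable prompt length (larger for Haiku models than for Sonnet/Opus), so very short system prompts may not be cached at all. Check the current Anthropic docs for exact figures.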

#### 3. Add a Response Cache

```rust
// Check the cache before the API call
if let Some(cached) = get_cached(&text, &model) {
    return Ok(cached); // Free!
}

// ... call the API to produce `result` ...

// Save after the API call
save_to_cache(&text, &result, &model)?;
```
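
`get_cached` and `save_to_cache` are not defined in this skill. A minimal in-memory sketch, assuming the `lru` crate, with the model name hashed into the key per the checklist above (a disk-backed variant would return a `Result`, matching the `?` in the snippet):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::num::NonZeroUsize;
use std::sync::{Mutex, OnceLock};

use lru::LruCache; // assumed dependency: lru = "0.12"

// Hash the model name into the key so Haiku and Sonnet results never mix.
fn cache_key(text: &str, model: &str) -> u64 {
    let mut h = DefaultHasher::new();
    model.hash(&mut h);
    text.hash(&mut h);
    h.finish()
}

// Process-wide cache, capped at 500 entries with LRU eviction.
fn cache() -> &'static Mutex<LruCache<u64, String>> {
    static CACHE: OnceLock<Mutex<LruCache<u64, String>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(LruCache::new(NonZeroUsize::new(500).unwrap())))
}

pub fn get_cached(text: &str, model: &str) -> Option<String> {
    cache().lock().unwrap().get(&cache_key(text, model)).cloned()
}

pub fn save_to_cache(text: &str, result: &str, model: &str) {
    cache().lock().unwrap().put(cache_key(text, model), result.to_string());
}
```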

### Anti-Patterns

- **TOON format for plain text** - only helps with structured data
- **Caching without the model in the key** - Haiku vs Sonnet give different results
- **Prompt compression first** - optimize triggers and caching before touching prompts
