Agent skill
m10-performance
CRITICAL: Use for performance optimization. Triggers: performance, optimization, benchmark, profiling, flamegraph, criterion, slow, fast, allocation, cache, SIMD, make it faster, 性能优化, 基准测试
Install this agent skill to your Project
npx add-skill https://github.com/actionbook/rust-skills/tree/main/skills/m10-performance
SKILL.md
Performance Optimization
Layer 2: Design Choices
Core Question
What's the bottleneck, and is optimization worth it?
Before optimizing:
- Have you measured? (Don't guess)
- What's the acceptable performance?
- Will optimization add complexity?
Performance Decision → Implementation
| Goal | Design Choice | Implementation |
|---|---|---|
| Reduce allocations | Pre-allocate, reuse | with_capacity, object pools |
| Improve cache | Contiguous data | Vec, SmallVec |
| Parallelize | Data parallelism | rayon, threads |
| Avoid copies | Zero-copy | References, Cow<T> |
| Reduce indirection | Inline data | smallvec, arrays |
Thinking Prompt
Before optimizing:
-
Have you measured?
- Profile first → flamegraph, perf
- Benchmark → criterion, cargo bench
- Identify actual hotspots
-
What's the priority?
- Algorithm (10x-1000x improvement)
- Data structure (2x-10x)
- Allocation (2x-5x)
- Cache (1.5x-3x)
-
What's the trade-off?
- Complexity vs speed
- Memory vs CPU
- Latency vs throughput
Trace Up ↑
To domain constraints (Layer 3):
"How fast does this need to be?"
↑ Ask: What's the performance SLA?
↑ Check: domain-* (latency requirements)
↑ Check: Business requirements (acceptable response time)
| Question | Trace To | Ask |
|---|---|---|
| Latency requirements | domain-* | What's acceptable response time? |
| Throughput needs | domain-* | How many requests per second? |
| Memory constraints | domain-* | What's the memory budget? |
Trace Down ↓
To implementation (Layer 1):
"Need to reduce allocations"
↓ m01-ownership: Use references, avoid clone
↓ m02-resource: Pre-allocate with_capacity
"Need to parallelize"
↓ m07-concurrency: Choose rayon or threads
↓ m07-concurrency: Consider async for I/O-bound
"Need cache efficiency"
↓ Data layout: Prefer Vec over HashMap when possible
↓ Access patterns: Sequential over random access
Quick Reference
| Tool | Purpose |
|---|---|
cargo bench |
Micro-benchmarks |
criterion |
Statistical benchmarks |
perf / flamegraph |
CPU profiling |
heaptrack |
Allocation tracking |
valgrind / cachegrind |
Cache analysis |
Optimization Priority
1. Algorithm choice (10x - 1000x)
2. Data structure (2x - 10x)
3. Allocation reduction (2x - 5x)
4. Cache optimization (1.5x - 3x)
5. SIMD/Parallelism (2x - 8x)
Common Techniques
| Technique | When | How |
|---|---|---|
| Pre-allocation | Known size | Vec::with_capacity(n) |
| Avoid cloning | Hot paths | Use references or Cow<T> |
| Batch operations | Many small ops | Collect then process |
| SmallVec | Usually small | smallvec::SmallVec<[T; N]> |
| Inline buffers | Fixed-size data | Arrays over Vec |
Common Mistakes
| Mistake | Why Wrong | Better |
|---|---|---|
| Optimize without profiling | Wrong target | Profile first |
| Benchmark in debug mode | Meaningless | Always --release |
| Use LinkedList | Cache unfriendly | Vec or VecDeque |
Hidden .clone() |
Unnecessary allocs | Use references |
| Premature optimization | Wasted effort | Make it work first |
Anti-Patterns
| Anti-Pattern | Why Bad | Better |
|---|---|---|
| Clone to avoid lifetimes | Performance cost | Proper ownership |
| Box everything | Indirection cost | Stack when possible |
| HashMap for small sets | Overhead | Vec with linear search |
| String concat in loop | O(n^2) | String::with_capacity or format! |
Related Skills
| When | See |
|---|---|
| Reducing clones | m01-ownership |
| Concurrency options | m07-concurrency |
| Smart pointer choice | m02-resource |
| Domain requirements | domain-* |
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
meta-cognition-parallel
EXPERIMENTAL: Three-layer parallel meta-cognition analysis. Triggers on: /meta-parallel, 三层分析, parallel analysis, 并行元认知
domain-cloud-native
Use when building cloud-native apps. Keywords: kubernetes, k8s, docker, container, grpc, tonic, microservice, service mesh, observability, tracing, metrics, health check, cloud, deployment, 云原生, 微服务, 容器
m07-concurrency
CRITICAL: Use for concurrency/async. Triggers: E0277 Send Sync, cannot be sent between threads, thread, spawn, channel, mpsc, Mutex, RwLock, Atomic, async, await, Future, tokio, deadlock, race condition, 并发, 线程, 异步, 死锁
unsafe-checker
CRITICAL: Use for unsafe Rust code review and FFI. Triggers on: unsafe, raw pointer, FFI, extern, transmute, *mut, *const, union, #[repr(C)], libc, std::ffi, MaybeUninit, NonNull, SAFETY comment, soundness, undefined behavior, UB, safe wrapper, memory layout, bindgen, cbindgen, CString, CStr, 安全抽象, 裸指针, 外部函数接口, 内存布局, 不安全代码, FFI 绑定, 未定义行为
rust-refactor-helper
Safe Rust refactoring with LSP analysis. Triggers on: /refactor, rename symbol, move function, extract, 重构, 重命名, 提取函数, 安全重构
rust-skill-creator
Use when creating skills for Rust crates or std library documentation. Keywords: create rust skill, create crate skill, create std skill, 创建 rust skill, 创建 crate skill, 创建 std skill, 动态 rust skill, 动态 crate skill, skill for tokio, skill for serde, skill for axum, generate rust skill, rust 技能, crate 技能, 从文档创建skill, from docs create skill
Didn't find tool you were looking for?