m13-domain-error
Use when designing domain error handling. Keywords: domain error, error categorization, recovery strategy, retry, fallback, domain error hierarchy, user-facing vs internal errors, error code design, circuit breaker, graceful degradation, resilience, error context, backoff, retry with backoff, error recovery, transient vs permanent error, 领域错误, 错误分类, 恢复策略, 重试, 熔断器, 优雅降级
Install this agent skill in your project:
npx add-skill https://github.com/actionbook/rust-skills/tree/main/skills/m13-domain-error
Domain Error Strategy
Layer 2: Design Choices
Core Question
Who needs to handle this error, and how should they recover?
Before designing error types:
- Is this user-facing or internal?
- Is recovery possible?
- What context is needed for debugging?
Error Categorization
| Error Type | Audience | Recovery | Example |
|---|---|---|---|
| User-facing | End users | Guide action | InvalidEmail, NotFound |
| Internal | Developers | Debug info | DatabaseError, ParseError |
| System | Ops/SRE | Monitor/alert | ConnectionTimeout, RateLimited |
| Transient | Automation | Retry | NetworkError, ServiceUnavailable |
| Permanent | Human | Investigate | ConfigInvalid, DataCorrupted |
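One way to make the table above concrete is to attach a category to each error variant and derive the recovery action from it. This is a minimal sketch, not a prescribed design: `OrderError`, `Category`, and `recovery_hint` are hypothetical names, and the variants are borrowed from the table's example column.

```rust
use std::fmt;

/// Rows of the categorization table (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    UserFacing,
    Internal,
    System,
    Transient,
    Permanent,
}

impl Category {
    /// The "Recovery" column of the table.
    pub fn recovery_hint(self) -> &'static str {
        match self {
            Category::UserFacing => "guide the user to a corrective action",
            Category::Internal => "log debug info for developers",
            Category::System => "emit a metric or alert for ops",
            Category::Transient => "retry automatically",
            Category::Permanent => "fail fast and investigate",
        }
    }
}

/// Hypothetical domain error; variant names follow the table's examples.
#[derive(Debug)]
pub enum OrderError {
    InvalidEmail(String),
    DatabaseError(String),
    RateLimited,
    ServiceUnavailable,
    ConfigInvalid(String),
}

impl OrderError {
    /// Single place that decides who handles each variant.
    pub fn category(&self) -> Category {
        match self {
            Self::InvalidEmail(_) => Category::UserFacing,
            Self::DatabaseError(_) => Category::Internal,
            Self::RateLimited => Category::System,
            Self::ServiceUnavailable => Category::Transient,
            Self::ConfigInvalid(_) => Category::Permanent,
        }
    }
}

impl fmt::Display for OrderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidEmail(addr) => write!(f, "invalid email: {addr}"),
            Self::DatabaseError(msg) => write!(f, "database error: {msg}"),
            Self::RateLimited => write!(f, "rate limited"),
            Self::ServiceUnavailable => write!(f, "service unavailable"),
            Self::ConfigInvalid(key) => write!(f, "invalid config: {key}"),
        }
    }
}

impl std::error::Error for OrderError {}
```

In practice the categories overlap (a `RateLimited` error is arguably both System and Transient), so a real design might return several flags instead of a single enum value.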
Thinking Prompt
Before designing error types:
- Who sees this error?
  - End user → friendly message, actionable
  - Developer → detailed, debuggable
  - Ops → structured, alertable
- Can we recover?
  - Transient → retry with backoff
  - Degradable → fallback value
  - Permanent → fail fast, alert
- What context is needed?
  - Call chain → anyhow::Context
  - Request ID → structured logging
  - Input data → error payload
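The "who sees it" and "what context" questions can be answered by one error type that exposes two views: a friendly message for end users and a detailed rendering for developers and ops. The sketch below uses only the standard library; `PaymentError`, `PaymentErrorKind`, and `user_message` are hypothetical names chosen for illustration.

```rust
use std::fmt;

/// Hypothetical failure kinds for a payment flow.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PaymentErrorKind {
    CardDeclined,
    GatewayTimeout,
    LedgerInconsistent,
}

/// Error payload carrying the context needed for debugging.
#[derive(Debug)]
pub struct PaymentError {
    pub kind: PaymentErrorKind,
    /// Correlates this failure with structured log entries.
    pub request_id: String,
    /// Input that triggered the failure (never put secrets here).
    pub input: String,
}

impl PaymentError {
    /// End-user view: friendly and actionable, no internal details.
    pub fn user_message(&self) -> &'static str {
        match self.kind {
            PaymentErrorKind::CardDeclined => {
                "Your card was declined. Please try another payment method."
            }
            PaymentErrorKind::GatewayTimeout => {
                "Payment is taking longer than expected. Please try again shortly."
            }
            PaymentErrorKind::LedgerInconsistent => {
                "Something went wrong on our side. Please contact support."
            }
        }
    }
}

/// Developer/ops view: key=value fields that are easy to grep and alert on.
impl fmt::Display for PaymentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{:?} request_id={} input={}",
            self.kind, self.request_id, self.input
        )
    }
}

impl std::error::Error for PaymentError {}
```

An API handler would return `user_message()` in the response body and send the `Display` output to the log, so internal details never reach the client.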
Trace Up ↑
To domain constraints (Layer 3):
"How should I handle payment failures?"
↑ Ask: What are the business rules for retries?
↑ Check: domain-fintech (transaction requirements)
↑ Check: SLA (availability requirements)
| Question | Trace To | Ask |
|---|---|---|
| Retry policy | domain-* | What's acceptable latency for retry? |
| User experience | domain-* | What message should users see? |
| Compliance | domain-* | What must be logged for audit? |
Trace Down ↓
To implementation (Layer 1):
"Need typed errors"
↓ m06-error-handling: thiserror for library
↓ m04-zero-cost: Error enum design
"Need error context"
↓ m06-error-handling: anyhow::Context
↓ Logging: tracing with fields
"Need retry logic"
↓ m07-concurrency: async retry patterns
↓ Crates: tokio-retry, backoff
Quick Reference
| Recovery Pattern | When | Implementation |
|---|---|---|
| Retry | Transient failures | exponential backoff |
| Fallback | Degraded mode | cached/default value |
| Circuit Breaker | Cascading failures | failsafe-rs |
| Timeout | Slow operations | tokio::time::timeout |
| Bulkhead | Isolation | separate thread pools |
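To illustrate the Circuit Breaker and Fallback rows, here is a minimal, single-threaded sketch using only the standard library. It is not the API of failsafe-rs or any other crate; `CircuitBreaker`, `CallError`, and the threshold/cooldown parameters are assumptions made for this example.

```rust
use std::time::{Duration, Instant};

/// Outcome of a guarded call (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum CallError<E> {
    /// The breaker is open; the operation was not attempted.
    Open,
    /// The operation ran and failed.
    Inner(E),
}

/// Opens after `failure_threshold` consecutive failures and rejects
/// calls until `cooldown` has elapsed.
pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self {
            failure_threshold,
            cooldown,
            consecutive_failures: 0,
            opened_at: None,
        }
    }

    pub fn call<T, E>(
        &mut self,
        op: impl FnOnce() -> Result<T, E>,
    ) -> Result<T, CallError<E>> {
        if let Some(opened_at) = self.opened_at {
            if opened_at.elapsed() < self.cooldown {
                // Still cooling down: reject without touching the dependency.
                return Err(CallError::Open);
            }
            // Cooldown elapsed: fall through and allow one trial call.
        }
        match op() {
            Ok(value) => {
                // Success closes the breaker and resets the failure count.
                self.consecutive_failures = 0;
                self.opened_at = None;
                Ok(value)
            }
            Err(e) => {
                self.consecutive_failures = self.consecutive_failures.saturating_add(1);
                if self.consecutive_failures >= self.failure_threshold {
                    // Open (or re-open after a failed trial call).
                    self.opened_at = Some(Instant::now());
                }
                Err(CallError::Inner(e))
            }
        }
    }
}
```

A fallback then composes naturally: `breaker.call(fetch_price).unwrap_or(cached_price)` serves the cached value both when the call fails and when the breaker is open. A production breaker would also need thread safety and metrics, which this sketch omits.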
Error Hierarchy
```rust
#[derive(thiserror::Error, Debug)]
pub enum AppError {
    // User-facing
    #[error("Invalid input: {0}")]
    Validation(String),

    // Transient (retryable)
    #[error("Service temporarily unavailable")]
    ServiceUnavailable(#[source] reqwest::Error),

    // Internal (log details, show generic)
    #[error("Internal error")]
    Internal(#[source] anyhow::Error),
}

impl AppError {
    pub fn is_retryable(&self) -> bool {
        matches!(self, Self::ServiceUnavailable(_))
    }
}
```
Retry Pattern
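```rust
use std::future::Future;
use std::time::Duration;

use tokio_retry::{strategy::ExponentialBackoff, Retry};

/// Runs `f`, retrying on any error up to 5 more times with exponential backoff.
async fn with_retry<F, Fut, T, E>(f: F) -> Result<T, E>
where
    // `impl Trait` is not allowed in `Fn` bounds, so the future gets its own parameter.
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    // tokio-retry raises the base to the attempt number, so the delays are
    // 100 ms, then 10 s for every later retry (capped by `max_delay`).
    let strategy = ExponentialBackoff::from_millis(100)
        .max_delay(Duration::from_secs(10))
        .take(5);
    Retry::spawn(strategy, f).await
}
```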
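The helper above retries every error. A variant that retries only transient failures, with capped attempts and doubling delays, can be sketched without any crates. This is a synchronous illustration under stated assumptions: the `Retryable` trait, `retry_transient`, and `FetchError` are hypothetical names, not part of tokio-retry or backoff.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical trait: lets the retry loop ask an error whether it is transient.
pub trait Retryable {
    fn is_retryable(&self) -> bool;
}

/// Example error type used only for illustration.
#[derive(Debug, PartialEq, Eq)]
pub enum FetchError {
    Timeout,  // transient
    NotFound, // permanent
}

impl Retryable for FetchError {
    fn is_retryable(&self) -> bool {
        matches!(self, FetchError::Timeout)
    }
}

/// Calls `op` at least once and at most `max_attempts` times.
/// The delay doubles after each failed attempt, capped at `max_delay`.
pub fn retry_transient<T, E, F>(
    max_attempts: u32,
    base_delay: Duration,
    max_delay: Duration,
    mut op: F,
) -> Result<T, E>
where
    E: Retryable,
    F: FnMut() -> Result<T, E>,
{
    let mut delay = base_delay.min(max_delay);
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Permanent error, or attempts exhausted: fail fast.
            Err(e) if !e.is_retryable() || attempt >= max_attempts => return Err(e),
            Err(_) => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2).min(max_delay);
                attempt += 1;
            }
        }
    }
}
```

With `retry_transient(5, Duration::from_millis(100), Duration::from_secs(10), fetch)`, a `NotFound` returns immediately while a `Timeout` is retried up to four more times. In async code the same shape applies with an async sleep in place of `thread::sleep`.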
Common Mistakes
| Mistake | Why Wrong | Better |
|---|---|---|
| Same error for all cases | Not actionable | Categorize by audience |
| Retry everything | Wastes resources on permanent failures | Retry only transient errors |
| Infinite retry | Self-inflicted DoS | Max attempts + backoff |
| Expose internal errors | Security risk | User-friendly messages |
| No context | Hard to debug | .context() everywhere |
Anti-Patterns
| Anti-Pattern | Why Bad | Better |
|---|---|---|
| String errors | No structure | thiserror types |
| panic! for recoverable | Bad UX | Result with context |
| Ignore errors | Silent failures | Log or propagate |
| Box<dyn Error> everywhere | Lost type info | thiserror |
| Error in happy path | Performance | Early validation |
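The first rows of this table (string errors, lost type information) can be contrasted with a small typed error that keeps its cause. The sketch below hand-writes what thiserror would otherwise derive, so it runs with the standard library alone; `ConfigError` and `parse_port` are hypothetical names.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Typed error: callers can match on variants instead of parsing strings.
#[derive(Debug)]
pub enum ConfigError {
    MissingKey(&'static str),
    InvalidPort { raw: String, source: ParseIntError },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MissingKey(key) => write!(f, "missing config key {key:?}"),
            Self::InvalidPort { raw, .. } => write!(f, "invalid port {raw:?}"),
        }
    }
}

impl Error for ConfigError {
    /// Preserves the underlying cause so the full chain can be logged.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::InvalidPort { source, .. } => Some(source),
            Self::MissingKey(_) => None,
        }
    }
}

/// Propagates failures as values instead of panicking or ignoring them.
pub fn parse_port(raw: Option<&str>) -> Result<u16, ConfigError> {
    let raw = raw.ok_or(ConfigError::MissingKey("port"))?;
    raw.trim()
        .parse::<u16>()
        .map_err(|source| ConfigError::InvalidPort {
            raw: raw.to_string(),
            source,
        })
}
```

Because the variant carries both the offending input and the original `ParseIntError`, a caller can decide programmatically how to react, and a logger can walk `source()` to print the whole chain.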
Related Skills
| When | See |
|---|---|
| Error handling basics | m06-error-handling |
| Retry implementation | m07-concurrency |
| Domain modeling | m09-domain |
| User-facing APIs | domain-* |