Agent skill
domain-ml
Use when building ML/AI apps in Rust. Keywords: machine learning, ML, AI, tensor, model, inference, neural network, deep learning, training, prediction, ndarray, tch-rs, burn, candle, 机器学习, 人工智能, 模型推理
Install this agent skill to your Project
npx add-skill https://github.com/actionbook/rust-skills/tree/main/skills/domain-ml
SKILL.md
Machine Learning Domain
Layer 3: Domain Constraints
Domain Constraints → Design Implications
| Domain Rule | Design Constraint | Rust Implication |
|---|---|---|
| Large data | Efficient memory | Zero-copy, streaming |
| GPU acceleration | CUDA/Metal support | candle, tch-rs |
| Model portability | Standard formats | ONNX |
| Batch processing | Throughput over latency | Batched inference |
| Numerical precision | Float handling | ndarray, careful f32/f64 |
| Reproducibility | Deterministic | Seeded random, versioning |
Critical Constraints
Memory Efficiency
RULE: Avoid copying large tensors
WHY: Memory bandwidth is bottleneck
RUST: References, views, in-place ops
GPU Utilization
RULE: Batch operations for GPU efficiency
WHY: GPU overhead per kernel launch
RUST: Batch sizes, async data loading
Model Portability
RULE: Use standard model formats
WHY: Train in Python, deploy in Rust
RUST: ONNX via tract or candle
Trace Down ↓
From constraints to design (Layer 2):
"Need efficient data pipelines"
↓ m10-performance: Streaming, batching
↓ polars: Lazy evaluation
"Need GPU inference"
↓ m07-concurrency: Async data loading
↓ candle/tch-rs: CUDA backend
"Need model loading"
↓ m12-lifecycle: Lazy init, caching
↓ tract: ONNX runtime
Use Case → Framework
| Use Case | Recommended | Why |
|---|---|---|
| Inference only | tract (ONNX) | Lightweight, portable |
| Training + inference | candle, burn | Pure Rust, GPU |
| PyTorch models | tch-rs | Direct bindings |
| Data pipelines | polars | Fast, lazy eval |
Key Crates
| Purpose | Crate |
|---|---|
| Tensors | ndarray |
| ONNX inference | tract |
| ML framework | candle, burn |
| PyTorch bindings | tch-rs |
| Data processing | polars |
| Embeddings | fastembed |
Design Patterns
| Pattern | Purpose | Implementation |
|---|---|---|
| Model loading | Once, reuse | OnceLock<Model> |
| Batching | Throughput | Collect then process |
| Streaming | Large data | Iterator-based |
| GPU async | Parallelism | Data loading parallel to compute |
Code Pattern: Inference Server
use std::sync::OnceLock;
use tract_onnx::prelude::*;
static MODEL: OnceLock<SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>> = OnceLock::new();
fn get_model() -> &'static SimplePlan<...> {
MODEL.get_or_init(|| {
tract_onnx::onnx()
.model_for_path("model.onnx")
.unwrap()
.into_optimized()
.unwrap()
.into_runnable()
.unwrap()
})
}
async fn predict(input: Vec<f32>) -> anyhow::Result<Vec<f32>> {
let model = get_model();
let input = tract_ndarray::arr1(&input).into_shape((1, input.len()))?;
let result = model.run(tvec!(input.into()))?;
Ok(result[0].to_array_view::<f32>()?.iter().copied().collect())
}
Code Pattern: Batched Inference
async fn batch_predict(inputs: Vec<Vec<f32>>, batch_size: usize) -> Vec<Vec<f32>> {
let mut results = Vec::with_capacity(inputs.len());
for batch in inputs.chunks(batch_size) {
// Stack inputs into batch tensor
let batch_tensor = stack_inputs(batch);
// Run inference on batch
let batch_output = model.run(batch_tensor).await;
// Unstack results
results.extend(unstack_outputs(batch_output));
}
results
}
Common Mistakes
| Mistake | Domain Violation | Fix |
|---|---|---|
| Clone tensors | Memory waste | Use views |
| Single inference | GPU underutilized | Batch processing |
| Load model per request | Slow | Singleton pattern |
| Sync data loading | GPU idle | Async pipeline |
Trace to Layer 1
| Constraint | Layer 2 Pattern | Layer 1 Implementation |
|---|---|---|
| Memory efficiency | Zero-copy | ndarray views |
| Model singleton | Lazy init | OnceLock<Model> |
| Batch processing | Chunked iteration | chunks() + parallel |
| GPU async | Concurrent loading | tokio::spawn + GPU |
Related Skills
| When | See |
|---|---|
| Performance | m10-performance |
| Lazy initialization | m12-lifecycle |
| Async patterns | m07-concurrency |
| Memory efficiency | m01-ownership |
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
meta-cognition-parallel
EXPERIMENTAL: Three-layer parallel meta-cognition analysis. Triggers on: /meta-parallel, 三层分析, parallel analysis, 并行元认知
domain-cloud-native
Use when building cloud-native apps. Keywords: kubernetes, k8s, docker, container, grpc, tonic, microservice, service mesh, observability, tracing, metrics, health check, cloud, deployment, 云原生, 微服务, 容器
m07-concurrency
CRITICAL: Use for concurrency/async. Triggers: E0277 Send Sync, cannot be sent between threads, thread, spawn, channel, mpsc, Mutex, RwLock, Atomic, async, await, Future, tokio, deadlock, race condition, 并发, 线程, 异步, 死锁
unsafe-checker
CRITICAL: Use for unsafe Rust code review and FFI. Triggers on: unsafe, raw pointer, FFI, extern, transmute, *mut, *const, union, #[repr(C)], libc, std::ffi, MaybeUninit, NonNull, SAFETY comment, soundness, undefined behavior, UB, safe wrapper, memory layout, bindgen, cbindgen, CString, CStr, 安全抽象, 裸指针, 外部函数接口, 内存布局, 不安全代码, FFI 绑定, 未定义行为
rust-refactor-helper
Safe Rust refactoring with LSP analysis. Triggers on: /refactor, rename symbol, move function, extract, 重构, 重命名, 提取函数, 安全重构
rust-skill-creator
Use when creating skills for Rust crates or std library documentation. Keywords: create rust skill, create crate skill, create std skill, 创建 rust skill, 创建 crate skill, 创建 std skill, 动态 rust skill, 动态 crate skill, skill for tokio, skill for serde, skill for axum, generate rust skill, rust 技能, crate 技能, 从文档创建skill, from docs create skill
Didn't find tool you were looking for?