Agent skills
nw-formal-verification-tlaplus

Agent skill

nw-formal-verification-tlaplus

TLA+ and PlusCal for specifying distributed system invariants. Decision heuristics for when formal verification adds value, key patterns, state explosion management, and alternatives comparison.

View SKILL.md on GitHub Repository

Stars 341

Forks 40

Install this agent skill to your Project

npx add-skill https://github.com/nWave-ai/nWave/tree/main/nWave/skills/nw-formal-verification-tlaplus

SKILL.md

Formal Verification with TLA+

When to Recommend Formal Verification

Decision Tree

Is the system distributed or concurrent?
|
+-- No --> Complex state machine with high failure cost?
|          +-- No --> NOT cost-effective. Use property-based testing.
|          +-- Yes --> CONSIDER TLA+
|
+-- Yes --> Consensus, coordination, or distributed transactions?
|           +-- Yes --> RECOMMEND TLA+
|           +-- No --> Could concurrency bug cause data loss or safety issues?
|                      +-- Yes --> RECOMMEND TLA+
|                      +-- No --> OFFER as option

Strong Indicators (Recommend)

Domain	Why TLA+ Adds Value	Evidence
Distributed consensus (Paxos, Raft)	Subtle interleaving bugs in leader election	Raft TLA+ spec ~400 lines, found implementation bugs
Financial distributed transactions	Atomicity violations cause monetary loss	AWS DynamoDB replication verified
Leader election, distributed locking	Split-brain, deadlock, stale-lock	AWS lock manager verified
Eventual consistency / CRDTs	Convergence proofs required	TLA+ CRDT framework verifies SEC
Safety-critical state machines	Regulatory requirements	DO-178C, CENELEC recognize formal methods
Multi-party coordination (sagas, 2PC)	Compensation ordering, partial failure	2PC is canonical TLA+ example
Data replication protocols	Ordering, consistency under failure	Elasticsearch, MongoDB, Cosmos DB verified

When NOT to Use

Simple CRUD (bugs are in implementation, not design)
Single-process without complex state machines
Prototypes/MVPs (design will change before verification completes)
Performance optimization (TLA+ models correctness, not performance)

Cost-Benefit Reference

Learning curve: 2-3 weeks to useful results (AWS engineers, all levels)
Typical spec effort: 2-4 weeks part-time for a distributed protocol
ROI highest when: bug cost is high, system is long-lived, protocol is novel, concurrency testing is impractical

Core Concepts for Architects

What TLA+ Specifies

TLA+ describes what a system should do (allowed behaviors), not how to implement it. Specifications are mathematical objects checked for correctness before any code exists.

Safety vs. Liveness

Property Type	Meaning	Expression	Example
Safety	Nothing bad happens	Invariant: predicate true in every reachable state	"Two processes never hold same lock"
Liveness	Something good eventually happens	Temporal: `<>` (eventually), `[]<>` (infinitely often)	"Every request eventually gets response"

Safety violations produce counterexample traces (the debugging artifact). Liveness requires fairness conditions.

PlusCal vs. Raw TLA+

PlusCal compiles to TLA+ with programming-like syntax. Start with PlusCal for first 2-3 specs, then learn raw TLA+ for cases PlusCal cannot express.

Labels define concurrency granularity: everything between two labels is one atomic step. Two processes interleave only at label boundaries.

State Explosion Management

State space grows exponentially: (states per node)^(nodes) x (message permutations).

Containment Strategies

Strategy	Technique	Impact
Bound parameters	Start with 2-3 nodes, 2-4 messages	Most bugs appear at small N
Symmetry reduction	`SYMMETRY Permutations(Nodes)`	Up to N! reduction
Reduce labels	Merge labels where fine-grained atomicity unnecessary	Orders of magnitude
State constraints	`CONSTRAINT Len(log[n]) < MaxLogLength`	Prune uninteresting states
Abstraction	Model protocol not implementation (TCP -> message set)	Dramatic reduction
Decomposition	Multiple focused specs, not one monolith	Each independently checkable
Progressive refinement	2 nodes -> 3 nodes -> add failures -> add liveness	Incremental verification
Simulation mode	`java -jar tla2tools.jar -simulate -depth 100`	Trades completeness for speed

Memory and Time Budgets

Unique States	Expected Time	Memory	Approach
< 10K	Seconds	< 1 GB	Exhaustive, single thread
10K - 1M	Minutes	1-4 GB	Exhaustive, `-workers auto`
1M - 100M	Hours	4-32 GB	Exhaustive with constraints
100M - 1B	Days	32-64 GB	Large instance or simulation
> 1B	Weeks	60+ GB	Simulation, TLAPS, or decompose

Estimation Before Running

Count distinct variable values in model
Multiply domains together for baseline
Start TLC with smallest parameters, observe state count
Extrapolate: doubling a parameter typically squares or cubes the space

Key Specification Patterns

Two-Phase Commit (2PC)

Variables: rmState, tmState, tmPrepared, msgs
Safety: no RM commits while another aborts (Consistency)
State space: 3 RMs ~718 states, 5 RMs ~21,488 states
Common mistake: not modeling RM spontaneous abort or unreliable network

Distributed Consensus (Raft)

Variables: currentTerm, votedFor, log, state, votesGranted, msgs
Safety: at most one leader per term (ElectionSafety)
Safety: logs with same index+term are identical (LogMatching)
State space: 3 nodes, MaxTerm=2 ~10K-100K states

Saga (Compensating Transactions)

Variables: stepState, sagaState, compensateIdx
Safety: steps execute in order, compensations in reverse (OrderInvariant)
Safety: no completed steps remain after abort (CompensationComplete)
Common mistake: not enforcing reverse compensation order

Distributed Lock with Lease

Variables: lockHolder, leaseExpiry, clock, nodeState
Safety: at most one holder (MutualExclusion)
Models crash (node loses awareness) and lease expiry
Common mistake: not distinguishing server-side lock state from node belief

CRDT Convergence (G-Counter)

Variables: counters (vector per node)
Safety: counters monotonically non-decreasing
Liveness: all nodes eventually converge after merge
Common mistake: merge not commutative, associative, and idempotent

Alternatives Comparison

Tool	Best For	Learning	Distributed Systems	Temporal Properties
TLA+/PlusCal	Distributed protocols, consensus	2-3 weeks	Excellent	Native
Alloy	Data models, structural properties	1-2 weeks	Adequate	Limited (Alloy 6)
Property-Based Testing	Implementation correctness	Hours-days	With stateful testing	None
XState/Statecharts	UI workflows, single-process	Days	Not applicable	None
Session Types/Scribble	Communication patterns	Moderate	Good (message patterns)	Implicit
TLAPS (proofs)	Critical certification	Months	Excellent	Full

Combined Workflow (TLA+ + PBT)

Write TLA+ spec during DESIGN wave; identify invariants
Model-check with TLC to verify design
Implement code during DELIVER wave
Reuse TLA+ invariants as PBT properties
PBT verifies implementation conforms to verified design

Architecture Decision Record Template

markdown

## Decision: Use TLA+ for [Component Name]

### Context
[Component] implements [protocol] with [N] participants
and [concurrency/distribution characteristics].

### Problem
Informal reasoning about [failure/interleaving scenario]
is insufficient because [reason].

### Decision
Formally specify [component] in TLA+/PlusCal and verify:
- Safety: [specific invariants]
- Liveness: [specific temporal properties]

### Model Parameters
- Nodes: [2-3 for initial verification]
- Messages: [bounded to N]
- Failure modes: [crash, partition, message loss]

### Estimated Effort
- Specification: [1-2 weeks]
- Model checking: [hours to days]

### Not Modeling (out of scope)
- [performance, serialization, UI, etc.]

Architect's Checklist

Identify components with concurrency, distribution, or complex state machines
Determine safety properties (what must NEVER happen)
Determine liveness properties (what must EVENTUALLY happen)
Estimate model parameters and state space
Assess cost-effectiveness vs. alternatives (decision tree above)
Document decision in ADR with specific invariants and properties
Scope verification: focused specs per subsystem, not one monolith

Industry Precedent

Organization	Systems Verified	Outcome
AWS	14 projects across 10 systems (DynamoDB, S3, EBS)	Found subtle bugs in every system; management actively encourages adoption
Azure Cosmos DB	All 5 consistency levels	Specs became authoritative reference, replaced ambiguous docs
MongoDB	Replication, reconfiguration, transactions	Logless reconfig deployed since 4.4, no protocol bugs
Elasticsearch	Cluster coordination, data replication	4 TLA+ specs + Isabelle proofs, open-sourced
CockroachDB	Transaction layer	TLA+ specs in repository under docs/tla-plus/

Maintainer

nWave-ai Core maintainer

Source details

Full Name: nWave-ai/nWave
Branch: main
Path in repo: nWave/skills/nw-formal-verification-tlaplus
License: MIT License
Topics: ai claude-code claude-code-skills agentic-coding agentic-workflow opencode agentic-ai agentic-framework devops tdd software-architecture bdd claude-code-cli claude-code-hooks claude-code-subagents claude-code-commands atdd lean-ux software-craftmanship

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

nWave-ai/nWave

nw-research

Gathers knowledge from web and files, cross-references across multiple sources, and produces cited research documents. Use when investigating technologies, patterns, or decisions that need evidence backing.

341 40

Explore

nWave-ai/nWave

nw-distill

Acceptance test creation methodology for the DISTILL wave. Domain knowledge for the acceptance designer agent: port-to-port principle, prior wave reading, wave-decision reconciliation, graceful degradation, and document back-propagation.

341 40

Explore

nWave-ai/nWave

nw-review-output-format

YAML output format and approval criteria for platform design reviews. Load when generating review feedback.

341 40

Explore

nWave-ai/nWave

nw-ddd-tactical

Tactical DDD — aggregate design rules, entities, value objects, domain events, repositories, domain services, and anti-pattern detection

341 40

Explore

nWave-ai/nWave

nw-infrastructure-and-observability

Infrastructure as Code patterns (Terraform, Kubernetes), observability design (SLOs, metrics, alerting, dashboards), and pipeline security stages. Load when designing infrastructure, observability, or security scanning.

341 40

Explore

nWave-ai/nWave

nw-par-critique-dimensions

Platform design review critique dimensions and severity levels. Load when reviewing CI/CD pipelines, infrastructure, deployment strategies, observability, or security designs.

341 40

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Formal Verification with TLA+

When to Recommend Formal Verification

Decision Tree

Strong Indicators (Recommend)

When NOT to Use

Cost-Benefit Reference

Core Concepts for Architects

What TLA+ Specifies

Safety vs. Liveness

PlusCal vs. Raw TLA+

State Explosion Management

Containment Strategies

Memory and Time Budgets

Estimation Before Running

Key Specification Patterns

Two-Phase Commit (2PC)

Distributed Consensus (Raft)

Saga (Compensating Transactions)

Distributed Lock with Lease

CRDT Convergence (G-Counter)

Alternatives Comparison

Combined Workflow (TLA+ + PBT)

Architecture Decision Record Template

Architect's Checklist

Industry Precedent

Recommended Agent Skills

nw-research

nw-distill

nw-review-output-format

nw-ddd-tactical

nw-infrastructure-and-observability

nw-par-critique-dimensions