Agent skills
verification-protocol

Agent skill

verification-protocol

How to prove a hypothesis is TRUE or FALSE using Foundry tests.

View SKILL.md on GitHub Repository

Stars 215

Forks 33

Install this agent skill to your Project

npx add-skill https://github.com/PlamenTSV/plamen/tree/main/agents/skills/evm/verification-protocol

SKILL.md

Verification Protocol

How to prove a hypothesis is TRUE or FALSE using Foundry tests.

Evidence Source Tracking (MANDATORY)

CRITICAL: For EVERY piece of evidence used in verification, you MUST tag its source. Evidence from mocks or unverified external contracts CANNOT support a REFUTED verdict.

Evidence Source Tags

Tag	Meaning	Valid for REFUTED?
[PROD]	Production contract (verified on-chain)	YES
[MOCK]	Mock/test contract	NO
[CODE]	Audited codebase (in-scope)	YES
[EXT-UNV]	External, unverified behavior	NO
[DOC]	Documentation/spec only	NO (needs verification)

Evidence Audit Table (REQUIRED in every verification output)

Before ANY verdict, fill this table:

markdown

### Evidence Audit
| Claim | Evidence Source | Tag | Valid for REFUTED? |
|-------|-----------------|-----|-------------------|
| "External returns X" | Mock contract | [MOCK] | NO |
| "State changes to Y" | Protocol.sol:123 | [CODE] | YES |
| "Transfer triggers Z" | Etherscan source | [PROD] | YES |

Mock Rejection Rule

AUTOMATIC OVERRIDE: If ANY evidence supporting REFUTED has tag [MOCK] or [EXT-UNV]:

CANNOT return REFUTED
MUST return CONTESTED
Triggers production verification (Step 4a.5)

Example:

markdown

## Verdict: REFUTED -> CONTESTED (mock evidence override)

### Evidence Audit
| Claim | Source | Tag | Valid? |
|-------|--------|-----|--------|
| "Staking returns shares" | StakingMock.sol:45 | [MOCK] | NO |

**Override reason**: REFUTED verdict relies on mock behavior at StakingMock.sol:45.
Production contract behavior is UNVERIFIED. Must fetch production source.

Pre-Verification Understanding

Before writing ANY test code, you MUST answer:

Question 1: What is the EXACT bug?

NOT: "Something is inconsistent"
NOT: "State is wrong"
NOT: "Reentrancy possible"

YES: "[Variable] is [read/written] at [location] but should be [read/written]
      at [other location] because [specific reason]"

Question 2: What OBSERVABLE difference proves it?

NOT: "Values are different"
NOT: "State changed"

YES: "Before operation: [variable] = [expected value]
      After operation: [variable] = [actual value]
      Expected: [what it should be]"

Question 3: What is the EXACT assertion?

NOT: assertTrue(bugExists)
NOT: assertFalse(isSecure)

YES: assertEq(actualValue, expectedValue, "description of what's wrong")
 OR: assertNotEq(before, after, "value changed when it shouldn't")
 OR: assertGt(error, threshold, "error exceeds acceptable threshold")

If you cannot answer all three -> ASK FOR CLARIFICATION

Pre-PoC Feasibility Gates (MANDATORY)

Before writing test code, verify these two gates. If either FAILS, adjust the hypothesis.

Gate F1: Reachability

Trace a call path from a permissionless entry point to the vulnerable code.

Entry point identified (public/external/entry function)
Call path traced through intermediary functions
All access checks on the path are passable by the attacker profile

If NO entry point reaches the vulnerable code → UNREACHABLE → FALSE_POSITIVE. If reachable only through a restricted path → document the restriction, adjust likelihood.

Gate F2: Math Bounds

Substitute real-world value domains into the expression that triggers the bug.

Parameter domains identified (token decimals, max supply, TVL range, fee range, time bounds)
Expression evaluated at worst-case feasible inputs
Result crosses the bug threshold

If the bug requires values outside feasible domains → INFEASIBLE → FALSE_POSITIVE. If feasible only at extreme but realistic parameters → document the threshold, proceed with adjusted severity.

Both gates PASS → proceed to PoC. Either gate FAILS → document and stop.

Test File Template

solidity

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

import "forge-std/Test.sol";
import "forge-std/console.sol";

/**
 * @title Test_H{N}: {Title}
 *
 * BUG: {2 sentence description}
 * EXPECTED: {what should happen}
 * ACTUAL: {what does happen}
 */
contract Test_H{N} is Test {

    // === CONTRACTS ===
    // Declare target contract and any dependencies

    // === ACTORS ===
    address attacker = makeAddr("attacker");
    address victim = makeAddr("victim");
    address owner = makeAddr("owner");

    // === SETUP ===
    function setUp() public {
        // Deploy contracts
        // Set initial state
        // Fund actors if needed
    }

    // === TEST: Direct bug demonstration ===
    function test_H{N}_bug_demonstration() public {
        // 1. RECORD BEFORE
        console.log("=== BEFORE ===");
        uint256 valueBefore = target.criticalValue();
        console.log("Critical value:", valueBefore);

        // 2. ACTION
        console.log("=== ACTION ===");
        // Perform the operation that triggers the bug

        // 3. RECORD AFTER
        console.log("=== AFTER ===");
        uint256 valueAfter = target.criticalValue();
        console.log("Critical value:", valueAfter);

        // 4. PROVE BUG
        console.log("=== VERIFICATION ===");
        // THE ASSERTION THAT PROVES THE BUG
        // Design this so it PASSES when the bug EXISTS
    }

    // === TEST: Impact demonstration (optional) ===
    function test_H{N}_impact() public {
        // Show cumulative impact or attacker profit
    }
}

Interpreting Results

Test PASSES -> Bug CONFIRMED

The assertion that "proves the bug" succeeded.

If assertNotEq(after, before) passes -> values ARE different (bug exists)
If assertGt(error, threshold) passes -> error IS above threshold (bug exists)

Test FAILS -> Check Why

Failure	Meaning	Action
Assertion failed: values equal	Bug doesn't exist as hypothesized	Re-examine hypothesis
Revert in setup	Deployment/config wrong	Fix setup
Revert in action	Operation blocked	Check preconditions
Arithmetic error	Values wrong	Check calculations

Iteration Protocol

Attempt 1: Direct implementation of test strategy from hypothesis

Attempt 2: Adjust parameters

Different amounts (larger/smaller)
Different timing (more/fewer blocks)
Different actors

Attempt 3: Re-examine assumptions

Is setup correct?
Are preconditions met?
Is the bug mechanism correctly understood?

After 5 attempts:

If still fails -> FALSE_POSITIVE with documented reasoning
Explain why the hypothesis was wrong

Severity Determination

CRITICAL

Direct fund theft possible
Protocol insolvency
No special prerequisites needed
Attacker profits significantly

HIGH

Fund loss with some setup
Broken core functionality
Significant value at risk
Cumulative error compounds quickly

MEDIUM

Limited fund loss
Requires specific conditions
Edge cases with real impact
Moderate value at risk

LOW

Negligible direct impact
Extreme edge cases only
Owner/admin controlled risk
Informational with minor consequence

Output Format

CONFIRMED

markdown

## Verdict: CONFIRMED

### Bug Mechanism Verified
{Explain what the test proves in 2-3 sentences}

### Test File
`test/audit/Test_H{N}.t.sol`

### Test Output

{Paste relevant forge test output}


### Key Evidence
| Metric | Value |
|--------|-------|
| Before | {value} |
| After | {value} |
| Expected | {value} |
| Difference | {calculation} |

### Severity: {LEVEL}
{Justification in 1-2 sentences}

FALSE_POSITIVE

markdown

## Verdict: FALSE_POSITIVE

### Attempts Made

**Attempt 1:**
- Approach: {description}
- Result: {what happened}
- Learning: {insight}

**Attempt 2:**
- Approach: {description}
- Result: {what happened}
- Learning: {insight}

**Attempt 3:**
- Approach: {description}
- Result: {what happened}
- Learning: {insight}

### Why It's Not a Bug
{Explain the actual behavior and why hypothesis was wrong in 2-3 sentences}

CONTESTED (NEW in v5 -- CRITICAL)

markdown

## Verdict: CONTESTED

### Evidence Status
| Checkpoint | Status | Details |
|------------|--------|---------|
| External behavior verified against PRODUCTION | NO | Used mock behavior as evidence |
| All callers checked | YES | Checked A, B, C |
| Profit calculated with attacker holding | NO | Only analyzed donation loss |

### Why This Cannot Be REFUTED
{Explain what evidence is missing to definitively rule out the bug}

### Escalation Required
- [ ] Fetch production contract source for {external dep}
- [ ] Re-analyze with attacker holding shares
- [ ] Check additional caller paths: {list}

### Current Assessment
Likely: {TRUE_POSITIVE / FALSE_POSITIVE / UNKNOWN}
Confidence: {LOW / MEDIUM}

Insufficient Evidence (HALT CONDITIONS - CRITICAL)

MANDATORY: You MUST check ALL boxes before returning REFUTED. If ANY checkbox is NO -> Return CONTESTED, not REFUTED.

Before marking REFUTED, check:

External behavior verified against PRODUCTION (not mock)
- Read {scratchpad}/external_production_behavior.md
- If external dep is marked 'UNVERIFIED' -> CANNOT use as evidence
- If mock differs from production -> use PRODUCTION behavior
Attack path checked on ALL callers (not just main path)
- Use mcp__slither-analyzer__get_function_callers() to enumerate
Profit calculated with attacker HOLDING tokens (not just donating)
- "Attacker loses by donating" is NOT sufficient evidence
- Check: what if attacker holds X% of shares BEFORE donating?
Missing precondition documented
- Document in structured format: precondition type + why it blocks
- Types: STATE / ACCESS / TIMING / EXTERNAL / BALANCE
Searched other findings for matching postconditions
- Read {scratchpad}/findings_inventory.md for CONFIRMED/PARTIAL findings
- Check if ANY finding creates the postcondition that would enable this attack
- If match found -> CONTESTED, not REFUTED (chain analysis will combine)

Evidence That Does NOT Count

"Mock shows X" -- mocks != production (CRITICAL: always verify against production)
"Standard ERC20" -- may have transfer hooks, side effects
"Attacker loses by donating" -- may profit via shares held
"Function is internal" -- may be called by public function
"Requires admin" -- admin may be compromised or malicious
"Attacker cannot acquire X" -- another finding may CREATE this condition

Anti-Downgrade Halt for VS/BLIND Findings (HARD RULE)

For findings from Validation Sweep ([VS-]) or Blind Spot Scanner ([BLIND-]): apply Rule 13's 5-question test BEFORE any downgrade. HALT: If test shows users harmed AND unavoidable AND undocumented -> you CANNOT return FALSE_POSITIVE. Minimum verdict: CONTESTED. Defense parity gaps (Contract A has protection X, Contract B lacks it for same action) are NEVER "by design" -> minimum severity: Medium, minimum verdict: CONTESTED. Violating this halt is a workflow error equivalent to using [MOCK] evidence for REFUTED.

Chain Analysis Integration

A finding is NEVER truly REFUTED until chain analysis completes.

If you mark a finding as REFUTED but document a missing precondition, the chain analyzer (Step 6b) will search for other findings whose postconditions match your missing precondition. If found, the finding will be escalated to CONTESTED and combined into a chain hypothesis.

Example:

Your finding: "Donation attack blocked because attacker cannot hold receipt tokens"
Other finding: "External protocol interaction returns transferable receipt tokens"
Chain: Other finding enables your finding -> Combined HIGH severity

Advanced Protocol Reference: See advanced.md for RAG queries before PoC, exchange rate finding severity, design flaw escalation, bidirectional role analysis, RAG confidence override, chain hypothesis protection, fork testing, and Foundry PoC methodology.

Maintainer

PlamenTSV Core maintainer

Source details

Full Name: PlamenTSV/plamen
Branch: main
Path in repo: agents/skills/evm/verification-protocol
License: MIT License
Topics: claude-code ai-agent security-audit solana ethereum solidity aptos smart-contract-audit sui web3-security

Featured Tools

Join Our Newsletter

215 33

Explore

PlamenTSV/plamen

dex-integration-security

215 33

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Verification Protocol

Evidence Source Tracking (MANDATORY)

Evidence Source Tags

Evidence Audit Table (REQUIRED in every verification output)

Mock Rejection Rule

Pre-Verification Understanding

Question 1: What is the EXACT bug?

Question 2: What OBSERVABLE difference proves it?

Question 3: What is the EXACT assertion?

Pre-PoC Feasibility Gates (MANDATORY)

Gate F1: Reachability

Gate F2: Math Bounds

Test File Template

Interpreting Results

Test PASSES -> Bug CONFIRMED

Test FAILS -> Check Why

Iteration Protocol

Severity Determination

CRITICAL

HIGH

MEDIUM

LOW

Output Format

CONFIRMED

FALSE_POSITIVE

CONTESTED (NEW in v5 -- CRITICAL)

Insufficient Evidence (HALT CONDITIONS - CRITICAL)

Evidence That Does NOT Count

Anti-Downgrade Halt for VS/BLIND Findings (HARD RULE)

Chain Analysis Integration

Recommended Agent Skills

integration-hazard-research

outcome-determinism

governance-attack-vectors

vault-accounting

lending-protocol-security

dex-integration-security