Agent skill

validation

Full validation phase orchestrator. Vibe + post-mortem + retro + forge. Reviews implementation quality, extracts learnings, feeds the knowledge flywheel. Triggers: "validation", "validate", "validate work", "review and learn", "validation phase", "post-implementation review".

Install this agent skill to your Project

npx add-skill https://github.com/boshu2/agentops/tree/main/skills/validation

Metadata

Additional technical details for this skill

tier: meta
dependencies: ["vibe", "post-mortem", "retro", "forge", "shared"]

SKILL.md

/validation — Full Validation Phase Orchestrator

YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.

DAG — Execute This Sequentially

mkdir -p .agents/rpi
detect complexity from execution-packet or --complexity flag (default: standard)
detect ao CLI availability

Step 0: Load Prior Validation Context

Before running the validation pipeline, pull relevant learnings from prior reviews:

```bash
if command -v ao &>/dev/null; then
    ao lookup --query "<epic or goal context> validation review patterns" --limit 5 2>/dev/null || true
fi
```

Apply retrieved knowledge (mandatory when results returned):

If learnings are returned, do NOT just load them as passive context. For each returned item:

  1. Check: does this learning apply to the current validation scope? (answer yes/no)
  2. If yes: include it as a known_risk — what pattern does it warn about? does the code exhibit it?
  3. Cite applicable learnings by filename when they influence a validation finding

After applying, record each citation:

```bash
ao metrics cite "<learning-path>" --type applied 2>/dev/null || true
```

Skip silently if ao is unavailable or returns no results.

Run every step below in order. Do not stop between steps.

STEP 1  ──  Skill(skill="vibe", args="recent [--quick]")
              Use --quick for fast/standard. Full council for full.
              PASS/WARN? → continue
              FAIL?      → write summary, output <promise>FAIL</promise>, stop
                           (validation cannot fix code — caller decides retry)

STEP 1.5 ── Four-Surface Closure (mandatory)
              Read `skills/validation/references/four-surface-closure.md` for the mandatory four-surface closure check.
              Check all four surfaces: Code, Documentation, Examples, Proof.
              All 4 pass? → continue
              if --strict-surfaces:
                Any surface fails? → FAIL, write summary, output <promise>FAIL</promise>, stop
              else (default):
                Code passes, others fail? → WARN, continue
                Code fails? → BLOCK, write summary, output <promise>FAIL</promise>, stop

STEP 1.6 ── Test pyramid coverage audit (advisory, append to summary)
              Check L0-L3 + BF1/BF4 per modified file. WARN only, not FAIL.

STEP 1.7 ── Lifecycle Checks (advisory except critical dependency findings)
              Skip entire step if: --no-lifecycle flag.
              Each sub-step uses --quick mode to limit context consumption.
              On budget expiry: skip remaining sub-steps, write [TIME-BOXED].

              a) if lifecycle tier >= minimal AND test_framework_detected:
                   Skill(skill="test", args="coverage --quick")
                   Append coverage delta to phase summary.

              b) if lifecycle tier >= standard AND dependency_manifest_exists:
                   Skill(skill="deps", args="vuln --quick")
                   CRITICAL vulns (CVSS >= 9.0): **FAIL** (block shipping). Opt-out: `--allow-critical-deps` for acknowledged risk acceptance.
                   Non-critical: advisory note only.

              c) if lifecycle tier >= standard:
                   Skill(skill="review", args="--diff --quick")
                   Append review findings to summary as advisory.

              d) if lifecycle tier == full AND modified_files_touch_hot_path:
                   Skill(skill="perf", args="profile --quick")
                   Append perf findings to summary as advisory.
                   Hot path detection: modified files match benchmark files
                   or patterns (handler, middleware, router, parser, engine,
                   worker, pool, codec).

STEP 1.8 ── Stage 4: Behavioral Validation (holdout scenarios + agent-built specs)
            Skip if: no .agents/holdout/ AND no .agents/specs/, or --no-behavioral
            Read `references/step-1.8-behavioral-validation.md` for full sub-steps.
            Loads holdout scenarios + agent specs → evaluator council → satisfaction gate.
            Evaluates each scenario and aggregates results into `satisfaction_score`
            (verdict schema field, `skills/council/schemas/verdict.json`: number 0.0-1.0,
            "Probabilistic satisfaction score (0.0 = unsatisfied, 1.0 = fully satisfied)").
            Per-dimension scores populate `satisfaction_breakdown`. The aggregated
            `satisfaction_score` seeds downstream gates and the phase summary.
            PASS/WARN? → continue | FAIL? → <promise>FAIL</promise>, stop

STEP 2  ──  if epic_id:
              Skill(skill="post-mortem", args="<epic-id> [--quick]")
            else:
              Skill(skill="post-mortem", args="recent [--quick]")
              Use --quick for fast/standard. Full council for full.
              PASS/WARN? → continue
              FAIL?      → write summary, output <promise>FAIL</promise>, stop

STEP 3  ──  if not --no-retro:
              Skill(skill="retro")

STEP 4  ──  if not --no-forge AND ao available:
              if [ -n "${CODEX_THREAD_ID:-}" ] || [ "${CODEX_INTERNAL_ORIGINATOR_OVERRIDE:-}" = "Codex Desktop" ]; then
                ao codex stop --auto-extract 2>/dev/null || true
              else
                ao forge transcript --last-session --queue --quiet 2>/dev/null || true
              fi

STEP 5  ──  write phase summary to .agents/rpi/phase-3-summary-YYYY-MM-DD-<slug>.md
              ao ratchet record vibe 2>/dev/null || true
              output <promise>DONE</promise>

That's it. Steps 1→2→3→4→5. No stopping between steps.
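The STEP 1.5 decision reads as a small truth table. A minimal sketch, assuming each surface check has already been reduced to the string `pass` or `fail`. The function name `surface_verdict` and its argument order are hypothetical and not part of the skill:

```bash
# Hypothetical helper, not part of the skill: maps four surface results
# (each "pass" or "fail") plus the --strict-surfaces setting to a verdict.
# Usage: surface_verdict <code> <docs> <examples> <proof> <strict:true|false>
surface_verdict() {
    code=$1 docs=$2 examples=$3 proof=$4 strict=$5

    # All four surfaces pass: continue regardless of mode.
    if [ "$code" = pass ] && [ "$docs" = pass ] &&
       [ "$examples" = pass ] && [ "$proof" = pass ]; then
        echo PASS
        return 0
    fi

    # A code-surface failure blocks in both modes; in strict mode any
    # failing surface blocks.
    if [ "$code" = fail ] || [ "$strict" = true ]; then
        echo FAIL
        return 1
    fi

    # Default mode, code passes, another surface fails: advisory only.
    echo WARN
}
```

For example, `surface_verdict pass fail pass pass false` prints `WARN`, while the same results with `true` as the last argument print `FAIL`.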


Setup Detail

State:

validation_state = {
  epic_id: "<epic-id or null>",
  complexity: <fast|standard|full>,
  no_retro: <true if --no-retro>,
  no_forge: <true if --no-forge>,
  strict_surfaces: <true if --strict-surfaces>,
  vibe_verdict: null,
  post_mortem_verdict: null
}

Load execution packet (if available): read complexity, contract_surfaces, and done_criteria from .agents/rpi/execution-packet.json. When a current run_id is known, prefer the matching .agents/rpi/runs/<run-id>/execution-packet.json archive over the latest alias.
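The packet lookup order above could be sketched as follows. The helper name `resolve_packet` is hypothetical, and falling back to the latest alias when the run archive is missing is an assumption, not something the skill states:

```bash
# Hypothetical helper: print the execution packet path to read, preferring
# the run-scoped archive when a run id is known and the archive exists.
# Usage: resolve_packet [run-id]
resolve_packet() {
    run_id=${1:-}
    archive=".agents/rpi/runs/$run_id/execution-packet.json"
    latest=".agents/rpi/execution-packet.json"

    if [ -n "$run_id" ] && [ -f "$archive" ]; then
        echo "$archive"
    elif [ -f "$latest" ]; then
        # Assumption: fall back to the latest alias when no archive matches.
        echo "$latest"
    else
        # No packet available: the caller relies on --complexity or the
        # default complexity (standard).
        return 1
    fi
}
```

A caller might use it as `packet=$(resolve_packet "$run_id") || complexity=standard`, where `run_id` and `complexity` are illustrative variable names.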

Gate Detail

Validation has multiple blocking conditions. Validation cannot fix code — it can only report and fail closeout when the lifecycle contract is not met.

  • Blocking FAIL conditions: vibe FAIL, code-surface failure in STEP 1.5, --strict-surfaces failure on any closure surface, CVSS >= 9.0 dependency findings in STEP 1.7b unless --allow-critical-deps, behavioral validation FAIL in STEP 1.8, and post-mortem FAIL in STEP 2.
  • PASS/WARN: Log verdicts, continue through the remaining steps.
  • FAIL: Extract findings from the latest evaluator output, write phase summary with FAIL status, output <promise>FAIL</promise> with findings attached. Suggest: "Validation FAIL. Fix findings, then re-run /validation [epic-id]".
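Of the blocking conditions listed above, the dependency gate in STEP 1.7b is the only one with a numeric threshold. A minimal sketch, assuming the score arrives as a plain decimal string and `awk` is available. The helper name `deps_gate` and the output labels `block`, `accepted-risk`, and `advisory` are hypothetical:

```bash
# Hypothetical helper: decide whether a dependency finding blocks shipping.
# Usage: deps_gate <cvss-score> <allow-critical:true|false>
deps_gate() {
    score=$1 allow=$2

    # awk performs the floating-point comparison; exit status 0 means the
    # score is at or above the critical threshold of 9.0.
    if awk -v s="$score" 'BEGIN { exit !(s >= 9.0) }'; then
        if [ "$allow" = true ]; then
            # --allow-critical-deps: acknowledged risk acceptance.
            echo accepted-risk
        else
            echo block
            return 1
        fi
    else
        # Non-critical findings are an advisory note only.
        echo advisory
    fi
}
```

With this sketch, `deps_gate 9.8 false` prints `block` and returns non-zero, and `deps_gate 7.5 false` prints `advisory`.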

Why no internal retry: Retries require re-implementation (/crank). The caller (/rpi or human) decides whether to loop back.

Phase Summary Format

Write to .agents/rpi/phase-3-summary-YYYY-MM-DD-<slug>.md:

```markdown
# Phase 3 Summary: Validation

- **Epic:** <epic-id or "standalone">
- **Vibe verdict:** <PASS|WARN|FAIL>
- **Post-mortem verdict:** <verdict or "skipped">
- **Retro:** <captured|skipped>
- **Forge:** <mined|skipped>
- **Complexity:** <fast|standard|full>
- **Status:** <DONE|FAIL>
- **Timestamp:** <ISO-8601>
```
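One way to produce a file in this format is a heredoc. This is a sketch, not the skill's own implementation: the function name `write_phase_summary` and its positional arguments are hypothetical, and using the local date for the filename and UTC for the timestamp is an assumption.

```bash
# Hypothetical helper: write a phase summary in the format above and print
# the path of the file it created.
# Usage: write_phase_summary <slug> <epic> <vibe> <post-mortem> <retro> \
#                            <forge> <complexity> <status>
write_phase_summary() {
    out=".agents/rpi/phase-3-summary-$(date +%Y-%m-%d)-$1.md"
    mkdir -p .agents/rpi
    # Heredoc body and terminator stay at column 0 so the file has no
    # leading whitespace.
    cat > "$out" <<EOF
# Phase 3 Summary: Validation

- **Epic:** $2
- **Vibe verdict:** $3
- **Post-mortem verdict:** $4
- **Retro:** $5
- **Forge:** $6
- **Complexity:** $7
- **Status:** $8
- **Timestamp:** $(date -u +%Y-%m-%dT%H:%M:%SZ)
EOF
    echo "$out"
}
```

A standalone run might call `write_phase_summary my-slug standalone PASS skipped captured skipped standard DONE`, where `my-slug` is a placeholder.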

Phase Budgets

| Sub-step | fast | standard | full |
|---|---|---|---|
| Vibe | 2 min | 3 min | 5 min |
| Post-mortem | 2 min | 3 min | 5 min |
| Retro | 1 min | 1 min | 2 min |
| Forge | skip | 2 min | 3 min |

On budget expiry: allow in-flight calls to complete, write [TIME-BOXED] marker, proceed.
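The budget table can be encoded as a lookup. A minimal sketch: the function name `budget_minutes` is hypothetical, and the values are copied from the table above.

```bash
# Hypothetical helper: print the budget in minutes for a sub-step at a given
# complexity, or "skip" where the table says the sub-step does not run.
# Usage: budget_minutes <vibe|post-mortem|retro|forge> <fast|standard|full>
budget_minutes() {
    case "$1:$2" in
        vibe:fast|post-mortem:fast)         echo 2 ;;
        vibe:standard|post-mortem:standard) echo 3 ;;
        vibe:full|post-mortem:full)         echo 5 ;;
        retro:fast|retro:standard)          echo 1 ;;
        retro:full)                         echo 2 ;;
        forge:fast)                         echo skip ;;
        forge:standard)                     echo 2 ;;
        forge:full)                         echo 3 ;;
        *)
            echo "unknown sub-step or complexity: $1 $2" >&2
            return 1
            ;;
    esac
}
```

When `--no-budget` is set, a caller would skip this lookup entirely.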

Flags

| Flag | Default | Description |
|---|---|---|
| `--complexity=<level>` | auto | Force complexity level (fast/standard/full) |
| `--no-lifecycle` | off | Skip ALL lifecycle checks in STEP 1.7 (test, deps, review, perf) |
| `--lifecycle=<tier>` | matches complexity | Controls which lifecycle skills fire: minimal (test only), standard (+deps, +review), full (+perf) |
| `--no-retro` | off | Skip retro step only |
| `--no-forge` | off | Skip forge step only |
| `--no-budget` | off | Disable phase time budgets |
| `--strict-surfaces` | off | Make a failure on any of the 4 surfaces blocking (FAIL instead of WARN). Passed automatically by /rpi --quality. |
| `--allow-critical-deps` | off | Allow shipping with CVSS >= 9.0 vulnerabilities (acknowledged risk acceptance) |
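The `--lifecycle` tiers are cumulative. A minimal sketch of the tier-to-skill mapping described in the flag table; the function name `lifecycle_skills` is hypothetical. It covers only the tier check: STEP 1.7 also requires a detected test framework, a dependency manifest, or a hot-path change before the corresponding skill fires.

```bash
# Hypothetical helper: print the lifecycle skills eligible at a given tier.
# Usage: lifecycle_skills <minimal|standard|full>
lifecycle_skills() {
    case "$1" in
        minimal)  echo "test" ;;
        standard) echo "test deps review" ;;
        full)     echo "test deps review perf" ;;
        *)
            echo "unknown lifecycle tier: $1" >&2
            return 1
            ;;
    esac
}
```

A caller could iterate the result with `for skill in $(lifecycle_skills standard); do ...; done`.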

Quick Start

```bash
/validation ag-5k2                        # validate epic with full close-out
/validation                               # validate recent work (no epic)
/validation --complexity=full ag-5k2      # force full council ceremony
/validation --no-retro ag-5k2             # skip retro only
/validation --no-forge ag-5k2             # skip forge only
```

Completion Markers

<promise>DONE</promise>    # Validation passed, learnings captured
<promise>FAIL</promise>    # A blocking gate failed (vibe or another FAIL condition), re-implementation needed (findings attached)
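A caller such as /rpi has to read these markers back out of the validation output. A minimal sketch, assuming the output is available as text on stdin; the function name `promise_status` is hypothetical:

```bash
# Hypothetical helper: read validation output on stdin and print the last
# completion marker found (DONE or FAIL). Returns 1 if none is present.
promise_status() {
    marker=$(sed -n \
        -e 's/.*<promise>DONE<\/promise>.*/DONE/p' \
        -e 's/.*<promise>FAIL<\/promise>.*/FAIL/p' | tail -n 1)
    [ -n "$marker" ] && echo "$marker"
}
```

The caller can then branch on the result, for example looping back to /crank on `FAIL`.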

Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| Vibe FAIL on first run | Implementation has quality issues | Fix findings via /crank, then re-run /validation |
| Post-mortem reviewed recent work instead of an epic | No epic-id provided | Pass epic-id for epic-scoped closeout: /validation ag-5k2 |
| Codex closeout missing | Codex has no session-end hook surface | Let /validation run ao codex stop, or run ao codex stop manually before leaving the session |
| Forge produces no output | No ao CLI or no transcript content | Install ao CLI or run /retro manually |
| Stale execution-packet | Packet from a previous RPI cycle | Delete .agents/rpi/execution-packet.json and pass --complexity explicitly |

Reference Documents

  • references/four-surface-closure.md — four-surface closure validation (code + docs + examples + proof)
  • references/forge-scope.md — forge session scoping and deduplication
  • references/idempotency-and-resume.md — re-run behavior and standalone mode

See Also

Core phases: vibe, post-mortem, retro, forge, crank, discovery, rpi. Lifecycle Step 1.7: test, deps, review, perf.
