Agent skill

ironbee-verify

Trigger browser verification of code changes. Args: (default), full, visual, functional

Install this agent skill to your Project

npx add-skill https://github.com/ironbee-ai/ironbee-cli/tree/main/src/clients/cursor/commands/ironbee-verify

SKILL.md

IronBee Verify

Verify the current code changes in the browser.

Usage

  • /ironbee-verify (default): focus on what changed, visual + functional checks on affected areas
  • /ironbee-verify full (full scope): entire application, all checklists, edge cases, responsive, accessibility deep dive
  • /ironbee-verify visual (visual only): contrast, layout, spacing, fonts, images, theming
  • /ironbee-verify functional (functional only): clicks, forms, navigation, data flow, error handling

If no argument is given, use default mode.
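
The argument handling above can be sketched as a small shell function. This is only an illustration: resolve_mode is a hypothetical name, and rejecting unknown arguments is an assumption, since the skill only defines the four modes and the fallback to default.

```shell
#!/bin/sh
# Hypothetical helper: map the optional argument to one of the four modes.
resolve_mode() {
  case "${1:-default}" in
    default|full|visual|functional)
      # A missing or empty argument falls back to default mode.
      printf '%s\n' "${1:-default}"
      ;;
    *)
      # Assumption: the skill does not say what happens with an unknown
      # argument; this sketch reports it and returns a non-zero status.
      printf 'unknown mode: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

For example, resolve_mode with no argument prints default, and resolve_mode visual prints visual.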


Steps (all modes)

  1. Start verification: Run echo '{"session_id":"<your-session-id>"}' | ironbee hook verification-start via terminal
  2. Build and start the application if not already running
  3. For EVERY page you visit, repeat this cycle:
    a. Navigate using browser-devtools MCP tools
    b. Take a FULL PAGE screenshot with fullPage: true
    c. Take an ARIA snapshot to capture the page structure
    d. STOP and visually analyze the screenshot. Switch your focus entirely to finding visual problems. Look at this screenshot as if your ONLY job is to find visual defects:
      • Text readability: is it readable against its background? Look for text that blends in or poor contrast
      • Layout: overlapping elements, unexpected gaps, overflow, content cut off
      • Spacing: consistent padding/margin? Too cramped or too far apart?
      • Colors: intentional and consistent? Any jarring mismatches?
      • Typography: right sizes? Clipped or truncated text?
      • Images/icons: loaded? Right size and aspect ratio?
      • States: empty, loading, disabled, error states rendered properly?
      Report your visual findings before continuing.
    e. Read the ARIA snapshot: verify headings, labels, landmarks, and structure
    f. If anything looks wrong → note it as an issue
    WARNING: ARIA reports DOM content, not what the user actually sees. Do NOT assume the page looks correct just because ARIA shows the right content. Only the screenshot tells you what the user actually sees.
  4. Functionally test — run the checklist for your mode (see below). After each significant interaction, take another screenshot and repeat the visual analysis.
  5. Check console for errors
  6. Stop the dev server when verification is complete
  7. Submit your verdict via terminal:
    • Pass: echo '{"session_id":"...","status":"pass","pages_tested":[...],"checks":[...],"console_errors":0,"network_failures":0}' | ironbee hook submit-verdict
    • Fail: echo '{"session_id":"...","status":"fail","pages_tested":[...],"checks":[...],"console_errors":N,"network_failures":N,"issues":["describe what failed"]}' | ironbee hook submit-verdict
  8. If failed → collect ALL issues first (finish testing all affected pages), submit one fail verdict with all issues, then fix everything, rebuild, and re-verify. Do not fix one issue at a time — batch fixes to avoid repeated build/restart cycles.
  9. If pass after a previous fail, include "fixes" in the verdict describing what was fixed
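
The verdict payloads from steps 7 to 9 can be assembled with a pair of small helpers, sketched below. The function names fail_verdict and pass_verdict are hypothetical, and the sketch assumes that "fixes" is an array of strings like "issues"; the skill does not state its format. The helpers only print JSON and send nothing themselves.

```shell
#!/bin/sh
# Hypothetical helpers that build the verdict JSON; field names come from the
# verdict examples in step 7.
# Assumption: arguments contain no characters that need JSON escaping.

# fail_verdict SESSION_ID PAGES_JSON CHECKS_JSON CONSOLE_ERRORS NETWORK_FAILURES ISSUES_JSON
fail_verdict() {
  printf '{"session_id":"%s","status":"fail","pages_tested":%s,"checks":%s,"console_errors":%s,"network_failures":%s,"issues":%s}\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

# pass_verdict SESSION_ID PAGES_JSON CHECKS_JSON [FIXES_JSON]
# Pass FIXES_JSON only when this pass follows a previous fail (step 9).
pass_verdict() {
  base=$(printf '"session_id":"%s","status":"pass","pages_tested":%s,"checks":%s,"console_errors":0,"network_failures":0' \
    "$1" "$2" "$3")
  if [ -n "${4:-}" ]; then
    # Assumption: "fixes" is a JSON array of strings.
    printf '{%s,"fixes":%s}\n' "$base" "$4"
  else
    printf '{%s}\n' "$base"
  fi
}

# Usage (requires the ironbee CLI, so it is left commented out here):
# pass_verdict "<your-session-id>" '["/login"]' '["login form renders with email and password fields"]' \
#   | ironbee hook submit-verdict
```

Keeping the payload construction separate from the submission makes it easy to inspect the JSON before piping it to ironbee hook submit-verdict.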

Default Mode

Focus on the code you changed — not the entire application.

1. Study the changes

  1. Run git diff --name-only and git diff --name-only HEAD~1
  2. Ignore .ironbee/, .claude/, .cursor/ — tool config, not application code
  3. Read the full diff (git diff and/or git diff HEAD~1) — understand every change: what was added, removed, modified. Note specific values (colors, sizes, conditions, logic, API endpoints, component props).
  4. Before opening the browser, you should be able to answer: what exactly changed, what should look or behave differently, and what could go wrong?
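
Steps 1 and 2 above can be combined into one command, sketched here. The helper names filter_app_files and changed_app_files are hypothetical; the git commands and the ignored directories are the ones listed in the steps.

```shell
#!/bin/sh
# Hypothetical helper: read file paths on stdin and drop tool-config paths
# under .ironbee/, .claude/ and .cursor/.
filter_app_files() {
  # grep exits non-zero when every line is filtered out; treat that as success.
  grep -Ev '^\.(ironbee|claude|cursor)/' || true
}

# Hypothetical helper: list application files touched by uncommitted changes
# or by the last commit, without duplicates.
# Note: "git diff HEAD~1" fails in a repository with a single commit.
changed_app_files() {
  { git diff --name-only; git diff --name-only HEAD~1; } | sort -u | filter_app_files
}
```

Running changed_app_files inside the repository prints one path per line; each of those files is then a candidate for the full diff read in step 3.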

2. Verify in the browser

  • Cross-reference the diff against what you see. For each change in the diff, verify it is correctly reflected in the browser. If the diff changes a color → check that color. If it changes a calculation → verify the result. If it adds a component → confirm it renders.
  • Test the flow end-to-end — navigate, click, fill forms, submit, verify the outcome
  • Check one edge case — empty input, invalid data, or double-click
  • Console — any new errors or warnings?

Full Mode (/ironbee-verify full)

Comprehensive verification of the entire application. Do NOT run git diff or scope to recent changes. Test every page, every flow, every visual detail. Any issue is a failure, regardless of when it was introduced.

Visual Checklist

In addition to the per-page visual analysis in step 3d:

  • Responsiveness — does the layout adapt if viewport changes? No horizontal scrolling on standard widths
  • Borders & separators — visible and consistent? Not too faint or missing
  • Scroll behavior — does the page scroll smoothly? No content hidden behind sticky headers/footers?

Functional Checklist

  • Navigation — do links and buttons navigate to the correct pages?
  • Forms — fill inputs with real data, select options, submit. Do validation messages appear correctly?
  • Buttons & interactions — do click handlers fire? Do toggles, dropdowns, and modals work?
  • Data flow — does submitted data appear where expected?
  • Error handling — what happens with invalid input? Does the UI handle errors gracefully?
  • Authentication — if applicable, do protected routes redirect correctly?
  • API calls — do network requests succeed? Check for failed requests in console/network
  • State persistence — does state survive page refresh where expected?
  • Edge cases — empty inputs, very long text, special characters, rapid clicks

Accessibility (deep dive)

  • Are headings hierarchical? Do form inputs have labels? Are landmarks present?
  • Check for missing alt text on images

Visual Mode (/ironbee-verify visual)

Focus exclusively on visual quality. Run the per-page visual analysis from step 3d on every page, plus:

  • Responsiveness — viewport changes, no horizontal scrolling
  • Borders & separators — visible and consistent?
  • Scroll behavior — smooth scrolling, no hidden content

Take screenshots of multiple states if applicable (hover, active, disabled, empty, populated).


Functional Mode (/ironbee-verify functional)

Focus exclusively on behavior. Use the same functional checklist as Full Mode above.

Test the complete user flow, not just the single step you changed.


When to FAIL

If you observe ANY problem — wrong data, unexpected errors, visual defects, broken interactions, console errors, data inconsistency between pages — you MUST submit a fail verdict.

Do NOT rationalize away problems. If something looks wrong or behaves unexpectedly, it IS wrong. In full mode, there is no such thing as "pre-existing" — if it's broken, fail it.

After a fail verdict, you MUST fix the issues and re-verify. Do not just report and stop.

Verdict Quality

Your checks array must list specific observations, not generic statements:

  • GOOD: ["login form renders with email and password fields", "submitted valid credentials, redirected to /dashboard", "console clean — 0 errors"]
  • BAD: ["it works", "looks good", "feature implemented"]

Important

  • ALWAYS submit a verdict after every verification attempt — both pass AND fail
  • Do NOT edit code before submitting a fail verdict
  • Noticing a bug and submitting pass is the #1 violation — if you see it, fail it
