Agent skill

visual-verdict

Screenshot comparison QA for frontend development. Takes a screenshot of the current implementation, scores it across multiple visual dimensions, and returns a structured PASS/REVISE/FAIL verdict with concrete fixes. Use when implementing UI from a design reference or verifying visual correctness.

View SKILL.md on GitHub Repository

Stars 458

Forks 38

Install this agent skill to your Project

npx add-skill https://github.com/vibeeval/vibecosystem/tree/main/skills/visual-verdict

SKILL.md

Visual Verdict — Screenshot QA Skill

Pixel-imprecise human eyes miss visual bugs. This skill provides structured, scored visual review using screenshots, returning actionable verdicts rather than vague impressions.

When to Activate

After a frontend-dev agent implements a UI component or page
When cloning an external website (pairs with clone-website skill)
After CSS changes to verify no visual regressions
When a designer provides a Figma export or reference screenshot
Before marking any UI task as complete

How It Works

1. Take screenshot of current implementation
2. Load reference (design mockup, Figma export, or previous version)
3. Score against 6 dimensions (0-100 each)
4. Compute weighted total score
5. Issue verdict: PASS (90+) / REVISE (60-89) / FAIL (<60)
6. List concrete mismatches with file + line fix hints
7. Loop: frontend-dev fixes → new screenshot → rescore → repeat until PASS

Prerequisites

browser-use MCP installed and active (~/.mcp.json)
Reference image available (Figma PNG export, screenshot, or URL)
Implementation running (local dev server or deployed URL)

Scoring Dimensions

Dimension Weights

Dimension	Weight	Description
Layout accuracy	0.25	Positioning, spacing, alignment of all elements
Typography	0.15	Font family, size, weight, line-height, letter-spacing
Color accuracy	0.15	Background, text, border, shadow, gradient colors
Responsive behavior	0.15	Breakpoint transitions, mobile/tablet rendering
Interactive states	0.15	Hover, focus, active, disabled, loading states
Content completeness	0.15	All text, images, icons, labels present

Weighted Score Formula

total = (layout * 0.25) + (typography * 0.15) + (color * 0.15)
      + (responsive * 0.15) + (interactive * 0.15) + (content * 0.15)

Scoring Each Dimension (0-100)

Layout accuracy

All elements exactly in position, spacing matches: 90-100
Minor spacing deviation (< 4px): 75-89
Noticeable misalignment (4-16px): 50-74
Layout clearly wrong (columns swapped, elements missing): 0-49

Typography

Font, size, weight, line-height all match: 90-100
One attribute off (wrong weight only): 70-89
Two attributes off: 50-69
Wrong font family entirely: 0-49

Color accuracy

All colors within 5% of reference: 90-100
One element color clearly off: 65-89
Multiple elements wrong color: 40-64
Dark/light mode inverted: 0-39

Responsive behavior

Tested at 3 breakpoints, all correct: 90-100
One breakpoint breaks layout: 65-89
Two breakpoints break: 40-64
Not responsive at all: 0-39

Interactive states

All states implemented and visible: 90-100
Missing one minor state (loading): 70-89
Missing hover or focus: 50-69
No interactive states at all: 0-49

Content completeness

All text, images, icons present: 90-100
One non-critical item missing: 75-89
One critical item missing (CTA, heading): 50-74
Multiple items missing: 0-49

Verdict Thresholds

Verdict	Score	Meaning
PASS	90-100	Acceptable for production
REVISE	60-89	Needs fixes, not shippable
FAIL	0-59	Major rework required

Verdict Output Format

markdown

## Visual Verdict: [Component/Page Name]

**Score: 85/100 — REVISE**
**Attempt: 2/5**
**Previous score: 72 (+13 this iteration)**

### Dimension Scores
| Dimension | Score | Weight | Weighted | Issues |
|-----------|-------|--------|----------|--------|
| Layout | 92 | 0.25 | 23.0 | Minor padding on mobile |
| Typography | 68 | 0.15 | 10.2 | Wrong font-weight on H2 |
| Color | 96 | 0.15 | 14.4 | - |
| Responsive | 80 | 0.15 | 12.0 | Card grid breaks at 768px |
| Interactive | 82 | 0.15 | 12.3 | Missing hover on CTA |
| Content | 91 | 0.15 | 13.7 | - |
| **TOTAL** | | | **85.6** | |

### Concrete Mismatches (prioritized by impact)

1. **[Typography / HIGH]** H2 hero heading
   Expected: `font-weight: 700`
   Actual: `font-weight: 400` (normal weight)
   Fix: `src/components/Hero.tsx` → add `font-bold` class to `<h2>`

2. **[Responsive / MEDIUM]** Card grid layout
   Expected: 3-column grid collapses at 1024px
   Actual: Collapses at 768px — causes overflow between 768-1024
   Fix: `src/components/CardGrid.tsx` → change `md:grid-cols-3` to `lg:grid-cols-3`

3. **[Interactive / MEDIUM]** Primary CTA button
   Expected: Darker background on hover (`bg-blue-700`)
   Actual: No hover state change
   Fix: `src/components/Hero.tsx` → add `hover:bg-blue-700 transition-colors` to CTA

4. **[Layout / LOW]** Mobile padding
   Expected: 16px horizontal padding on mobile
   Actual: 12px
   Fix: `src/components/Layout.tsx` → change `px-3` to `px-4`

### Next Steps
Fix issues 1-3 above, then call visual-verdict again.
Estimated score after fixes: ~94 (PASS)

Screenshot Workflow with browser-use MCP

Taking the Current Implementation Screenshot

Use browser-use MCP:
1. Navigate to implementation URL (e.g., http://localhost:3000/dashboard)
2. Wait for page fully loaded (no spinners)
3. Take full-page screenshot
4. Save to /tmp/verdict-current-{timestamp}.png

Loading the Reference

Three reference types accepted:

Figma PNG export: Designer exports component/page as PNG at 2x

Reference path: ~/Desktop/designs/dashboard-v2.png

Previous production screenshot: Regression check

Reference path: ~/.claude/benchmarks/screenshots/dashboard-prod.png

Live URL comparison: Compare two running versions

Reference URL A: http://localhost:3000/dashboard (current)
Reference URL B: https://staging.app.com/dashboard (baseline)

Responsive Testing Breakpoints

Always test at these three widths unless told otherwise:

Name	Width	Target
Mobile	375px	iPhone SE
Tablet	768px	iPad portrait
Desktop	1440px	Standard laptop

Integration with frontend-dev Agent

The typical iteration loop:

frontend-dev implements component
  → visual-verdict takes screenshot + scores
  → REVISE: sends concrete fix list to frontend-dev
  → frontend-dev applies fixes
  → visual-verdict rescores
  → Repeat until PASS (max 5 iterations)
  → PASS: task marked complete

Iteration Limit

Maximum 5 iterations per component. If PASS is not reached after 5 attempts:

ESCALATION: Visual QA failed after 5 iterations
Component: [name]
Best score achieved: [score]
Blocker: [specific issue that keeps failing]
Recommendation: [designer clarification needed / fundamentally wrong approach]

Integration with clone-website Skill

When cloning an external site, visual-verdict is the quality gate:

1. clone-website skill produces initial implementation
2. visual-verdict screenshots both original and clone
3. Scores similarity across all 6 dimensions
4. REVISE verdict triggers targeted fixes
5. PASS verdict confirms clone is production-ready

Color Comparison Method

Color is compared by extracting computed CSS values and comparing:

Reference color:      #1D4ED8  (rgb 29, 78, 216)
Actual color:         #2563EB  (rgb 37, 99, 235)
Delta E (perceived):  6.2 — noticeable, score -15

Delta E thresholds:

Delta E < 2: Imperceptible — full score
Delta E 2-5: Barely noticeable — minor deduction
Delta E 5-10: Noticeable — moderate deduction
Delta E > 10: Clearly wrong — major deduction

Typography Extraction

Typography is checked by reading computed styles, not pixels:

javascript

// Extract from running implementation
const element = document.querySelector('.hero-title')
const styles = window.getComputedStyle(element)

{
  fontFamily: styles.fontFamily,        // Compare against reference
  fontSize: styles.fontSize,            // "32px" vs expected "36px"
  fontWeight: styles.fontWeight,        // "400" vs expected "700"
  lineHeight: styles.lineHeight,        // "1.5" vs expected "1.25"
  letterSpacing: styles.letterSpacing   // "normal" vs expected "-0.02em"
}

Common Visual Bugs Caught

Font weight 400 instead of 700 (bold not applying)
Tailwind class not included in safelist (purged in build)
Media query breakpoint uses px when rem expected
Hover state not added after copy-paste from design
Icon imported but wrong size prop
Color defined in CSS variable but variable not set in theme
Padding/margin using px-3 when design shows px-4
Z-index issue hiding interactive element
Border-radius value wrong (rounded vs rounded-lg vs rounded-full)
Line-clamp not applied, text overflows on mobile

Saving Screenshots for Regression History

After a PASS verdict, save the screenshot as the new baseline:

bash

cp /tmp/verdict-current-{timestamp}.png \
   ~/.claude/benchmarks/screenshots/{component-name}-{date}.png

On the next change, this saved screenshot becomes the reference for regression comparison.

Remember: A visual that looks "pretty close" in a quick glance often has 3-5 compounding issues that add up to a broken experience. Score it. Trust the number, not the impression.

Maintainer

vibeeval Core maintainer

Source details

Full Name: vibeeval/vibecosystem
Branch: main
Path in repo: skills/visual-verdict
License: MIT License
Topics: claude-code anthropic claude automation ai-agents developer-tools claude-skills vibe-coding vibecoding multi-agent hooks devtools self-learning open-source ai-software-team

Featured Tools

Join Our Newsletter

Write reliable prompts for Agentica/REPL agents that avoid LLM instruction ambiguity

458 38

Explore

Didn't find tool you were looking for?

Search AI Tools