Agent skill
qa-browser-automation
Production-grade browser QA automation with visual regression testing, accessibility auditing, performance profiling, and intelligent bug triage
Install this agent skill to your Project
npx add-skill https://github.com/borghei/Claude-Skills/tree/main/engineering/qa-browser-automation
Metadata
Additional technical details for this skill
- tags: browser-qa, wcag, visual-regression, health-scoring
- author: borghei
- domain: quality-assurance
- updated: 1773792000
- version: 2.0.0
- category: engineering
- tech stack: python, chrome-mcp, accessibility, wcag, performance
- python tools: qa_health_scorer.py, accessibility_auditor.py, visual_regression_tracker.py, test_report_generator.py
SKILL.md
QA Browser Automation
The most comprehensive browser QA skill available for AI coding assistants. Combines live Chrome MCP browser control with deterministic Python analysis tools to deliver systematic, repeatable quality assurance across any web application.
What sets this apart: Four testing tiers, 10-category weighted health scoring, five severity levels, WCAG 2.1 AAA coverage, visual regression tracking, Core Web Vitals profiling, and full Python automation — all integrated with live browser interaction via Chrome MCP.
Keywords
browser-testing, qa-automation, visual-regression, accessibility-audit, wcag-compliance, performance-profiling, core-web-vitals, health-scoring, bug-triage, chrome-mcp, cross-browser, responsive-testing, e2e-testing, smoke-testing, regression-testing
Table of Contents
- Quick Start
- Core Workflows
- 1. Full Application QA Sweep
- 2. Visual Regression Testing
- 3. Accessibility Compliance Audit
- 4. Performance Profiling
- 5. Diff-Aware QA
- Tools
- Reference Guides
- Testing Tiers
- Health Scoring System
- Bug Severity Classification
- Integration Points
Quick Start
- Navigate to the target application using Chrome MCP (mcp__claude-in-chrome__navigate)
- Choose a testing tier: Quick (30s), Standard (2-5min), Deep (10-20min), or Exhaustive (30min+)
- Run the appropriate workflow from the Core Workflows section below
- Generate a report using test_report_generator.py with the collected findings
# Score findings after a QA session
python scripts/qa_health_scorer.py findings.json
# Audit a page for accessibility
python scripts/accessibility_auditor.py page.html --level AA
# Track visual regressions
python scripts/visual_regression_tracker.py --baseline baselines/ --current screenshots/
# Generate full report
python scripts/test_report_generator.py session_data.json --format markdown -o report.md
Core Workflows
1. Full Application QA Sweep (11-Phase Protocol)
Fully prescriptive, phase-gated QA workflow. Each phase must complete before the next begins.
Phase 1 — Pre-Flight
- Verify git status is clean (no uncommitted changes). Abort if dirty.
- Create session directory: .qa-sessions/{timestamp}/
- Record starting branch, commit hash, and timestamp
- Check if a previous baseline exists for regression comparison
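The pre-flight checks above can be sketched in Python. This is a minimal sketch, not the skill's own implementation: the function names, the timestamp format, and the start.txt file are all hypothetical, and it assumes git is on the PATH.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def tree_is_clean(porcelain_output: str) -> bool:
    """`git status --porcelain` prints one line per change, so empty output means clean."""
    return porcelain_output.strip() == ""


def session_dir_name(now: datetime) -> str:
    # Hypothetical timestamp format; the skill only specifies "{timestamp}".
    return now.strftime("%Y%m%dT%H%M%SZ")


def preflight(repo: Path) -> Path:
    """Abort on a dirty tree, otherwise create and return the session directory."""
    def git(*args: str) -> str:
        return subprocess.run(
            ["git", *args], cwd=repo, check=True, capture_output=True, text=True
        ).stdout

    if not tree_is_clean(git("status", "--porcelain")):
        raise SystemExit("Working tree is dirty; commit or stash before running QA.")

    session = repo / ".qa-sessions" / session_dir_name(datetime.now(timezone.utc))
    session.mkdir(parents=True, exist_ok=False)
    # Record where the session started so fixes can be traced back later.
    # "start.txt" is a hypothetical file name, not one the skill defines.
    (session / "start.txt").write_text(
        f"branch={git('rev-parse', '--abbrev-ref', 'HEAD').strip()}\n"
        f"commit={git('rev-parse', 'HEAD').strip()}\n"
    )
    return session
```

Calling preflight(Path(".")) from the repository root would either exit with a message or return the new session directory.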
Phase 2 — Authenticate
- If the application requires login, handle authentication first
- Use mcp__claude-in-chrome__form_input to fill credentials
- Verify session established via mcp__claude-in-chrome__read_console_messages
- Store auth state for subsequent phases
Phase 3 — Orient
- Use mcp__claude-in-chrome__read_page to capture the sitemap or navigation structure
- Enumerate all unique routes, modals, and dynamic views
- Identify authentication gates and role-based views
- Detect framework (React, Vue, Next.js, etc.) from page source
- Build the page map — this drives all subsequent testing
Phase 4 — Systematic Exploration
- Navigate each route with mcp__claude-in-chrome__navigate
- Check mcp__claude-in-chrome__read_console_messages for errors and warnings
- Verify all pages render without HTTP 4xx/5xx via mcp__claude-in-chrome__read_network_requests
- Test all forms with mcp__claude-in-chrome__form_input: valid data, empty submissions, boundary values
- Exercise interactive elements: dropdowns, modals, tabs, accordions, tooltips
- Verify CRUD operations complete successfully
- Test navigation flows: login, onboarding, checkout, multi-step wizards
Phase 5 — State Testing
- Verify loading states (skeleton screens, spinners — not blank pages)
- Check empty states (no data, first-time user — must guide to first action)
- Trigger error states (invalid input, network failure simulation)
- Confirm success states (toast notifications, redirects, confirmation screens)
- Test partial states (incomplete data, pagination boundaries, stale cache)
- Four shadow paths per interaction: happy path, nil input, empty input, error upstream
Phase 6 — Cross-Device & Security
- Use mcp__claude-in-chrome__resize_window to test at 320px, 768px, 1024px, 1440px, 1920px
- Verify responsive breakpoints, touch targets (44x44px minimum), and layout shifts
- Check security headers via network requests (CSP, HSTS, X-Frame-Options)
- Test for open redirects, XSS reflection in URL params
- Verify CSRF tokens on forms, cookie flags (Secure, HttpOnly, SameSite)
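Once response headers have been read from the network requests, the header and cookie-flag checks above reduce to simple lookups. The sketch below assumes the headers arrive as a plain dict and the cookie as a raw Set-Cookie string; both function names are hypothetical.

```python
REQUIRED_HEADERS = ("content-security-policy", "strict-transport-security", "x-frame-options")
REQUIRED_COOKIE_FLAGS = ("secure", "httponly", "samesite")


def missing_security_headers(headers: dict[str, str]) -> list[str]:
    """Header names are case-insensitive, so compare in lowercase."""
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED_HEADERS if h not in present]


def missing_cookie_flags(set_cookie: str) -> list[str]:
    # Attributes follow the first "name=value" pair, separated by semicolons.
    attrs = {part.strip().split("=", 1)[0].lower() for part in set_cookie.split(";")[1:]}
    return [flag for flag in REQUIRED_COOKIE_FLAGS if flag not in attrs]
```

A non-empty result from either function is a candidate finding in the Security Headers category. Presence alone does not prove the header value is strict enough, so the values still need a manual look.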
Phase 7 — Document
- Record every finding immediately with screenshot evidence
- Use mcp__claude-in-chrome__computer to capture visual state
- Classify each finding by severity (P0-P4) and category (10 categories)
- Save findings incrementally to .qa-sessions/{timestamp}/findings.json
- Rule: No finding exists without evidence. Screenshots are mandatory.
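Incremental saving with the evidence rule enforced might look like the sketch below. The "severity" and "category" keys are the ones the scorer expects according to the Troubleshooting table; the "evidence" key and the function name are assumptions made for illustration.

```python
import json
from pathlib import Path

SEVERITIES = {"P0", "P1", "P2", "P3", "P4"}


def record_finding(findings_path: Path, finding: dict) -> int:
    """Append one finding to findings.json and return the new total."""
    if finding.get("severity") not in SEVERITIES:
        raise ValueError("severity must be one of P0-P4")
    if not finding.get("category"):
        raise ValueError("category is required")
    if not finding.get("evidence"):  # "evidence" is a hypothetical field name
        raise ValueError("no finding exists without evidence")

    # Re-read and rewrite the whole file so it stays valid JSON after every finding.
    findings = json.loads(findings_path.read_text()) if findings_path.exists() else []
    findings.append(finding)
    findings_path.write_text(json.dumps(findings, indent=2))
    return len(findings)
```

Rejecting the finding at write time, rather than at scoring time, keeps unevidenced entries out of the session artifacts entirely.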
Phase 8 — Score
- Run python scripts/qa_health_scorer.py findings.json to compute health score
- If baseline exists, include --baseline .qa-baselines/latest.json for trend comparison
- Record score in session artifacts
Phase 9 — Triage & Fix Loop
- Sort findings by severity (P0 first, P4 last)
- For each finding (respecting safety controls — see Safety Controls section):
- P3/P4: AUTO-FIX — apply fix, commit atomically, verify
- P0/P1/P2: ASK — present finding with evidence, propose fix, wait for approval
- After each fix: re-run the specific check to verify the fix works
- If fix fails verification: git revert and move to next finding
- Hard stop at 50 fixes regardless of remaining findings
Phase 10 — Regression Check
- Re-visit pages affected by fixes
- Verify no new console errors, broken links, or visual regressions
- Run mcp__claude-in-chrome__read_console_messages and read_network_requests on fixed pages
- If new P0/P1 found: revert the causing commit and flag
Phase 11 — Report & Baseline Update
- Generate comprehensive report: python scripts/test_report_generator.py session.json
- Save health score as new baseline: --save-baseline (a qa_health_scorer.py flag)
- Output: session directory with findings, scores, screenshots, fixes log, and final report
- Print summary: score, grade, findings by severity, fixes applied, regressions (if any)
2. Visual Regression Testing
Before/after screenshot comparison to catch unintended visual changes.
Setup Baseline
# Initialize baseline manifest
python scripts/visual_regression_tracker.py --init --baseline-dir ./baselines
Capture Baselines
- Use mcp__claude-in-chrome__upload_image or screenshot tools to capture each page
- Store screenshots organized by route: baselines/home.png, baselines/dashboard.png
- Register in manifest: python scripts/visual_regression_tracker.py --register baselines/
Run Comparison
# After code changes, capture new screenshots and compare
python scripts/visual_regression_tracker.py --baseline baselines/ --current screenshots/ --threshold 5
Review Diffs
- Pages exceeding the threshold (default 5%) are flagged as regressions
- Review diff report to accept intentional changes or file bugs for unintended ones
- Update baselines for accepted changes: --update-baseline
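To make the threshold concrete, here is one way a byte-level change percentage could be computed. The tracker is described elsewhere in this document as using SHA-256 hashes and byte-level comparison, but its exact formula is not given, so the position-by-position count below is an assumption and the function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def change_pct(baseline: bytes, current: bytes) -> float:
    """Share of byte positions that differ, counting any length difference as changed."""
    if sha256_hex(baseline) == sha256_hex(current):
        return 0.0  # identical files, skip the byte walk
    longest = max(len(baseline), len(current))
    differing = sum(a != b for a, b in zip(baseline, current))
    differing += abs(len(baseline) - len(current))
    return 100.0 * differing / longest


def is_regression(baseline: bytes, current: bytes, threshold: float = 5.0) -> bool:
    # Only changes strictly above the threshold are flagged.
    return change_pct(baseline, current) > threshold
```

Because image formats such as PNG are compressed, a small visual change can alter many bytes, so a byte-level percentage is a coarse signal and not a measure of how different two screenshots look.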
3. Accessibility Compliance Audit
WCAG 2.1 compliance checking across three conformance levels.
Automated Checks
# Get page HTML via Chrome MCP, save to file, then audit
python scripts/accessibility_auditor.py page.html --level AA --json
What Gets Checked
- Level A (Must Fix): Alt text, page language, form labels, heading presence, duplicate IDs, auto-playing media
- Level AA (Should Fix): Color contrast (4.5:1 text, 3:1 large), heading hierarchy, focus visible, error identification, resize to 200%, link purpose, consistent navigation, input purpose
- Level AAA (Nice to Have): Enhanced contrast (7:1), sign language, extended audio, reading level
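The contrast ratios in the list above come from the WCAG relative-luminance formula, which is easy to evaluate by hand when a pair of colors needs manual verification. The sketch below implements that formula for 8-bit sRGB colors; the function names are hypothetical and it is not part of accessibility_auditor.py.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes(fg, bg, level: str = "AA", large_text: bool = False) -> bool:
    required = {
        ("AA", False): 4.5,
        ("AA", True): 3.0,
        ("AAA", False): 7.0,
        ("AAA", True): 4.5,  # from WCAG 1.4.6; not stated in this document
    }
    return contrast_ratio(fg, bg) >= required[(level, large_text)]
```

For example, passes((119, 119, 119), (255, 255, 255)) checks mid-gray body text on a white background at level AA.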
Browser-Assisted Checks
- Use mcp__claude-in-chrome__javascript_tool to run focus-order tests
- Tab through all interactive elements to verify keyboard accessibility
- Check ARIA roles and live regions with JS inspection
Reporting
- Each violation includes: WCAG criterion, severity, element selector, remediation guidance
- Summary shows compliance percentage per level
4. Performance Profiling
Core Web Vitals measurement and network analysis.
Capture Metrics
- Use mcp__claude-in-chrome__read_network_requests to capture waterfall data
- Use mcp__claude-in-chrome__javascript_tool to extract performance timing: JSON.stringify(performance.getEntriesByType('navigation')[0])
- Measure CLS, LCP, FID/INP from PerformanceObserver data
Analyze Results
- Compare against thresholds in references/performance_benchmarks.md
- Identify blocking resources, excessive bundle sizes, unoptimized images
- Check for memory leaks via heap snapshot comparison
- Verify caching headers on static assets
Mobile Performance
- Resize to mobile viewport and re-measure
- Check for lazy loading on below-fold images
- Verify touch responsiveness and input latency
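After the metrics are captured, comparing them with the "Good" limits this document cites in its Success Criteria (LCP under 2.5s, CLS below 0.1, INP under 200ms) is a small lookup. The metric key names and the function name below are hypothetical, and the sketch follows this document's "under" wording, so a value exactly at the limit is reported.

```python
GOOD_LIMITS = {"lcp_ms": 2500, "cls": 0.1, "inp_ms": 200}


def vitals_outside_good(metrics: dict[str, float]) -> dict[str, float]:
    """Return the measured metrics that are at or over their 'Good' limit."""
    return {
        name: value
        for name, value in metrics.items()
        if name in GOOD_LIMITS and value >= GOOD_LIMITS[name]
    }
```

Running it once per viewport (desktop and mobile) gives a direct comparison of which metrics degrade after the resize.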
5. Diff-Aware QA
Git-based change detection for targeted, efficient testing.
Step 1 — Detect Changes
git diff --name-only main...HEAD
Step 2 — Map Changes to Routes
- Component file changes map to specific pages/routes
- API changes map to features consuming those endpoints
- Style changes map to visual regression candidates
- Config changes trigger broader smoke testing
Step 3 — Targeted Testing
- Only test routes affected by the diff
- Run visual regression on changed pages only
- Accessibility audit on modified components
- Full suite if infrastructure files changed (webpack, package.json, CI config)
Step 4 — Risk Assessment
- Changes to auth/payment/data-mutation get automatic Deep tier
- Style-only changes get Quick tier visual regression
- New routes get Standard tier full workflow
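The mapping from changed files to a testing tier can be sketched as a few pattern checks over the output of git diff --name-only. The glob patterns are hypothetical and depend on the repository layout. Falling back to Standard for anything unclassified is also an assumption, since the source only assigns Standard to new routes.

```python
from fnmatch import fnmatch

# Hypothetical glob patterns; adapt them to the repository layout.
INFRA = ("package.json", "webpack.config.*", ".github/workflows/*")
HIGH_RISK = ("*auth*", "*payment*")
STYLE = ("*.css", "*.scss")


def _matches(path: str, patterns: tuple[str, ...]) -> bool:
    return any(fnmatch(path, p) for p in patterns)


def choose_tier(changed_files: list[str]) -> str:
    """Pick the broadest tier that any changed file calls for."""
    if any(_matches(f, INFRA) for f in changed_files):
        return "full-suite"
    if any(_matches(f, HIGH_RISK) for f in changed_files):
        return "deep"
    if changed_files and all(_matches(f, STYLE) for f in changed_files):
        return "quick"
    return "standard"
```

Checking the riskiest category first means a diff that touches both a stylesheet and an auth module still gets the Deep tier.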
Tools
QA Health Scorer — scripts/qa_health_scorer.py
Computes a weighted health score (0-100) from QA findings across 10 categories.
# Basic scoring
python scripts/qa_health_scorer.py findings.json
# JSON output for CI integration
python scripts/qa_health_scorer.py findings.json --json
# Compare against baseline
python scripts/qa_health_scorer.py findings.json --baseline previous_score.json
# Set custom passing threshold
python scripts/qa_health_scorer.py findings.json --threshold 80
Accessibility Auditor — scripts/accessibility_auditor.py
Analyzes HTML for WCAG 2.1 violations across all three conformance levels.
# Audit at AA level (default)
python scripts/accessibility_auditor.py page.html
# Audit at AAA level with JSON output
python scripts/accessibility_auditor.py page.html --level AAA --json
# Audit from stdin (pipe from curl)
curl -s https://example.com | python scripts/accessibility_auditor.py - --level A
Visual Regression Tracker — scripts/visual_regression_tracker.py
Manages screenshot baselines and detects visual regressions between test runs.
# Initialize baseline directory
python scripts/visual_regression_tracker.py --init --baseline-dir ./baselines
# Register screenshots as baselines
python scripts/visual_regression_tracker.py --register ./baselines
# Compare current against baseline
python scripts/visual_regression_tracker.py --baseline ./baselines --current ./screenshots
# Custom threshold (default 5%)
python scripts/visual_regression_tracker.py --baseline ./baselines --current ./screenshots --threshold 3
# Update baseline with current screenshots
python scripts/visual_regression_tracker.py --update-baseline --baseline ./baselines --current ./screenshots
Test Report Generator — scripts/test_report_generator.py
Generates comprehensive QA reports from session data.
# Markdown report (default)
python scripts/test_report_generator.py session_data.json
# JSON summary
python scripts/test_report_generator.py session_data.json --format json
# Write to file
python scripts/test_report_generator.py session_data.json --format markdown -o report.md
# Include trend data
python scripts/test_report_generator.py session_data.json --history scores_history.json
Reference Guides
| Guide | Location | Content |
|---|---|---|
| Browser Testing Methodology | references/browser_testing_methodology.md | Page exploration strategies, element interaction patterns, state testing, auth flows |
| WCAG Compliance Guide | references/wcag_compliance_guide.md | WCAG 2.1 A/AA/AAA requirements, common violations, testing techniques |
| Performance Benchmarks | references/performance_benchmarks.md | Core Web Vitals thresholds, network analysis, memory profiling, mobile considerations |
Testing Tiers
Quick (30 seconds)
- Console error check on current page
- Broken link scan (current page only)
- Basic accessibility check (alt text, headings)
- Viewport resize to mobile and back
Standard (2-5 minutes)
- All Quick checks plus:
- Navigate top 5-10 routes, check console and network
- Form validation on primary forms
- Heading hierarchy and color contrast audit
- Core Web Vitals capture on landing page
Deep (10-20 minutes)
- All Standard checks plus:
- Full sitemap traversal
- State testing (empty, error, loading, success, partial)
- Complete WCAG AA audit
- Performance profiling on 3 key pages
- Visual regression on changed pages
- Security header verification
Exhaustive (30+ minutes)
- All Deep checks plus:
- Every interactive element exercised
- WCAG AAA audit
- Performance profiling on all pages
- Full visual regression suite
- Cross-device testing at 5 breakpoints
- Authentication flow edge cases
- Third-party integration verification
- Memory leak detection via repeated navigation
Health Scoring System
Score range: 0-100 computed from 10 weighted categories.
| Category | Weight | What It Measures |
|---|---|---|
| Console Errors | 12% | JavaScript errors, unhandled rejections, deprecation warnings |
| Broken Links | 8% | HTTP 4xx/5xx responses, dead anchors, missing assets |
| Visual Consistency | 10% | Layout shifts, overflow, alignment, z-index issues |
| Functional | 18% | Forms work, CRUD operations complete, navigation flows succeed |
| UX Flow | 12% | Logical navigation, clear feedback, expected behavior |
| Performance | 12% | Core Web Vitals within thresholds, fast load times |
| Content Quality | 5% | Spelling, placeholder text, lorem ipsum, truncation |
| Accessibility | 13% | WCAG compliance, keyboard navigation, screen reader support |
| Security Headers | 5% | CSP, HSTS, X-Frame-Options, cookie flags |
| Mobile Responsive | 5% | Breakpoints work, touch targets adequate, no horizontal scroll |
Grading Scale:
- A (90-100): Production-ready, no critical issues
- B (80-89): Ship with minor fixes planned
- C (70-79): Needs attention before release
- D (60-69): Significant issues, delay recommended
- F (0-59): Critical failures, do not ship
Deduction System by Severity:
- P0 Critical: -30 points per finding
- P1 High: -18 points per finding
- P2 Medium: -10 points per finding
- P3 Low: -4 points per finding
- P4 Cosmetic: -1 point per finding
Deductions are distributed proportionally across their applicable categories. Score floors at 0.
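The source does not spell out how deductions are spread across categories, so the following is only one plausible reading, not the scorer's actual algorithm. It assumes each finding belongs to a single category, each category starts at 100 and floors at 0, and the overall score is the weighted average. The category keys other than functional and security_headers are hypothetical spellings of the table rows.

```python
DEDUCTIONS = {"P0": 30, "P1": 18, "P2": 10, "P3": 4, "P4": 1}
WEIGHTS = {  # percentages from the category table; they sum to 100
    "console_errors": 12, "broken_links": 8, "visual_consistency": 10,
    "functional": 18, "ux_flow": 12, "performance": 12, "content_quality": 5,
    "accessibility": 13, "security_headers": 5, "mobile_responsive": 5,
}


def category_scores(findings: list[dict]) -> dict[str, int]:
    """Start every category at 100 and subtract each finding's severity deduction."""
    scores = {name: 100 for name in WEIGHTS}
    for f in findings:
        cat = f["category"]
        scores[cat] = max(0, scores[cat] - DEDUCTIONS[f["severity"]])  # floor at 0
    return scores


def overall_score(findings: list[dict]) -> float:
    scores = category_scores(findings)
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS) / 100


def grade(score: float) -> str:
    for floor, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= floor:
            return letter
    return "F"
```

Under this reading a single P0 in a heavily weighted category costs more overall than the same P0 in a lightly weighted one, which is the point of weighting in the first place.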
Bug Severity Classification
P0 — Critical
Application crash, data loss, security vulnerability, payment failure, complete feature broken. Must fix before any release. Examples: white screen of death, XSS vulnerability, checkout sends wrong amount, auth bypass.
P1 — High
Major feature partially broken, significant UX degradation, accessibility blocker, performance regression >50%. Must fix within current sprint. Examples: form silently drops data, keyboard users cannot complete core flow, LCP >8s.
P2 — Medium
Feature works but with friction, moderate visual issues, accessibility violation (AA), performance below threshold. Fix within next 2 sprints. Examples: date picker requires manual format, contrast ratio 3.5:1 on body text, CLS >0.25.
P3 — Low
Minor inconvenience, cosmetic issue with workaround, accessibility nice-to-have, slight performance gap. Backlog prioritization. Examples: tooltip misaligned by 2px on hover, alt text could be more descriptive, TTFB 900ms.
P4 — Cosmetic
Purely visual polish, no functional impact, enhancement opportunity. Fix when convenient. Examples: inconsistent border-radius across cards, font-weight 500 vs 600 inconsistency, extra whitespace in footer.
Safety Controls & Self-Regulation
Production QA requires guardrails to prevent runaway fixes from destabilizing the codebase.
Fix Session Limits
- Maximum 50 fixes per session: hard stop. After 50 fixes, generate the report and exit regardless of remaining findings.
- Risk accumulator: each fix increments a risk score: component file changes (+5), style changes (+2), config changes (+8), reverts (+15). Stop once cumulative risk exceeds 25 points (25% of the 100-point risk budget).
- Revert protocol: if a fix introduces a new P0 or P1 finding (verified by re-running the affected check), immediately git revert the commit and flag for manual review.
- WTF-likelihood heuristic: if 3 consecutive fixes fail verification after commit, stop the fix loop entirely and report. The codebase likely has a systemic issue that individual fixes cannot address.
Pre-Conditions
- Clean working tree required: refuse to start if git status shows uncommitted changes. This ensures every fix is a clean, revertible commit.
- Branch verification: warn if running on main or master. QA fix sessions should run on feature branches.
Atomic Commits
Every fix produces exactly one commit:
fix(qa): [P{severity}] {short description}
Finding: {original finding description}
Evidence: {screenshot reference or console output}
Verified: {pass|fail} after fix applied
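Filling in the commit template above is a simple string format. The function name and parameters below are hypothetical. The blank line between the subject and the body is not shown in the template but is added here because Git treats the first line as the subject only when a blank line follows it.

```python
def qa_commit_message(severity: str, summary: str, finding: str, evidence: str, verified: bool) -> str:
    """Build a one-fix commit message; severity is passed as 'P0'..'P4'."""
    return (
        f"fix(qa): [{severity}] {summary}\n"
        f"\n"
        f"Finding: {finding}\n"
        f"Evidence: {evidence}\n"
        f"Verified: {'pass' if verified else 'fail'} after fix applied\n"
    )
```

The resulting string can be passed to git commit through the -F option or standard input, which avoids shell quoting problems with multi-line messages.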
Interaction Model
- AUTO-FIX (no confirmation): P3 (Low) and P4 (Cosmetic) — spacing, typos, minor style fixes
- ASK (requires confirmation): P0, P1, P2 — structural changes, logic fixes, accessibility remediation
- One issue = one question — never batch multiple findings into a single prompt. Each fix decision is independent.
- Rollback instruction: every ASK includes what changes, why, evidence, and the exact git revert <hash> command
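The ordering and confirmation rules above can be expressed as a short planning step. This sketch assumes findings are dicts with a "severity" key of "P0" to "P4"; the function name and the added "mode" key are hypothetical.

```python
AUTO_FIX = {"P3", "P4"}
MAX_FIXES = 50


def plan_fixes(findings: list[dict]) -> list[dict]:
    """Order findings P0-first and tag each with AUTO-FIX or ASK, capped at 50."""
    # "P0" < "P1" < ... < "P4" as strings, and sorted() keeps ties in input order.
    ordered = sorted(findings, key=lambda f: f["severity"])
    return [
        {**f, "mode": "AUTO-FIX" if f["severity"] in AUTO_FIX else "ASK"}
        for f in ordered[:MAX_FIXES]
    ]
```

Each ASK entry is then presented on its own, in line with the one-issue-one-question rule.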
State Persistence & Trend Tracking
Baseline Management
Save health scores after each session for regression comparison:
# Save current score as baseline
python scripts/qa_health_scorer.py findings.json --save-baseline
# Compare against saved baseline
python scripts/qa_health_scorer.py findings.json --baseline .qa-baselines/latest.json
Storage: .qa-baselines/{YYYY-MM-DD}.json — contains score, grade, category breakdown, finding counts, timestamp.
Regression Mode
Compare current run against a saved baseline to detect regressions:
- Run full QA sweep → generate findings JSON
- Score findings with the --baseline flag pointing to the previous run
- Report delta: categories that improved, degraded, or held steady
- Flag any category that dropped >10 points as a regression warning
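The delta report described above can be sketched as a comparison of two category-to-score mappings. The input shape and the function name are assumptions; the scorer's own baseline files may be structured differently.

```python
def category_deltas(baseline: dict[str, float], current: dict[str, float], drop_limit: float = 10) -> dict[str, list[str]]:
    """Bucket each category by direction and flag drops larger than drop_limit."""
    report = {"improved": [], "degraded": [], "steady": [], "regressions": []}
    for name, now in current.items():
        delta = now - baseline.get(name, now)  # categories absent from the baseline count as steady
        bucket = "improved" if delta > 0 else "degraded" if delta < 0 else "steady"
        report[bucket].append(name)
        if delta < -drop_limit:
            report["regressions"].append(name)
    return report
```

A category can appear in both "degraded" and "regressions"; the second list is the subset that warrants a warning.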
Session Artifacts
Each QA session creates a directory: .qa-sessions/{timestamp}/
- findings.json: all findings from this session
- health_score.json: scored results
- screenshots/: evidence screenshots (if using Chrome MCP)
- report.md: generated markdown report
- fixes.log: list of commits made during fix loop
Trend Dashboard
After 3+ sessions, the scorer can generate trend analysis:
- Week-over-week health score trajectory
- Most frequently failing categories
- Persistent findings that recur across sessions
- Estimated time to reach target score
Integration Points
| Skill | Integration |
|---|---|
| code-reviewer | Feed QA findings into PR review context for informed approval decisions |
| senior-frontend | Visual regression baselines align with component library standards |
| senior-devops | Health scores gate CI/CD deployment pipelines (threshold check) |
| senior-secops | Security header findings escalate to security review workflow |
| incident-commander | P0 findings trigger incident response if found in production |
| senior-qa | Extends manual QA checklist with automated browser verification |
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Health scorer exits with code 1 but no errors printed | Score fell below the --threshold value (default 70) | Check the score in the report output; lower the threshold (for example --threshold 50) if intentional, or fix findings to increase the score |
| Accessibility auditor reports parse-error violation | Malformed or truncated HTML fed to the auditor | Ensure the HTML file is complete and well-formed; if piping from curl, verify the response is not a redirect or error page |
| Visual regression tracker shows 100% change on all pages | Baseline manifest is empty or was never initialized | Run --init --baseline-dir ./baselines followed by --register ./baselines before comparing |
| Visual regression reports baseline_missing for known pages | Screenshot filenames changed between runs (e.g., route slug renamed) | Re-register baselines with --register after renaming, or use --update-baseline to refresh from current screenshots |
| Findings JSON loads but all findings default to P3/functional | Finding objects missing severity or category keys | Ensure each finding dict includes "severity": "P0"-"P4" and "category" matching one of the 10 scoring categories |
| Test report generator produces empty Findings section | Session JSON has findings at the top level instead of under a "findings" key | Structure the session JSON with a "findings" array; see the expected schema in test_report_generator.py docstring |
| Chrome MCP read_page returns stale content after SPA navigation | Single-page app updated the DOM without a full page load | Wait for the SPA transition to complete, then call mcp__claude-in-chrome__read_page again; use read_console_messages to confirm the route change landed |
Success Criteria
- Health score above 85/100 on the target application after a Standard-tier sweep, indicating ship-ready quality with only minor issues remaining.
- Zero P0 (Critical) findings at the end of the QA session; any P0 discovered during the sweep must be resolved or escalated before the session closes.
- WCAG AA compliance at or above 95% as reported by accessibility_auditor.py, with zero must-fix violations remaining.
- Visual regression pass rate of 100% against the established baseline at the configured threshold (default 5%), confirming no unintended visual changes.
- All Core Web Vitals within "Good" thresholds — LCP under 2.5s, CLS below 0.1, INP under 200ms — on at least the three highest-traffic pages.
- Fewer than 5 P2 (Medium) findings remaining after the triage-and-fix loop, demonstrating that functional friction has been addressed.
- Trend line stable or improving across consecutive sessions; no category drops more than 10 points compared to the previous baseline.
Scope & Limitations
This skill covers:
- End-to-end browser QA via Chrome MCP: navigation, form interaction, console monitoring, network inspection, responsive testing, and screenshot capture.
- Static HTML accessibility auditing against WCAG 2.1 levels A, AA, and AAA using deterministic Python checks (no external services required).
- Visual regression tracking through file-hash comparison and byte-level diff analysis with configurable thresholds.
- Weighted health scoring across 10 quality categories with severity-based deductions, baseline trend tracking, and CI-friendly exit codes.
This skill does NOT cover:
- Cross-browser testing beyond Chrome (Safari, Firefox, Edge). For multi-browser matrix testing, integrate with senior-devops CI pipeline skills.
- Pixel-perfect image diffing or perceptual hashing: the visual regression tracker uses byte-level comparison, not computer-vision-based diffing. For advanced visual AI comparison, pair with senior-computer-vision.
- Backend API testing, database validation, or load/stress testing. Use senior-backend for API contract verification and senior-devops for load testing infrastructure.
- Runtime color contrast computation from rendered CSS. The accessibility auditor flags inline-style risk patterns and recommends manual verification; it does not compute contrast ratios from computed styles.
Integration Points
| Skill | Integration | Data Flow |
|---|---|---|
| code-reviewer | Feed the health score and findings summary into PR review context so reviewers can make informed approval decisions | QA session report.md or --json output attached to the PR body or review comment |
| senior-frontend | Visual regression baselines align with component library standards; baseline updates happen alongside design system releases | visual_regression_tracker.py baseline directory shared in the component library repo |
| senior-devops | Health score gates CI/CD deployment pipelines via the scorer's non-zero exit code on threshold failure | qa_health_scorer.py --threshold 80 --json runs as a pipeline step; exit code 1 blocks deploy |
| senior-secops | Security header findings (CSP, HSTS, X-Frame-Options) from the QA sweep escalate to the security review workflow | P0/P1 findings with category: security_headers forwarded to the secops triage queue |
| incident-commander | P0 findings discovered on production URLs trigger the incident response protocol | P0 finding JSON payload sent to the incident channel with evidence screenshots |
| senior-qa | Extends manual QA checklists with automated browser verification; manual testers review automated findings and add exploratory context | test_report_generator.py markdown report used as the starting point for manual QA sign-off |
Tool Reference
qa_health_scorer.py
Purpose: Computes a weighted health score (0-100) from QA findings across 10 categories. Produces a letter grade (A-F), supports trend tracking against previous baselines, and returns a non-zero exit code when the score falls below the passing threshold.
Usage:
python scripts/qa_health_scorer.py <findings_file> [options]
Parameters:
| Flag / Argument | Type | Required | Default | Description |
|---|---|---|---|---|
| findings_file | positional | Yes | - | Path to a JSON file containing QA findings (array of finding objects, or an object with a "findings" key) |
| --json | flag | No | off | Output results as machine-readable JSON instead of the human-readable text report |
| --baseline | string | No | None | Path to a previous score JSON file for trend comparison (computes delta and direction) |
| --threshold | int | No | 70 | Minimum passing score; the tool exits with code 1 if the score falls below this value |
| --save-baseline | flag | No | off | Save the current score to .qa-baselines/{YYYY-MM-DD}.json and .qa-baselines/latest.json for future trend comparison |
Example:
# Score findings with an 85-point threshold, compare against last run, save result as new baseline
python scripts/qa_health_scorer.py findings.json --threshold 85 --baseline .qa-baselines/latest.json --save-baseline --json
Output Formats:
- Human-readable (default): Tabular report with overall score, grade, pass/fail status, severity breakdown, category breakdown with weights/scores/findings, and priority areas for categories scoring below 70%.
- JSON (--json): Object with keys overall_score, grade, passed, threshold, timestamp, severity_summary, total_findings, categories (per-category weight, score_pct, deductions, finding_counts), and optional trend.
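A downstream script can consume the JSON output with the standard library alone. The sketch below relies only on the keys listed above, but the value types (numeric scores, boolean passed, categories as a name-to-object mapping) are assumptions, as are the function names.

```python
import json


def deploy_decision(scorer_json: str) -> str:
    """Summarize the scorer's verdict in one line for a pipeline log."""
    result = json.loads(scorer_json)
    verdict = "PASS" if result["passed"] else "BLOCK"
    return f"{verdict}: {result['overall_score']} ({result['grade']}) vs threshold {result['threshold']}"


def priority_areas(scorer_json: str, cutoff: float = 70) -> list[str]:
    """List categories whose score_pct is below the cutoff."""
    # Assumes "categories" maps each category name to an object with a "score_pct" number.
    categories = json.loads(scorer_json).get("categories", {})
    return sorted(name for name, c in categories.items() if c["score_pct"] < cutoff)
```

In a pipeline the scorer's exit code remains the authoritative gate; parsing the JSON is useful for annotating the build with the reason.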
accessibility_auditor.py
Purpose: Analyzes HTML content for WCAG 2.1 violations across conformance levels A, AA, and AAA. Detects missing alt text, page language, heading hierarchy issues, duplicate IDs, unlabeled form inputs, empty link text, media without captions, autoplay media, missing landmark regions, positive tabindex values, focus indicator removal, and inline color contrast risk patterns. Returns a non-zero exit code when must-fix violations are present.
Usage:
python scripts/accessibility_auditor.py <html_file> [options]
Parameters:
| Flag / Argument | Type | Required | Default | Description |
|---|---|---|---|---|
| html_file | positional | Yes | - | Path to an HTML file to audit; use "-" to read from stdin |
| --level | choice | No | AA | WCAG conformance level to check: A, AA, or AAA |
| --json | flag | No | off | Output results as JSON instead of the human-readable text report |
Example:
# Audit a page at AAA level, output JSON for downstream processing
curl -s https://example.com | python scripts/accessibility_auditor.py - --level AAA --json
Output Formats:
- Human-readable (default): Report with level checked, elements checked, total violations, compliance percentage, violations broken down by level and severity, and a numbered list of each violation with rule ID, WCAG criterion, issue description, element, and remediation guidance.
- JSON (--json): Object with keys level_checked, total_elements_checked, total_violations, compliance_percentage, by_level, by_severity, and violations (array of objects each containing rule_id, wcag_criterion, level, severity, message, element, selector_hint, remediation).
visual_regression_tracker.py
Purpose: Manages screenshot baselines and detects visual regressions between test runs. Maintains a JSON manifest of baseline screenshots with SHA-256 file hashes. Compares current screenshots against baselines using byte-level analysis and flags pages exceeding a configurable change threshold. Returns a non-zero exit code when regressions are detected.
Usage:
python scripts/visual_regression_tracker.py [action] [options]
Parameters:
| Flag / Argument | Type | Required | Default | Description |
|---|---|---|---|---|
| --init | flag | No | off | Initialize a new baseline directory with an empty manifest |
| --register | string (DIR) | No | - | Scan the given directory and register all image files (png, jpg, jpeg, bmp, gif, webp) in the baseline manifest |
| --update-baseline | flag | No | off | Copy current screenshots into the baseline directory and update the manifest |
| --baseline-dir / --baseline | string | Conditional | - | Path to the baseline screenshot directory (required for --init, comparison, and --update-baseline) |
| --current | string | Conditional | - | Path to the current screenshot directory (required for comparison and --update-baseline) |
| --threshold | float | No | 5.0 | Change percentage threshold above which a page is flagged as a regression |
| --json | flag | No | off | Output results as JSON instead of the human-readable text report |
Example:
# Initialize, register baselines, then compare with a tight 2% threshold
python scripts/visual_regression_tracker.py --init --baseline-dir ./baselines
python scripts/visual_regression_tracker.py --register ./baselines
python scripts/visual_regression_tracker.py --baseline ./baselines --current ./screenshots --threshold 2 --json
Output Formats:
- Human-readable (default): Report with timestamp, threshold, counts of compared/passed/failed/new/missing pages, overall pass/fail result, and per-page status with change percentages.
- JSON (--json): Object with keys timestamp, threshold, baseline_dir, current_dir, pages (per-page status, change_pct, hashes, sizes), and summary (total_compared, passed, failed, new_pages, missing_pages).
test_report_generator.py
Purpose: Generates comprehensive QA reports from session data. Consumes a JSON file containing findings, health scores, accessibility results, performance metrics, and visual regression data, then produces a detailed markdown or JSON report with executive summary, health score dashboard, category breakdown, findings grouped by severity, accessibility and performance sections, visual regression results, and prioritized recommendations.
Usage:
python scripts/test_report_generator.py <session_file> [options]
Parameters:
| Flag / Argument | Type | Required | Default | Description |
|---|---|---|---|---|
| session_file | positional | Yes | - | Path to the QA session data JSON file (expected keys: project, url, tester, tier, findings, and optionally health_score, accessibility, performance, visual_regression, screenshots, notes) |
| --format | choice | No | markdown | Output format: markdown or json |
| -o / --output | string | No | stdout | Write the report to the specified file path instead of printing to stdout |
| --history | string | No | None | Path to a score history JSON file for trend analysis (array of objects with score or overall_score keys) |
Example:
# Generate a markdown report with trend data, written to a file
python scripts/test_report_generator.py session_data.json --format markdown --history .qa-baselines/history.json -o reports/qa-report-2026-03-21.md
Output Formats:
- Markdown (default): Full report with header (project, URL, tester, tier), executive summary, health score dashboard with category breakdown table, optional trend section, findings grouped by severity with details (title, category, location, description, steps, expected/actual), accessibility results, performance metrics table against thresholds, visual regression results, numbered recommendations, optional notes, and timestamped footer.
- JSON (--format json): Object with keys report_type, generated, project, url, tier, health_score, grade, passed, total_findings, findings_by_severity, findings_by_category, accessibility_violations, accessibility_compliance_pct, visual_regressions, performance_metrics, trend, and recommendations.
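Since the report generator expects findings under a "findings" key alongside the other session fields, it can help to assemble and check the session file before generating a report. This sketch uses the expected key names listed in the parameter table; the helper names are hypothetical and the real schema lives in the test_report_generator.py docstring.

```python
import json

REQUIRED_KEYS = ("project", "url", "tester", "tier", "findings")


def build_session(project: str, url: str, tester: str, tier: str, findings: list, **optional) -> dict:
    """Assemble session data with findings nested under the "findings" key."""
    session = {"project": project, "url": url, "tester": tester, "tier": tier, "findings": list(findings)}
    session.update(optional)  # e.g. health_score, accessibility, performance
    return session


def missing_keys(session: dict) -> list[str]:
    return [k for k in REQUIRED_KEYS if k not in session]


def to_json(session: dict) -> str:
    missing = missing_keys(session)
    if missing:
        raise ValueError(f"session is missing keys: {', '.join(missing)}")
    return json.dumps(session, indent=2)
```

Writing the result of to_json() to session_data.json gives the generator a file with the top-level keys it expects.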
Last Updated: 2026-03-21 Version: 2.1.0