Agent skill

metrics-report

Scan an entire codebase, discover and run all test types, compute hybrid coverage, evaluate quality, and generate a full metrics report website with trends and charts.

View SKILL.md on GitHub Repository

Stars 15

Forks 7

Install this agent skill to your Project

npx add-skill https://github.com/diegopacheco/ai-playground/tree/main/pocs/metrics-report-skill

SKILL.md

Metrics Report Skill

You are a metrics analysis agent. When invoked, scan the codebase, discover tests, run them, compute coverage, evaluate quality, and produce a JSON report for the metrics React application.

IMPORTANT: Optimize for speed. Use Agent tool to parallelize independent work. Use Grep/Glob instead of reading files when possible. Batch shell commands.

Progress Reporting (MANDATORY)

You MUST print progress to the user at every phase transition and every test type execution. Use this exact format:

Phase transitions: [Phase X/7] <phase name>...
Agent 3 test steps: [Test X/Y] Running <type> tests (<backend|frontend>)... then [Test X/Y] <type> tests: <N> passed, <M> failed
Phase completion: [Phase X/7] Done. <one-line summary of result>

This is NON-NEGOTIABLE. The user must always know what is happening and where we are in the process. Never go silent for more than one tool call without printing status.

Phase 0: Install Metrics Report Site

Print: [Phase 0/7] Installing metrics report site...

Check if metrics-report/metrics-application/ exists. If NOT, copy it from the skill installation directory and run npm install:

bash

SKILL_DIR="$HOME/.claude/skills/metrics-report"
if [ ! -d "metrics-report/metrics-application" ]; then
  mkdir -p metrics-report
  cp -r "$SKILL_DIR/metrics-report/"* metrics-report/
  cd metrics-report/metrics-application && npm install && cd ../..
fi

If metrics-report/metrics-application/ already exists but node_modules is missing, just run npm install:

bash

if [ -d "metrics-report/metrics-application" ] && [ ! -d "metrics-report/metrics-application/node_modules" ]; then
  cd metrics-report/metrics-application && npm install && cd ../..
fi

Print: [Phase 0/7] Done. Metrics report site ready.

Phase 1: Setup and Detection (do all at once)

Print: [Phase 1/7] Setup and detection...

Run these in a SINGLE Bash call chained with &&:

bash

git remote get-url origin 2>/dev/null || echo ""; git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "main"; git rev-parse --short HEAD 2>/dev/null || echo "unknown"

Check if metrics-report/metrics-config.json exists. If not, create it:

json

{
  "port": 3737,
  "testTypes": {
    "unit": true,
    "integration": true,
    "contract": true,
    "e2e": true,
    "css": false,
    "stress": false,
    "chaos": false,
    "mutation": false,
    "observability": true,
    "fuzzy": false,
    "propertybased": false
  },
  "githubRemote": "origin"
}

Detect stacks using Glob (one call, check results):

pom.xml or build.gradle or build.gradle.kts = Java/Spring Boot
package.json with react = React/Node
manage.py or django in pyproject.toml = Django/Python
Cargo.toml with actix/axum/tokio = Rust

Print: [Phase 1/7] Done. Detected stacks: <list stacks found>

Phase 2: Scan and Run Tests

Print: [Phase 2/7] Scanning codebase and running tests...

Launch 2 background agents for read-only scanning, then run tests DIRECTLY with Bash (no agent).

Background Agent 1: File Count and Test Discovery

Use Glob to count all source files (skip node_modules, target, dist, build, .git, pycache, venv, .venv, metrics-report). Split by frontend/backend using file extension and path.

Then use Grep to find all test files in ONE pass per pattern:

Grep pattern="@Test|@ParameterizedTest" for Java
Grep pattern="def test_|class Test" for Python  
Grep pattern="#\[test\]|#\[cfg\(test\)\]" for Rust
Grep pattern="describe\(|it\(|test\(" for JS/TS

Classify each test file by type using these Grep checks (batch them):

Integration: Grep for @SpringBootTest|@Testcontainers|@DataJpaTest|IntegrationTest|supertest across test files
E2E: Grep for @playwright/test|playwright across test files
Contract: Grep for Pact|Contract|pact across test files
Stress: Grep for k6/http|k6/ws or files matching *.k6.js
Chaos: Grep for chaos-monkey|litmus|toxiproxy across test files
Observability: Grep for MeterRegistry|prometheus|opentelemetry|actuator/health|tracing in test files
CSS: Grep for jest-image-snapshot|percy|chromatic|toMatchImageSnapshot in test files
Fuzzy: Grep for fuzz|Fuzz|FuzzTest|@FuzzTest|cargo-fuzz|go-fuzz|jazzer|atheris in test files
Property-Based: Grep for @Property|jqwik|QuickCheck|hypothesis|proptest|fast-check|fc\.property in test files
Everything else in test dirs = unit

For each test file, extract test method names and line numbers using Grep (not Read):

Java: Grep pattern="@Test" -A 1 to get method names
Python: Grep pattern="def test_"
Rust: Grep pattern="fn test_|#\[test\]" -A 1
JS/TS: Grep pattern="(it|test)\('"

Return: file counts, test file list with types and test names.

Background Agent 2: Git Attribution (batch)

Run a SINGLE bash command to get all test file authors at once:

bash

for f in $(find . -path '*/test*' -name '*.java' -o -name '*.py' -o -name '*.rs' -o -name '*.ts' -o -name '*.tsx' -o -name '*.js' | grep -v node_modules | grep -v target | grep -v dist); do echo "$f:$(git log --format='%an' --diff-filter=A -- "$f" 2>/dev/null | head -1)"; done

Return: map of file path to author.

Run Tests Directly (NO agent — use Bash directly)

CRITICAL: Do NOT delegate test execution to an agent. Run tests DIRECTLY using Bash tool calls from the main conversation. Agent overhead adds massive latency for fast-running tests.

First use Grep to quickly check which test types actually exist:

Grep for @SpringBootTest|@Testcontainers in backend test files for integration
Grep for Pact|Contract for contract tests
Grep for MeterRegistry|prometheus|actuator for observability tests
Grep for @playwright/test for e2e tests

Only run test types that have actual test files. Skip types with zero tests immediately.

Run each test type sequentially with Bash. Print progress before and after each:

[Test X/Y] Running <type> tests (<backend|frontend>)... [Test X/Y] <type> tests done: <N> passed, <M> failed (<duration>s)

Commands per stack:

Java (Maven): cd backend && ./mvnw test jacoco:report 2>&1 | tail -20 Java (Gradle): cd backend && ./gradlew test jacocoTestReport -q 2>&1 | tail -20 Scala (Maven): cd backend && ./mvnw test jacoco:report 2>&1 | tail -20 Scala (sbt): cd backend && sbt jacoco 2>&1 | tail -20 Python: cd backend && python -m pytest --tb=short -v --cov=. --cov-report=json 2>&1 | tail -30 Rust: cd backend && cargo test 2>&1 | tail -20 then cargo tarpaulin --out json 2>&1 if available Node: cd frontend && npx react-scripts test --watchAll=false --coverage 2>&1 | tail -40 E2E: cd frontend && npx playwright test 2>&1 | tail -20

Parse test results from output after each run. Parse coverage from report files:

Java/Scala (Maven): backend/target/site/jacoco/jacoco.csv
Scala (sbt): backend/target/scala-*/jacoco/report/jacoco.csv
Python: backend/coverage.json
Rust: backend/tarpaulin-report.json
Node: frontend/coverage/coverage-summary.json

When done, print: [Phase 2/7] Done. Found <N> test files, <M> source files, ran <K> test types.

Phase 3: LLM Coverage Mapping and Tool Coverage Parsing

Print: [Phase 3/7] Mapping test coverage to source files...

Step A: Parse tool coverage reports for per-file percentages

For backend files, parse the coverage report generated in Phase 2:

Java/Scala (JaCoCo CSV): Read backend/target/site/jacoco/jacoco.csv with Bash:

bash

cat backend/target/site/jacoco/jacoco.csv 2>/dev/null | tail -n +2 | awk -F',' '{missed=$8; covered=$9; total=missed+covered; pct=(total>0)? int(covered*100/total) : 0; print $3","pct}'

This gives CLASS_NAME,COVERAGE_PCT. Map each class name to its source file path.

Scala (sbt JaCoCo): Same CSV format at backend/target/scala-*/jacoco/report/jacoco.csv.

Python (coverage.json): Read with Bash:

bash

python3 -c "import json,sys; d=json.load(open('backend/coverage.json')); [print(f'{k},{int(v[\"summary\"][\"percent_covered\"])}') for k,v in d.get('files',{}).items()]" 2>/dev/null

Rust (tarpaulin): Read with Bash:

bash

cat backend/tarpaulin-report.json 2>/dev/null | python3 -c "import json,sys; d=json.load(sys.stdin); [print(f'{f[\"path\"]},{int(f[\"coverable\"] and f[\"covered\"]*100/f[\"coverable\"] or 0)}') for f in d.get('files',[])]"

Node/React (coverage-summary.json): Read with Bash:

bash

cat frontend/coverage/coverage-summary.json 2>/dev/null | python3 -c "import json,sys; d=json.load(sys.stdin); [print(f'{k},{int(v[\"lines\"][\"pct\"])}') for k,v in d.items() if k != 'total']"

If the coverage report file does not exist (tool not available), set tool: null and proceed to Step B for LLM mapping.

If the coverage report IS available, populate tool with the parsed percentage for each source file. Files not in the report get tool: 0.

Step B: LLM import mapping (for files missing tool coverage)

For each test file, use Grep to extract imports:

Grep pattern="^import " path="test-file.java"

Map imports to source files. For E2E tests, Grep for route URLs (page.goto, http.get) and map to controllers. Keep it to direct imports only — no full call chains.

Set llm: true on every source file that is directly imported by at least one test file.

Building the coverage array

For EVERY backend source file and EVERY frontend source file, create a coverage entry. Each entry must include ALL test types that have tests (not just unit). For test types with no tests, omit them from the coverage object.

If a file has tool coverage percentage: {"tool": <pct>, "llm": <true if also import-mapped>} If a file has only LLM mapping: {"tool": null, "llm": true} If a file has neither: {"tool": 0, "llm": false} when tool reports exist, or {"tool": null, "llm": false} when no tool report exists.

Print: [Phase 3/7] Done. Mapped <N> test files to <M> source files.

Phase 4: Quality Evaluation (fast)

Print: [Phase 4/7] Evaluating test quality...

For each test type that has tests, read ONLY 3 representative test files (not 10). Evaluate quickly:

Assertion quality (meaningful vs trivial)
Edge case coverage
Test naming quality

Rate: poor/fair/good/excellent with one sentence justification.

Print: [Phase 4/7] Done. Quality ratings: <list each type and its rating>

Phase 5: Compute Score and Generate JSON

Print: [Phase 5/7] Computing score and generating report JSON...

Score (0-10):

Criteria	Max	Calculation
Coverage breadth	3	(covered files / total source files) * 3
Type diversity	2	11 types=2.0, 9-10=1.7, 7-8=1.5, 5-6=1.2, 3-4=1.0, 1-2=0.5
Pass rate	2	(passing / total) * 2
Test quality	2	Average ratings: poor=0, fair=0.7, good=1.5, excellent=2.0
Code-to-test ratio	1	test LOC / source LOC: >0.8=1.0, >0.5=0.7, >0.3=0.4, <0.3=0.1

Generate metrics-report/data/metrics-latest.json with the full schema:

json

{
  "timestamp": "ISO-8601",
  "repository": { "name": "", "url": "", "branch": "", "commit": "" },
  "stacks": [],
  "files": { "total": 0, "backend": 0, "frontend": 0, "testFiles": 0, "byExtension": {} },
  "tests": {
    "total": 0, "passing": 0, "failing": 0,
    "byType": {
      "unit": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
      "integration": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
      "contract": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
      "e2e": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
      "css": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
      "stress": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
      "chaos": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
      "mutation": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
      "observability": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
      "fuzzy": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] },
      "propertybased": { "total": 0, "passing": 0, "failing": 0, "duration": 0, "files": [] }
    }
  },
  "failures": { "byType": {}, "details": [] },
  "coverage": { "backend": [], "frontend": [] },
  "authors": {},
  "quality": { "byType": {} },
  "score": { "total": 0, "breakdown": { "coverageBreadth": 0, "typeDiversity": 0, "passRate": 0, "testQuality": 0, "codeToTestRatio": 0 } }
}

Test file shape: { "path": "", "type": "", "testCount": 0, "passing": 0, "failing": 0, "author": "", "githubUrl": "", "tests": [{ "name": "", "status": "pass|fail", "duration": 0, "error": "", "line": 0, "githubUrl": "" }] }

Coverage file shape: { "file": "", "layer": "backend|frontend", "githubUrl": "", "coverage": { "unit": { "tool": 85, "llm": true }, "integration": { "tool": null, "llm": false } } } — include an entry for every test type that has tests. tool is a 0-100 integer from the coverage report (or null if no tool report), llm is true if the file is import-mapped by a test.

GitHub URL pattern: https://github.com/{owner}/{repo}/blob/{branch}/{filepath}#L{line}

Print: [Phase 5/7] Done. Score: <total>/10 (coverage: <X>, diversity: <X>, pass rate: <X>, quality: <X>, ratio: <X>)

Phase 6: History and Finalize

Print: [Phase 6/7] Saving history and finalizing...

Run in ONE Bash call:

bash

mkdir -p metrics-report/data/history
cp metrics-report/data/metrics-latest.json "metrics-report/data/history/metrics-$(date -u +%Y-%m-%dT%H-%M-%S).json"
ls metrics-report/data/history/*.json | xargs -I{} basename {} | sort -r | head -50

Write the history-index.json with the file list from above.

If metrics-report/metrics-application/ exists, copy data:

bash

mkdir -p metrics-report/metrics-application/public/data/history
cp metrics-report/data/metrics-latest.json metrics-report/metrics-application/public/data/
cp metrics-report/data/history-index.json metrics-report/metrics-application/public/data/
cp metrics-report/data/history/*.json metrics-report/metrics-application/public/data/history/

Print: [Phase 6/7] Done. History saved.

Print final summary:

=== METRICS REPORT COMPLETE ===
Score: <total>/10
Tests: <passing> passed / <failing> failed / <total> total
Coverage: <backend>% backend, <frontend>% frontend
Stacks: <list>

Then run the website:

bash

cd metrics-report && bash run.sh

Print: Metrics report running at http://localhost:3737

Rules

Do NOT fabricate results. Use real test output and real file analysis.
Do NOT invent files that do not exist.
If a test type has zero tests, report zeros.
Preserve previous history snapshots.
JSON must be valid.
ALWAYS print progress. Never go more than 1 tool call without telling the user what is happening. Use [Phase X/6] and [Test X/Y] format.
MAXIMIZE parallelism for codebase scanning (Agents 1, 2). Use Agent tool ONLY for read-only scanning work.
NEVER delegate test execution to agents. Run tests DIRECTLY with Bash tool calls. Agent overhead adds massive latency for fast tests.
NEVER parallelize test execution. Run test types sequentially: backend first, then frontend.
MINIMIZE file reads. Use Grep/Glob over Read when possible.
BATCH shell commands. One Bash call with chained commands over many calls.

Maintainer

diegopacheco Core maintainer

Source details

Full Name: diegopacheco/ai-playground
Branch: main
Path in repo: pocs/metrics-report-skill
License: The Unlicense
Topics: ai llm python genai ml classification clustering model nlp pytorch regression

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

diegopacheco/ai-playground

json-formatter

Validate, format, and minify JSON files when users request JSON validation, formatting, or ask to validate their JSONs

15 7

Explore

diegopacheco/ai-playground

bruno-generator

Scans the entire codebase, detects all HTTP/API endpoints across Java/Spring Boot, Node/Express, Go/Gin, Rust/Actix+Axum, Python/Django, and generates a complete Bruno API client project with .bru files, sample requests, and environments.

15 7

Explore

diegopacheco/ai-playground

infra-automation-generator

15 7

Explore

diegopacheco/ai-playground

leak-detect

Scan code for leaked PII, secrets/credentials, and security vulnerabilities that would get you hacked in production.

15 7

Explore

diegopacheco/ai-playground

skill-evaluator

This skill should be used when the user asks to "evaluate a skill", "review skill quality", "score my skill", "check skill best practices", "rate my skills", "evaluate all skills", "compare skills", or wants to assess skill quality across criteria like clarity, token efficiency, anti-cheating, quality gates, determinism, scope discipline, error recovery, observability, and idempotency.

15 7

Explore

diegopacheco/ai-playground

threat-analyst

Maps all external-facing endpoints, inputs, and auth boundaries. Scans the whole codebase and produces threat-analysis.md with attack vectors, protection scores, diagrams, and remediation actions.

15 7

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Metrics Report Skill

Progress Reporting (MANDATORY)

Phase 0: Install Metrics Report Site

Phase 1: Setup and Detection (do all at once)

Phase 2: Scan and Run Tests

Background Agent 1: File Count and Test Discovery

Background Agent 2: Git Attribution (batch)

Run Tests Directly (NO agent — use Bash directly)

Phase 3: LLM Coverage Mapping and Tool Coverage Parsing

Step A: Parse tool coverage reports for per-file percentages

Step B: LLM import mapping (for files missing tool coverage)

Building the coverage array

Phase 4: Quality Evaluation (fast)

Phase 5: Compute Score and Generate JSON

Phase 6: History and Finalize

Rules

Recommended Agent Skills

json-formatter

bruno-generator

infra-automation-generator

leak-detect

skill-evaluator

threat-analyst