Agent skill

edd

Eval-Driven Development (EDD) Framework v2.87.0 - Define-before-implement pattern with structured evals. Provides workflow: Define specifications → Implement features → Verify against evals. Components: TEMPLATE.md for eval definitions, edd.sh CLI script, /edd skill invocation. Check types: CC- (Capability), BC- (Behavior), NFC- (Non-Functional). Integrates with orchestrator workflow for quality-first development. Keywords: evals, define, implement, verify, capability checks, behavior checks, non-functional checks, template, quality assurance, test-driven, specification. Use when: defining new features with structured evals, implementing with verification requirements, creating quality specifications, TDD-style workflow with evals.

Stars 115
Forks 18

Install this agent skill to your Project

npx add-skill https://github.com/alfredolopez80/multi-agent-ralph-loop/tree/main/.claude/skills/edd

SKILL.md

EDD (Eval-Driven Development) Framework v2.64

Eval-Driven Development is a quality-first development pattern that enforces define-before-implement workflow with structured evaluations.

v2.88 Key Changes (MODEL-AGNOSTIC)

  • Model-agnostic: Uses model configured in ~/.claude/settings.json or CLI/env vars
  • No flags required: Works with the configured default model
  • Flexible: Works with GLM-5, Claude, Minimax, or any configured model
  • Settings-driven: Model selection via ANTHROPIC_DEFAULT_*_MODEL env vars

What is EDD?

EDD provides a systematic approach to software development with three phases:

  1. DEFINE - Create structured eval specifications using TEMPLATE.md
  2. IMPLEMENT - Build features according to eval definitions
  3. VERIFY - Validate implementation against eval criteria

Check Types

Prefix Type Purpose
CC- Capability Checks Feature capabilities and functionality
BC- Behavior Checks Expected behaviors and responses
NFC- Non-Functional Checks Performance, security, maintainability

Usage

bash
# Invoke EDD workflow
/edd "Define memory-search feature"

# CLI script (if available)
ralph edd define memory-search
ralph edd check memory-search

Components

  • TEMPLATE.md: Template for creating eval definitions
  • edd.sh: CLI script for eval management
  • /edd skill: Skill invocation from Claude Code
  • ~/.claude/evals/: Directory for eval definitions

Template Structure

Each eval definition includes:

  1. Capability Checks (CC-) - What the feature can do
  2. Behavior Checks (BC-) - How the feature behaves
  3. Non-Functional Checks (NFC-) - Performance, security, etc.
  4. Implementation Notes - Technical guidance
  5. Verification Evidence - Test results

Example: memory-search.md

markdown
# Memory Search Eval

**Status**: DRAFT
**Created**: 2026-01-30

## Capability Checks
- [ ] CC-1: Search across semantic memory
- [ ] CC-2: Support filtering by type

## Behavior Checks
- [ ] BC-1: Returns ranked results
- [ ] BC-2: Handles empty queries gracefully

## Non-Functional Checks
- [ ] NFC-1: Search completes in <2s
- [ ] NFC-2: Memory usage <100MB

## Implementation Notes
- Use parallel search for performance
- Cache frequent queries

## Verification Evidence
- Test results attached

Integration with Orchestrator

EDD integrates with the orchestrator workflow to ensure quality-first development:

  1. Clarify phase - Define evals
  2. Plan phase - Review eval requirements
  3. Implement phase - Build to eval specs
  4. Validate phase - Verify against evals

Swarm Mode Integration (v2.81.1)

EDD framework now supports swarm mode for parallel evaluation across multiple check types.

Auto-Spawn Configuration

When invoked via /edd, the framework automatically spawns a specialized evaluation team:

yaml
Task:
  subagent_type: "general-purpose"
  model: "sonnet"
  team_name: "edd-evaluation-team"
  name: "edd-coordinator"
  mode: "delegate"
  run_in_background: true
  prompt: |
    Execute Eval-Driven Development workflow for: $ARGUMENTS

    EDD Pattern:
    1. DEFINE - Create structured eval specifications
    2. DISTRIBUTE - Assign check types to specialists
    3. VERIFY - Validate against eval criteria
    4. CONSOLIDATE - Merge findings from all evaluators

Team Composition

Role Purpose Specialization
Coordinator EDD workflow orchestration Manages eval lifecycle, consolidates findings
Teammate 1 Capability Checks specialist CC- prefix: feature capabilities and functionality
Teammate 2 Behavior Checks specialist BC- prefix: expected behaviors and responses
Teammate 3 Non-Functional Checks specialist NFC- prefix: performance, security, maintainability

Swarm Mode Workflow

User invokes: /edd "Define memory-search feature"

1. Team "edd-evaluation-team" created
2. Coordinator (edd-coordinator) receives task
3. 3 Teammates spawned with check-type specializations
4. Eval definition distributed:
   - Teammate 1 → Capability Checks (CC-)
   - Teammate 2 → Behavior Checks (BC-)
   - Teammate 3 → Non-Functional Checks (NFC-)
5. Teammates work in parallel (background execution)
6. Coordinator monitors progress and gathers results
7. Findings consolidated into single eval specification
8. Final eval document returned

Parallel Evaluation Pattern

Each teammate focuses on their check type:

yaml
# Teammate 1: Capability Checks
CC-1: Feature can perform X
CC-2: Feature supports Y configuration
CC-3: Feature integrates with Z system

# Teammate 2: Behavior Checks
BC-1: Feature handles error case A gracefully
BC-2: Feature returns expected response for B
BC-3: Feature maintains state across C

# Teammate 3: Non-Functional Checks
NFC-1: Response time < 100ms
NFC-2: Memory usage < 50MB
NFC-3: Security vulnerability scan passes

Communication Between Teammates

Teammates use the built-in mailbox system:

yaml
# Teammate sends finding to coordinator
SendMessage:
  type: "message"
  recipient: "edd-coordinator"
  content: "CC-3 defined: Feature integrates with auth system via OAuth2"

Task List Coordination

All teammates share a unified task list:

bash
# Location: ~/.claude/tasks/edd-evaluation-team/tasks.json

# Example tasks:
[
  {"id": "1", "subject": "Define Capability Checks", "owner": "teammate-1"},
  {"id": "2", "subject": "Define Behavior Checks", "owner": "teammate-2"},
  {"id": "3", "subject": "Define Non-Functional Checks", "owner": "teammate-3"},
  {"id": "4", "subject": "Consolidate eval specification", "owner": "edd-coordinator"}
]

Manual Override

To disable swarm mode:

bash
/edd "Define feature X" --no-swarm

Output Location

bash
# Evals saved to ~/.claude/evals/
ls ~/.claude/evals/

# View last eval
cat ~/.claude/evals/latest.md

Testing

Test suite: tests/test_v264_edd_framework.bats (33 tests)

Run tests:

bash
bats tests/test_v264_edd_framework.bats

Swarm Mode Tests

Additional tests for swarm mode integration:

bash
# Test swarm team creation
tests/edd/test-swarm-team-creation.sh

# Test parallel evaluation
tests/edd/test-parallel-evaluation.sh

Status

Current: Framework defined with swarm mode integration (v2.81.1) Note: TEMPLATE.md and evals directory structure ready for use


Version: v2.64 | Status: DRAFT | Tests: 33 passing

Action Reporting (v2.93.0)

Esta skill genera reportes automáticos completos para trazabilidad:

Reporte Automático

Cuando esta skill completa, se genera automáticamente:

  1. En la conversación de Claude: Resultados visibles
  2. En el repositorio: docs/actions/edd/{timestamp}.md
  3. Metadatos JSON: .claude/metadata/actions/edd/{timestamp}.json

Contenido del Reporte

Cada reporte incluye:

  • Summary: Descripción de la tarea ejecutada
  • Execution Details: Duración, iteraciones, archivos modificados
  • Results: Errores encontrados, recomendaciones
  • Next Steps: Próximas acciones sugeridas

Ver Reportes Anteriores

bash
# Listar todos los reportes de esta skill
ls -lt docs/actions/edd/

# Ver el reporte más reciente
cat $(ls -t docs/actions/edd/*.md | head -1)

# Buscar reportes fallidos
grep -l "Status: FAILED" docs/actions/edd/*.md

Generación Manual (Opcional)

bash
source .claude/lib/action-report-lib.sh
start_action_report "edd" "Task description"
# ... ejecución ...
complete_action_report "success" "Summary" "Recommendations"

Referencias del Sistema

  • Action Reports System - Documentación completa
  • action-report-lib.sh - Librería helper
  • action-report-generator.sh - Generador

Expand your agent's capabilities with these related and highly-rated skills.

alfredolopez80/multi-agent-ralph-loop

browser-test

Browser testing using Chrome DevTools MCP and Playwright for visual verification. Start dev server, navigate, screenshot, Lighthouse audit, console errors, network check. Use when: (1) verifying frontend changes, (2) accessibility auditing, (3) performance testing, (4) visual regression. Triggers: /browser-test, 'test in browser', 'visual test', 'lighthouse audit'.

115 18
Explore
alfredolopez80/multi-agent-ralph-loop

crafting-effective-readmes

Use when writing or improving README files. Not all READMEs are the same — provides templates and guidance matched to your audience and project type.

115 18
Explore
alfredolopez80/multi-agent-ralph-loop

stop-slop

A skill for removing AI-generated writing patterns ('slop') from prose. Eliminates telltale signs of AI writing like filler phrases, excessive hedging, overly formal language, and mechanical sentence structures. Use when: writing content that should sound human and natural, editing AI-generated drafts, cleaning up prose for publication, or any content that needs to sound authentic rather than AI-generated. Triggers: 'stop-slop', 'remove AI tells', 'clean up prose', 'make it sound human', 'edit AI writing'.

115 18
Explore
alfredolopez80/multi-agent-ralph-loop

bugs

Bug hunting with Codex CLI Use when: (1) /bugs is invoked, (2) task relates to bugs functionality.

115 18
Explore
alfredolopez80/multi-agent-ralph-loop

adversarial

Multi-Agent Adversarial Analysis System for code security

115 18
Explore
alfredolopez80/multi-agent-ralph-loop

senior-software-engineer

Global skill enforcing senior software engineering best practices

115 18
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results