Agent skill
method-development
Design and iterate on new research methods with structured checkpoints, baselines, and validation.
Stars
163
Forks
31
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/method-development
SKILL.md
STANDARD OPERATING PROCEDURE
Purpose
- Develop, refine, and validate novel methods anchored to baselines and constraints.
- Apply constraint hygiene and explicit ceilings to prevent premature claims.
- Keep structure-first artifacts current for handoff and reproducibility.
Trigger Conditions
- Positive: creating or adapting algorithms/pipelines; designing ablations; exploring new research ideas.
- Negative: pure replication (use
baseline-replication) or publication packaging (research-publication).
Guardrails
- HARD / SOFT / INFERRED constraint buckets (compute, data, metrics, ethics) with sources.
- Two-pass refinement on designs: structure vs. baselines, then epistemic/risks.
- Require baseline parity before claiming improvements; document variance sources.
- Confidence ceilings enforced per claim.
Inputs
- Problem statement and success metrics.
- Baselines to beat and constraints (data, compute, deadlines).
- Risk tolerances and evaluation protocols.
Workflow
- Problem Framing: Capture objectives, constraints, and baselines; confirm INFERRED assumptions.
- Design Options: Propose candidates with expected impact; map to constraints.
- Experiment Plan: Define ablations, datasets, metrics, and stopping rules.
- Run & Observe: Execute experiments, log configs/seeds; compare to baselines.
- Validate & Iterate: Analyze results, run adversarial checks, and refine or stop.
- Package: Summarize findings, risks, and next steps; store artifacts and update references/examples.
Validation & Quality Gates
- Baseline beat or variance explained; claims tied to evidence with ceilings.
- Ablations cover key hypotheses; failures documented.
- Reproducibility assets stored (configs, logs, seeds).
Response Template
**Objective & Constraints**
- HARD / SOFT / INFERRED.
**Design Candidates**
- Option → rationale → expected impact.
**Experiment Status**
- Runs, metrics vs. baseline, issues.
**Next Steps**
- Iterate, stop, or expand.
Confidence: 0.80 (ceiling: research 0.85) - based on current evidence and validation checks.
Confidence: 0.80 (ceiling: research 0.85) - reflects validated comparisons to baselines and logged experiments.
Didn't find tool you were looking for?