Agent skill
hyperagent
Run a self-referential self-improving agent loop where a meta-agent iteratively modifies a task-agent's code to optimize for any measurable target. Based on Facebook Research's Hyperagents paper (arXiv:2603.19461). Use when asked to "run hyperagent", "self-improve this", "optimize with self-modification", or "evolve this agent/script".
Install this agent skill to your Project
npx add-skill https://github.com/ckorhonen/claude-skills/tree/main/skills/hyperagent
SKILL.md
Hyperagent
Quick Start — Simple Examples
New to Hyperagent? Try these beginner-friendly tasks before the full setup.
1. Optimize a simple Python script to run faster
Say: "Use hyperagent to optimize this script for speed" and paste something like:
```python
# slow_sort.py
def sort_numbers(nums):
    # O(n^2): repeated min() + remove(); note that this also empties the caller's list
    result = []
    while nums:
        smallest = min(nums)
        result.append(smallest)
        nums.remove(smallest)
    return result
```
Hyperagent will benchmark it, propose a faster implementation, and validate the improvement.
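To make "benchmark and validate" concrete, here is a minimal sketch of the kind of harness a session could use for this example. It is not the skill's own benchmark: it checks a candidate against the original for identical output, times it with `timeit`, and prints the result as a `METRIC name=value` line. The `sorted()`-based candidate is just one variant the meta-agent might plausibly propose, and the metric name `runtime_ms` is an assumption.

```python
import random
import timeit


def sort_numbers(nums):
    """Original implementation from slow_sort.py (O(n^2), empties its input)."""
    result = []
    while nums:
        smallest = min(nums)
        result.append(smallest)
        nums.remove(smallest)
    return result


def sort_numbers_fast(nums):
    """Candidate variant: delegate to the built-in sort (O(n log n))."""
    return sorted(nums)


def benchmark(fn, data, repeats=5):
    """Best wall-clock time in milliseconds over `repeats` single runs."""
    # Pass a fresh copy each run, because the original empties its input list.
    times = timeit.repeat(lambda: fn(list(data)), number=1, repeat=repeats)
    return min(times) * 1000.0


if __name__ == "__main__":
    random.seed(0)
    data = [random.randint(0, 10_000) for _ in range(2_000)]
    # Correctness gate: the variant must produce exactly the same output.
    assert sort_numbers_fast(list(data)) == sort_numbers(list(data))
    print(f"METRIC runtime_ms={benchmark(sort_numbers_fast, data):.3f}")
```

Taking the minimum of several runs is a common way to reduce timing noise; the loop itself records repeated trials and a median, as described under Archive Structure.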
2. Improve a prompt to get better answers
Say: "Run hyperagent on this prompt and improve accuracy" with a prompt like:
Summarize this article in one sentence.
The meta-agent iterates on the prompt, measures quality, and keeps improvements that score higher.
3. Make a function more efficient
Say: "Evolve this function with hyperagent" and paste any function. Hyperagent creates a benchmark, runs generations of improvements, and shows you the performance gain per generation.
4. Self-improve any script
Say: "Self-improve this agent/script" and point to any Python file. Hyperagent wraps it in an evaluation loop, proposes modifications, and tracks what works.
The simplest possible setup: create task.sh that prints METRIC score=0.5, then run python3 scripts/init_session.py. From there the loop is fully automated.
Self-referential self-improvement: a meta-agent that modifies a task-agent (and itself) to optimize any measurable objective.
Inspired by Facebook Research's Hyperagents paper (arXiv:2603.19461), which demonstrated that agents combining a task-solver and a self-modifying meta-level into a single editable program can achieve open-ended, compounding improvements that transfer across domains.
How It Works
A hyperagent is a system with two components in a single editable codebase:
- Task Agent — solves the target task (benchmark, code generation, data processing, etc.)
- Meta Agent — analyzes task performance history and proposes modifications to the task agent's code (and optionally its own code)
The key insight from the paper: when the meta-level modification procedure is itself editable, the system can improve not just task performance but also the mechanism that generates future improvements — enabling compounding, transferable gains.
Core Principles
- Self-referential modification: The meta-agent can modify the task-agent's code AND its own strategy. Both live in the same editable workspace. This enables metacognitive self-improvement: improving how you improve.
- Population-based exploration (archive): Don't just keep the best variant; maintain an archive of all successful variants as stepping stones. Parent selection favors high performers with unexplored potential.
- Empirical evaluation gates everything: No change is accepted without measurement. Every candidate is evaluated against the task benchmark with repeated trials.
- Persistent memory and performance tracking: The system maintains a structured history of all experiments, hypotheses, and outcomes. Later generations build on earlier insights instead of rediscovering dead ends.
- Transfer across domains: Meta-level improvements (performance tracking, evaluation strategies, hypothesis generation patterns) are domain-agnostic and can be transferred to new tasks.
Available Scripts
- scripts/common.py: shared utilities (archive management, metrics, reporting)
- scripts/init_session.py: initialize a hyperagent session and scaffold the workspace
- scripts/run_task.py: evaluate a task-agent variant and record metrics
- scripts/log_variant.py: log an evaluated record, decide its disposition, and update the archive and reports
- scripts/render_report.py: generate an HTML report of the full evolutionary history
- scripts/select_parent.py: select a parent from the archive for the next generation
Note: There is no generate_variant.py script. The meta-agent role (hypothesis generation and code modification) is performed by the LLM agent itself, not by a script.
All scripts are non-interactive, expose --help, emit structured JSON on stdout, and keep diagnostics on stderr.
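As an illustration of that I/O contract, here is a minimal skeleton for a script that follows it. It is hypothetical and not one of the bundled scripts; the `--input` flag and the fields it prints are assumptions. `argparse` supplies `--help` automatically, the result goes to stdout as JSON, and errors go to stderr with a non-zero return code.

```python
import argparse
import json
import sys


def main(argv=None):
    # argparse provides --help for free; nothing here prompts the user.
    parser = argparse.ArgumentParser(description="Hypothetical hyperagent-style script.")
    parser.add_argument("--input", required=True, help="Path to a variant record (JSON).")
    args = parser.parse_args(argv)

    try:
        with open(args.input, encoding="utf-8") as fh:
            record = json.load(fh)
    except (OSError, json.JSONDecodeError) as exc:
        # Diagnostics go to stderr so stdout stays machine-parseable.
        print(f"error: cannot read {args.input}: {exc}", file=sys.stderr)
        return 1

    # The structured result is the only thing written to stdout.
    json.dump({"id": record.get("id"), "status": "ok"}, sys.stdout)
    sys.stdout.write("\n")
    return 0
```

In a real script the file would end with the usual `if __name__ == "__main__": sys.exit(main())` guard so the return code becomes the process exit status.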
Default Workflow
- Initialize the session after defining the optimization target:
```bash
python3 scripts/init_session.py \
  --goal "Improve prompt accuracy on math benchmark" \
  --metric-name accuracy \
  --unit pct \
  --direction higher \
  --task-command ./task.sh \
  --checks-command ./checks.sh \
  --scope src/agent.py \
  --max-generations 50
```
- Evaluate the baseline (generation 0):
```bash
python3 scripts/run_task.py \
  --id gen-000 \
  --hypothesis "Control: unmodified task agent" \
  --change-summary "No modifications" \
  --baseline \
  --output .hyperagent/gen-000.json
python3 scripts/log_variant.py --input .hyperagent/gen-000.json
```
- Selection → Modification → Evaluation loop:
```bash
# Select a parent from the archive
python3 scripts/select_parent.py --output .hyperagent/parent.json

# Generate a variant (meta-agent proposes modifications)
# This is where YOU (the LLM agent) act as the meta-agent:
#   - Read the parent's code and performance history
#   - Hypothesize an improvement
#   - Apply code modifications
#   - Record what you changed and why

# Evaluate the variant
python3 scripts/run_task.py \
  --id gen-001 \
  --hypothesis "Add chain-of-thought prompting to improve reasoning" \
  --change-summary "Wrap task prompt in step-by-step reasoning template" \
  --parent gen-000 \
  --output .hyperagent/gen-001.json
python3 scripts/log_variant.py --input .hyperagent/gen-001.json
```
- Render reports at any time:
```bash
python3 scripts/render_report.py
```
Up-Front Q&A
Before starting, gather or confirm:
- Objective: what are we optimizing?
- Primary metric: exact name, unit, and direction (lower/higher)
- Task command: the script that runs the task agent and emits METRIC name=value lines
- Correctness gates: tests or checks that must pass for a variant to be kept
- Scope: which files can the meta-agent modify?
- Meta-scope: can the meta-agent modify its own strategy? (default: yes)
- Generation budget: maximum generations before stopping
- Minimum improvement threshold: default 1%
Workspace Setup
- Prefer a dedicated worktree on a fresh branch:
```bash
git worktree add -b hyperagent/<goal>-<date> ../hyperagent-<goal>-<date>
```
- Create:
  - hyperagent.md: checked in; durable session brief with the full evolutionary history
  - task.sh: checked in; benchmark runner (emits METRIC name=value)
  - checks.sh: checked in; correctness gates
  - .hyperagent/: local artifact directory, NOT checked in
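- Ensure artifacts stay untracked:
```bash
# In a linked worktree .git is a file, not a directory, so ask git for the real exclude path.
exclude="$(git rev-parse --git-path info/exclude)"
rg -qxF '.hyperagent/' "$exclude" || printf '\n.hyperagent/\n' >> "$exclude"
```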
The Meta-Agent Role
You (the LLM) are the meta-agent. Your job each generation is:
- Select parent: use scripts/select_parent.py or choose based on the archive
- Analyze: read the parent's code, performance history, and past experiment outcomes
- Hypothesize: propose a specific, testable modification with a causal theory for why it should help
- Modify: apply code changes to the task agent (and optionally to your own strategy notes in hyperagent.md)
- Evaluate: run scripts/run_task.py to measure the variant
- Log: use scripts/log_variant.py to record the result and update the archive
- Reflect: update hyperagent.md with what you learned
Meta-Level Self-Modification
The meta-agent can improve its own process by updating:
- Strategy notes in hyperagent.md (hypothesis generation patterns, evaluation heuristics)
- Memory entries in .hyperagent/memory.jsonl (qualitative insights, correction plans)
- The task evaluation protocol (adding secondary metrics, changing trial counts)
These meta-improvements compound across generations and transfer to new tasks.
Required Files
hyperagent.md
The durable contract and evolutionary history. A fresh agent can resume from this.
```markdown
# Hyperagent: <goal>

## Objective
<What is being optimized and why.>

## Configuration
- Primary metric:
- Unit:
- Direction:
- Minimum improvement: X%
- Task command:
- Correctness gates:
- Generation budget:

## Scope
- Task agent files:
- Meta-agent can self-modify: yes/no

## Archive
`.hyperagent/archive.jsonl`

## Lineage
<Tree showing parent→child relationships and which variants were kept>

## Meta-Strategy
<Current approach to hypothesis generation, updated as the meta-agent learns>

## What We've Learned
<Key wins, dead ends, transferable insights>

## Performance Tracking
<Best variant, improvement trajectory, current plateau status>
```
task.sh
Bash script that runs the task agent and emits METRIC name=value lines:
```bash
#!/bin/bash
set -euo pipefail

# Run the task agent
python3 src/agent.py --input data/test.json 2>/dev/null

# The agent script should emit: METRIC accuracy=0.85
```
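How scripts/run_task.py reads this output isn't specified here. A minimal sketch of parsing `METRIC name=value` lines from a task command's stdout might look like the following; it assumes values are plain decimal numbers and simply ignores every other line. The function names are hypothetical.

```python
import re
import subprocess

# Matches lines such as "METRIC accuracy=0.85" or "METRIC latency_ms=12".
METRIC_RE = re.compile(
    r"^METRIC\s+([A-Za-z_][\w.-]*)=([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)\s*$"
)


def parse_metrics(text):
    """Return {name: float} for each well-formed METRIC line; other lines are ignored."""
    metrics = {}
    for line in text.splitlines():
        match = METRIC_RE.match(line.strip())
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


def run_task(command="./task.sh"):
    """Run the task command once and parse its stdout (raises if it exits non-zero)."""
    proc = subprocess.run([command], capture_output=True, text=True, check=True)
    return parse_metrics(proc.stdout)
```

Because task.sh runs with `set -euo pipefail`, a failing agent makes the command exit non-zero, which `check=True` surfaces as an exception; a harness could map that to the `crash` disposition.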
Archive Structure
The archive (.hyperagent/archive.jsonl) stores every variant ever evaluated:
```json
{
  "id": "gen-007",
  "generation": 7,
  "parent_id": "gen-003",
  "timestamp": "2026-03-27T20:00:00Z",
  "hypothesis": "Add few-shot examples to improve pattern recognition",
  "change_summary": "Inserted 3 domain-specific examples into the task prompt",
  "files_touched": ["src/agent.py"],
  "metric_name": "accuracy",
  "direction": "higher",
  "warmup_trials": [0.82, 0.83],
  "measured_trials": [0.85, 0.86, 0.84, 0.85, 0.87],
  "summary": {"median": 0.85, "mean": 0.854, "min": 0.84, "max": 0.87},
  "checks": "passed",
  "disposition": "keep",
  "children_count": 0,
  "meta_modifications": ["Updated strategy notes with few-shot pattern"],
  "reason": "Improved by 3.2% over parent gen-003 (0.824). Checks passed."
}
```
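The `summary` fields can be recomputed from `measured_trials` alone (warmup trials are excluded). The sketch below shows one way to do that and to append or reload records in the one-object-per-line JSONL layout; the helper names are hypothetical and are not the documented API of scripts/common.py.

```python
import json
import statistics
from pathlib import Path


def summarize(trials):
    """Summary statistics over measured trials (the caller excludes warmup trials)."""
    return {
        "median": statistics.median(trials),
        "mean": round(statistics.fmean(trials), 6),
        "min": min(trials),
        "max": max(trials),
    }


def append_record(archive_path, record):
    """Append one variant record as a single JSON line."""
    path = Path(archive_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def load_archive(archive_path):
    """Read every record back in insertion order; blank lines are skipped."""
    path = Path(archive_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

For the gen-007 trials above, `summarize` reproduces the recorded summary (median 0.85, mean 0.854, min 0.84, max 0.87). Append-only JSONL keeps every variant, which is what the archive needs for stepping-stone selection.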
Parent Selection
Selection probability for a parent is proportional to:
- Performance score (variants that score better on the primary metric are favored)
- Inverse of children count (favor unexplored high-performers)
This balances exploitation (good variants) with exploration (understudied variants).
```bash
python3 scripts/select_parent.py
# Output: {"selected_parent": "gen-003", "score": 0.824, "children": 1, "reason": "High performer with few children"}
```
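The exact weighting used by scripts/select_parent.py isn't spelled out here. One simple way to combine the two factors is weight = score / (1 + children_count), sampled with `random.choices`. This formula, the restriction to variants with disposition `keep`, and the assumption that scores are positive and higher is better are illustrative choices, not the script's documented behaviour.

```python
import random


def selection_weights(archive):
    """Weight each kept variant by score / (1 + children_count).

    Assumes higher scores are better and every score is positive.
    """
    candidates = [r for r in archive if r.get("disposition") == "keep"]
    weights = [r["summary"]["median"] / (1 + r.get("children_count", 0)) for r in candidates]
    return candidates, weights


def select_parent(archive, rng=random):
    """Sample one parent with probability proportional to its weight."""
    candidates, weights = selection_weights(archive)
    if not candidates:
        raise ValueError("archive has no kept variants to select from")
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Dividing by `1 + children_count` means a strong variant that has already been expanded several times gradually yields probability mass to less-explored ones, which is the exploitation/exploration balance described above. For a lower-is-better metric the score would need to be transformed first, for example by inverting it.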
Decision Rules
- keep: variant beats the current best by ≥ threshold and checks pass
- discard: variant is worse, equal, or its improvement is below threshold
- checks_failed: metric improved but correctness gates failed
- crash: variant could not be evaluated
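These rules map directly onto a small function. The following is a sketch, not the logic inside scripts/log_variant.py. It assumes the compared value is the median of the measured trials, treats the threshold as relative (0.01 = 1%), and falls back to the absolute difference when the current best is exactly zero. A variant whose checks failed without any metric gain is treated as a plain discard.

```python
def decide(record, best, threshold=0.01, direction="higher"):
    """Map an evaluated record to keep / discard / checks_failed / crash."""
    summary = record.get("summary") or {}
    value = summary.get("median")
    if value is None:
        return "crash"  # nothing could be measured
    # Positive delta means "better" in the configured direction.
    delta = value - best if direction == "higher" else best - value
    # Relative improvement; use the absolute delta if best is exactly 0.
    improvement = delta / abs(best) if best else delta
    if record.get("checks") != "passed":
        return "checks_failed" if improvement > 0 else "discard"
    return "keep" if improvement >= threshold else "discard"
```

With the archive example above, a median of 0.85 against a best of 0.824 is roughly a 3.2% gain, so `decide` returns `keep`.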
Plateau Detection
Track improvement velocity. Stop or pivot when:
- 3+ consecutive generations with no improvement
- Hypothesis diversity drops (recycling ideas)
- Improvement velocity < 0.1% per generation over the last 5 generations
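The two numeric criteria can be checked against the best-so-far series. The sketch below assumes higher is better and that `best_so_far[i]` holds the best score after generation `i`; the hypothesis-diversity criterion is a judgment call for the meta-agent and is not covered here.

```python
def is_plateau(best_so_far, patience=3, window=5, min_velocity=0.001):
    """Return True if either numeric plateau rule fires.

    - no improvement over the last `patience` generations, or
    - average relative gain per generation over the last `window`
      generations below `min_velocity` (0.001 = 0.1%).
    """
    if len(best_so_far) > patience:
        recent = best_so_far[-(patience + 1):]
        # best_so_far never decreases, so "no improvement" means equal endpoints.
        if recent[-1] <= recent[0]:
            return True
    if len(best_so_far) > window:
        start, end = best_so_far[-(window + 1)], best_so_far[-1]
        if start and (end - start) / abs(start) / window < min_velocity:
            return True
    return False
```

A steadily rising series such as 0.80, 0.82, 0.84, 0.86 does not trigger either rule, while four equal best-so-far values in a row do.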
Loop Behavior
Run autonomously until:
- Generation budget exhausted
- Plateau detected (3 consecutive non-improvements)
- All promising hypotheses explored
- User interrupts
During the loop:
- One hypothesis per generation
- Record dead ends explicitly
- Keep the worktree clean between variants (revert discarded changes)
- Update hyperagent.md after every generation
Common Pitfalls
1. Meta-Agent Overfitting Its Own Strategy
Symptom: Meta-strategy becomes over-specialized to early successes
Fix: Periodically review and broaden the strategy; try categorically different approaches
2. Archive Bloat
Symptom: Archive grows large and selection becomes slow
Fix: Archive old generations after 50 variants; maintain a compact summary
3. Self-Modification Destabilizing the Loop
Symptom: Meta-agent modifies evaluation or logging in ways that break the loop
Fix: Keep outer-loop scripts (init, run, log, select) immutable. Only modify task code and strategy notes.
4. Hypothesis Recycling
Symptom: Later generations retry earlier failed ideas
Fix: Always read .hyperagent/memory.jsonl before proposing. Explicitly check against dead ends.
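A cheap guard against recycling is to compare each new hypothesis with the ones already recorded. The sketch below assumes each line of .hyperagent/memory.jsonl is a JSON object with a `hypothesis` field (the memory schema isn't specified here) and uses `difflib` string similarity as a rough stand-in for a semantic comparison; the 0.8 cutoff is arbitrary.

```python
import difflib
import json
from pathlib import Path


def load_past_hypotheses(memory_path=".hyperagent/memory.jsonl"):
    """Collect the `hypothesis` field from each JSONL entry (assumed schema)."""
    path = Path(memory_path)
    if not path.exists():
        return []
    hypotheses = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                entry = json.loads(line)
                if "hypothesis" in entry:
                    hypotheses.append(entry["hypothesis"])
    return hypotheses


def similar_past(candidate, past, cutoff=0.8):
    """Return past hypotheses whose wording closely matches the candidate."""
    by_lower = {p.lower(): p for p in past}
    matches = difflib.get_close_matches(candidate.lower(), list(by_lower), n=5, cutoff=cutoff)
    return [by_lower[m] for m in matches]
```

String similarity only catches near-verbatim repeats; a reworded version of an old idea will slip through, so this complements rather than replaces reading the memory file.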
Transfer Protocol
To transfer meta-improvements to a new domain:
- Extract the meta-strategy from the "What We've Learned" section of hyperagent.md
- Copy .hyperagent/memory.jsonl as starting knowledge
- Initialize a new session with the transferred strategy as initial context
- The meta-agent starts with accumulated wisdom instead of from scratch
Report Generation
```bash
python3 scripts/render_report.py
```
Generates .hyperagent/report.html with:
- Lineage tree visualization
- Performance over generations
- Best-so-far trend
- Disposition breakdown
- Per-variant trial distributions
- Meta-strategy evolution timeline