add-benchmark

Add a new SWE benchmark task from a real GitHub bug-fix. Use when the user provides a GitHub issue or PR URL and wants to add it to the bench-swe pipeline.

Stars 144

Forks 12

Install this agent skill in your project

npx add-skill https://github.com/ory/lumen/tree/main/.claude/skills/add-benchmark

SKILL.md

Add SWE Benchmark

Add a new benchmark task to the bench-swe pipeline from a real GitHub bug-fix. The human provides the GitHub issue or PR URL; the agent handles extraction, validation, and file creation.

Arguments

url (required): GitHub issue or PR URL (e.g. https://github.com/gorilla/mux/issues/534 or https://github.com/gorilla/mux/pull/585)
language (required): One of: go, python, typescript, javascript, rust, ruby, java, c, cpp, php, csharp
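
The two arguments above can be checked before any other work starts. The following is a minimal sketch of such a check, not the task-curator agent's actual validation: the function name `parse_arguments` and the URL pattern are assumptions made for illustration, while the language list is copied from the argument description.

```python
import re

# Language list copied from the Arguments section above.
SUPPORTED_LANGUAGES = {
    "go", "python", "typescript", "javascript", "rust", "ruby",
    "java", "c", "cpp", "php", "csharp",
}

# Assumed pattern: https://github.com/<owner>/<repo>/(issues|pull)/<number>
GITHUB_URL_RE = re.compile(
    r"^https://github\.com/(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)/"
    r"(?P<kind>issues|pull)/(?P<number>\d+)/?$"
)


def parse_arguments(url: str, language: str) -> dict:
    """Return the parsed URL parts, or raise ValueError on invalid input."""
    match = GITHUB_URL_RE.match(url.strip())
    if match is None:
        raise ValueError(f"not a GitHub issue or PR URL: {url!r}")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    parts = match.groupdict()
    parts["number"] = int(parts["number"])  # issue or PR number as an integer
    parts["language"] = language
    return parts
```

Under these assumptions, calling `parse_arguments("https://github.com/gorilla/mux/pull/585", "go")` yields a dictionary whose `kind` is `"pull"`, which tells the caller that the fix PR does not need to be resolved from an issue first.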

Repository selection criteria

Good benchmark repos are focused libraries with a clear bug, not large applications. Before submitting a URL, check that the repo meets these limits:

Size: < 50 MB and < 800 source files (excludes vendor/node_modules)
Dependencies: < 50 direct dependencies (go.mod, package.json, etc.)
Scope: a library or small service, not a monorepo or full application

The agent will reject repos that exceed these limits.
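
The size and file-count limits can be estimated on a local clone before submitting a URL. This is a rough sketch and not the agent's own check: the extension list that defines a "source file", the decision to skip `.git`, and the choice to apply the vendor/node_modules exclusion to the byte count as well are all assumptions. The dependency limit is not covered here.

```python
import os
from typing import Tuple

MAX_BYTES = 50 * 1024 * 1024  # "< 50 MB" from the criteria above
MAX_SOURCE_FILES = 800        # "< 800 source files"

# vendor and node_modules come from the criteria; .git is an added assumption.
EXCLUDED_DIRS = {"vendor", "node_modules", ".git"}

# Assumed extension list; the skill does not define "source file".
SOURCE_EXTENSIONS = {
    ".go", ".py", ".ts", ".js", ".rs", ".rb", ".java", ".c", ".cpp", ".php", ".cs",
}


def measure_repo(root: str) -> Tuple[int, int]:
    """Return (total bytes, source file count) for a local clone."""
    total_bytes = 0
    source_files = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks to avoid counting targets or broken links
            total_bytes += os.path.getsize(path)
            if os.path.splitext(name)[1] in SOURCE_EXTENSIONS:
                source_files += 1
    return total_bytes, source_files


def within_limits(total_bytes: int, source_files: int) -> bool:
    """Apply the strict "less than" limits from the selection criteria."""
    return total_bytes < MAX_BYTES and source_files < MAX_SOURCE_FILES
```

A repo that passes this local estimate may still be rejected by the agent, since its exact counting rules are not documented here.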

Steps

Dispatch the task-curator agent with the provided arguments. The agent will:
- Validate inputs (URL, language)
- Check repository size and dependency count (rejects oversized repos)
- Resolve the fix PR (from issue or directly)
- Clone the repo, extract base/fix commits, and generate the gold patch
- Determine the test command from repo conventions
- Write task JSON to bench-swe/tasks/{language}/ and patch to bench-swe/patches/
- Run 5 inline verification checks (patch applies, files match, no leaks, schema completeness, no test files in patch)
- Fix any issues found during verification

Then report the result, including:
- Task ID, repo, issue URL
- Files and lines changed
- Verification table
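
One of the inline verification checks listed above, "no test files in patch", can be sketched as a scan of the file headers in a git-style unified diff. This is an illustration only: the function names and the test-file naming heuristics are assumptions, and the skill does not state how the agent detects test files.

```python
import re
from typing import List

# Matches the "diff --git a/<path> b/<path>" header that git emits per file.
DIFF_HEADER_RE = re.compile(r"^diff --git a/(.+?) b/(.+)$")

# Assumed naming heuristics for test files; adjust per language and repo.
TEST_FILE_PATTERNS = [
    re.compile(r"(^|/)tests?/"),             # test/ or tests/ directories
    re.compile(r"_test\.go$"),               # Go convention
    re.compile(r"(^|/)test_[^/]+\.py$"),     # pytest convention
    re.compile(r"\.(test|spec)\.[jt]sx?$"),  # common JS/TS conventions
]


def changed_files(patch_text: str) -> List[str]:
    """List the post-change path of every file touched by the patch."""
    files = []
    for line in patch_text.splitlines():
        match = DIFF_HEADER_RE.match(line)
        if match:
            files.append(match.group(2))
    return files


def find_test_files(patch_text: str) -> List[str]:
    """Return the changed paths that look like test files."""
    return [
        path for path in changed_files(patch_text)
        if any(pattern.search(path) for pattern in TEST_FILE_PATTERNS)
    ]
```

With these heuristics, a gold patch passes the check when `find_test_files` returns an empty list. The same `changed_files` helper could also feed the "files changed" line of the report.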

Maintainer

ory Core maintainer

Source details

Full Name: ory/lumen
Branch: main
Path in repo: .claude/skills/add-benchmark
License: Other
Topics: claude-code claude mcp mcp-server agentic-coding codex claude-ai gemini context golang plugin gpt-5 claude-pl
