Agent skill

evals-context

Provides context about the Roo Code evals system structure in this monorepo. Use when tasks mention "evals", "evaluation", "eval runs", "eval exercises", or working with the evals infrastructure. Helps distinguish between the evals execution system (packages/evals, apps/web-evals) and the public website evals display page (apps/web-roo-code/src/app/evals).

Install this agent skill in your project

npx add-skill https://github.com/foryourhealth111-pixel/Vibe-Skills/tree/main/bundled/skills/evals-context

SKILL.md

Evals Codebase Context

When to Use This Skill

Use this skill when the task involves:

Modifying or debugging the evals execution infrastructure
Adding new eval exercises or languages
Working with the evals web interface (apps/web-evals)
Modifying the public evals display page on roocode.com
Understanding where evals code lives in this monorepo

When NOT to Use This Skill

Do NOT use this skill when:

Working on unrelated parts of the codebase (extension, webview-ui, etc.)
The task is purely about the VS Code extension's core functionality
Working on the main website pages that don't involve evals

Key Disambiguation: Two "Evals" Locations

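This monorepo has two distinct evals-related areas that are easy to confuse: the evals execution system (packages/evals/ and apps/web-evals/) and the public website evals page (apps/web-roo-code/src/app/evals/). The coding exercises themselves live in a separate repository.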

| Component | Path | Purpose |
| --- | --- | --- |
| Evals Execution System | `packages/evals/` | Core eval infrastructure: CLI, DB schema, Docker configs |
| Evals Management UI | `apps/web-evals/` | Next.js app for creating/monitoring eval runs (localhost:3446) |
| Website Evals Page | `apps/web-roo-code/src/app/evals/` | Public roocode.com page displaying eval results |
| External Exercises Repo | Roo-Code-Evals | Actual coding exercises (NOT in this monorepo) |
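
The mapping above can be sketched as a small lookup. This is only an illustration: `classifyEvalsPath` and the `EvalsComponent` labels are hypothetical names that do not exist in the repo, and the only facts used are the three path prefixes from the table.

```typescript
// Hypothetical helper (not part of the repo): maps a repo-relative path to the
// evals component that owns it, using the path prefixes from the table.
type EvalsComponent = "execution-system" | "management-ui" | "website-page" | "not-evals";

const PREFIXES: ReadonlyArray<readonly [string, EvalsComponent]> = [
  ["apps/web-roo-code/src/app/evals/", "website-page"],
  ["apps/web-evals/", "management-ui"],
  ["packages/evals/", "execution-system"],
];

function classifyEvalsPath(path: string): EvalsComponent {
  // Normalize Windows separators and strip a leading "./" before matching.
  const normalized = path.replace(/\\/g, "/").replace(/^\.\//, "");
  for (const [prefix, component] of PREFIXES) {
    if (normalized.startsWith(prefix)) return component;
  }
  return "not-evals";
}

console.log(classifyEvalsPath("packages/evals/src/cli/runTask.ts")); // execution-system
console.log(classifyEvalsPath("apps/web-roo-code/src/app/evals/plot.tsx")); // website-page
console.log(classifyEvalsPath("webview-ui/src/App.tsx")); // not-evals
```

A path that matches none of the three prefixes, such as anything in the external Roo-Code-Evals repository, falls through to `not-evals`.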

Directory Structure Reference

`packages/evals/` - Core Evals Package

packages/evals/
├── ARCHITECTURE.md          # Detailed architecture documentation
├── ADDING-EVALS.md          # Guide for adding new exercises/languages
├── README.md                # Setup and running instructions
├── docker-compose.yml       # Container orchestration
├── Dockerfile.runner        # Runner container definition
├── Dockerfile.web           # Web app container
├── drizzle.config.ts        # Database ORM config
├── src/
│   ├── index.ts             # Package exports
│   ├── cli/                 # CLI commands for running evals
│   │   ├── runEvals.ts      # Orchestrates complete eval runs
│   │   ├── runTask.ts       # Executes individual tasks in containers
│   │   ├── runUnitTest.ts   # Validates task completion via tests
│   │   └── redis.ts         # Redis pub/sub integration
│   ├── db/
│   │   ├── schema.ts        # Database schema (runs, tasks)
│   │   ├── queries/         # Database query functions
│   │   └── migrations/      # SQL migrations
│   └── exercises/
│       └── index.ts         # Exercise loading utilities
└── scripts/
    └── setup.sh             # Local macOS setup script

`apps/web-evals/` - Evals Management Web App

apps/web-evals/
├── src/
│   ├── app/
│   │   ├── page.tsx         # Home page (runs list)
│   │   ├── runs/
│   │   │   ├── new/         # Create new eval run
│   │   │   └── [id]/        # View specific run status
│   │   └── api/runs/        # SSE streaming endpoint
│   ├── actions/             # Server actions
│   │   ├── runs.ts          # Run CRUD operations
│   │   ├── tasks.ts         # Task queries
│   │   ├── exercises.ts     # Exercise listing
│   │   └── heartbeat.ts     # Controller health checks
│   ├── hooks/               # React hooks (SSE, models, etc.)
│   └── lib/                 # Utilities and schemas

`apps/web-roo-code/src/app/evals/` - Public Website Evals Page

apps/web-roo-code/src/app/evals/
├── page.tsx      # Fetches and displays public eval results
├── evals.tsx     # Main evals display component
├── plot.tsx      # Visualization component
└── types.ts      # EvalRun type (extends packages/evals types)

This page displays eval results on the public roocode.com website. It imports types from @roo-code/evals but does NOT run evals.

Architecture Overview

The evals system is a distributed evaluation platform that runs AI coding tasks in isolated VS Code environments:

┌─────────────────────────────────────────────────────────────┐
│  Web App (apps/web-evals)  ──────────────────────────────── │
│        │                                                    │
│        ▼                                                    │
│  PostgreSQL ◄────► Controller Container                     │
│        │               │                                    │
│        ▼               ▼                                    │
│     Redis ◄───► Runner Containers (1-25 parallel)           │
└─────────────────────────────────────────────────────────────┘

Key components:

Controller: Orchestrates eval runs, spawns runners, manages task queue (p-queue)
Runner: Isolated Docker container with VS Code + Roo Code extension + language runtimes
Redis: Pub/sub for real-time events (NOT task queuing)
PostgreSQL: Stores runs, tasks, metrics
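
To illustrate why pub/sub suits real-time events but not task queuing, here is a minimal in-memory sketch. It is a stand-in, not the Redis integration in `redis.ts`: `EventChannel`, `TaskEvent`, and the field names are all assumptions made for this example.

```typescript
// In-memory stand-in for a pub/sub channel; every name here is hypothetical.
type Listener<T> = (event: T) => void;

class EventChannel<T> {
  private listeners = new Set<Listener<T>>();

  // Registers a listener and returns a function that unsubscribes it.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Delivers the event to every current subscriber and returns how many
  // received it. Nothing is stored, so late subscribers miss earlier events.
  publish(event: T): number {
    this.listeners.forEach((listener) => listener(event));
    return this.listeners.size;
  }
}

interface TaskEvent {
  taskId: number;
  status: "started" | "completed";
}

const channel = new EventChannel<TaskEvent>();
const seenByUi: TaskEvent[] = [];
const seenByLogger: TaskEvent[] = [];

channel.subscribe((e) => {
  seenByUi.push(e);
});
channel.publish({ taskId: 1, status: "started" });

// The logger subscribes late and never sees the "started" event.
channel.subscribe((e) => {
  seenByLogger.push(e);
});
channel.publish({ taskId: 1, status: "completed" });

console.log(seenByUi.length); // 2
console.log(seenByLogger.length); // 1
```

Every subscriber gets its own copy of each event, and events published before a subscription are lost. A work queue needs the opposite guarantees (each task handled once, nothing dropped), which is consistent with the controller keeping the task queue itself.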

Common Tasks Quick Reference

Adding a New Eval Exercise

Add exercise to Roo-Code-Evals repo (external)
See packages/evals/ADDING-EVALS.md for structure

Modifying Eval CLI Behavior

Edit files in packages/evals/src/cli/:

runEvals.ts - Run orchestration
runTask.ts - Task execution
runUnitTest.ts - Test validation

Modifying the Evals Web Interface

Edit files in apps/web-evals/src/:

app/runs/new/new-run.tsx - New run form
actions/runs.ts - Run server actions

Modifying the Public Evals Display Page

Edit files in apps/web-roo-code/src/app/evals/:

evals.tsx - Display component
plot.tsx - Charts

Database Schema Changes

Edit packages/evals/src/db/schema.ts
Generate migration: cd packages/evals && pnpm drizzle-kit generate
Apply migration (also from packages/evals): pnpm drizzle-kit migrate

Running Evals Locally

bash

# From repo root
pnpm evals

# Opens web UI at http://localhost:3446

Ports (defaults):

PostgreSQL: 5433
Redis: 6380
Web: 3446
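
If tooling needs these defaults in code, one way to keep them in a single place is sketched below. The `localUrl` helper and the URL schemes are assumptions for illustration. Real connection strings also need credentials and a database name, which this document does not specify.

```typescript
// Default local ports from the list above; the helper itself is hypothetical.
const DEFAULT_PORTS = { postgres: 5433, redis: 6380, web: 3446 } as const;
type Service = keyof typeof DEFAULT_PORTS;

// URL scheme per service (assumed, not taken from the repo config).
const SCHEMES: Record<Service, string> = {
  postgres: "postgres",
  redis: "redis",
  web: "http",
};

// Builds a localhost URL, letting callers override a port that is already in use.
function localUrl(
  service: Service,
  overrides: Partial<Record<Service, number>> = {},
): string {
  const port = overrides[service] ?? DEFAULT_PORTS[service];
  return `${SCHEMES[service]}://localhost:${port}`;
}

console.log(localUrl("web")); // http://localhost:3446
console.log(localUrl("redis", { redis: 6379 })); // redis://localhost:6379
```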

Testing

bash

# packages/evals tests (run from repo root)
cd packages/evals && npx vitest run

# apps/web-evals tests (run from repo root)
cd apps/web-evals && npx vitest run

Key Types/Exports from `@roo-code/evals`

The package exports are defined in packages/evals/src/index.ts:

Database queries: getRuns, getTasks, getTaskMetrics, etc.
Schema types: Run, Task, TaskMetrics
Used by both apps/web-evals and apps/web-roo-code

Maintainer

foryourhealth111-pixel Core maintainer

Source details

Full Name: foryourhealth111-pixel/Vibe-Skills
Branch: main
Path in repo: bundled/skills/evals-context
License: Apache License 2.0
Topics: claude-code anthropic claude agent-skills automation mcp ai-agents cursor developer-tools agentic-coding skills llm codex claude-skills vibe-coding vibecoding opencode ai-skills ai-workflow windsurf
