bare-eval

Run isolated eval and grading calls using the --bare mode of Claude Code (CC) 2.1.81 or later. Constructs claude -p --bare invocations for skill evaluation, trigger testing, and LLM grading without plugin or hook interference. Use when running eval pipelines, grading skill outputs, benchmarking prompt quality, or testing trigger accuracy in isolation.

Install this agent skill to your Project

npx add-skill https://github.com/yonatangross/orchestkit/tree/main/plugins/ork/skills/bare-eval

SKILL.md

Bare Eval — Isolated Evaluation Calls

Run claude -p --bare for fast, clean eval/grading without plugin overhead.

Requires CC 2.1.81 or later. The --bare flag skips hooks, LSP, plugin sync, and skill directory walks.

When to Use

  • Grading skill outputs against assertions
  • Trigger classification (which skill matches a prompt)
  • Description optimization iterations
  • Any scripted -p call that doesn't need plugins

When NOT to Use

  • Testing skill routing (needs --plugin-dir)
  • Testing agent orchestration (needs full plugin context)
  • Interactive sessions

Prerequisites

bash
# --bare requires ANTHROPIC_API_KEY (OAuth/keychain disabled)
export ANTHROPIC_API_KEY="sk-ant-..."

# Verify CC version
claude --version  # Must be >= 2.1.81
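The two prerequisites above can be checked in a script before any eval run. The following is a minimal sketch, not part of OrchestKit: the function names are hypothetical, it assumes `claude --version` prints a dotted x.y.z version somewhere in its output, and it relies on `sort -V` being available.

```shell
#!/usr/bin/env bash

# Succeeds when version $1 is greater than or equal to version $2.
# Uses `sort -V` (version sort) and checks that $2 sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: the first x.y.z token in the output is the CC version.
current_cc_version() {
  claude --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Hypothetical helper: fails with a message if either prerequisite is missing.
check_bare_prereqs() {
  local required="2.1.81" current
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set (--bare disables OAuth/keychain)" >&2
    return 1
  fi
  current="$(current_cc_version)"
  if [ -z "$current" ] || ! version_ge "$current" "$required"; then
    echo "CC >= $required required, found: ${current:-unknown}" >&2
    return 1
  fi
}
```

Calling `check_bare_prereqs || exit 1` at the top of an eval script would stop early instead of failing on the first bare call.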

Quick Reference

Call Type    Command Pattern
Grading      claude -p "$prompt" --bare --max-turns 1 --output-format text
Trigger      claude -p "$prompt" --bare --json-schema "$schema" --output-format json
Optimize     echo "$prompt" | claude -p --bare --max-turns 1 --output-format text
Force-skill  claude -p "$prompt" --bare --append-system-prompt "$content"
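For scripted use, the command patterns in the quick reference can be wrapped in small functions. This is a sketch under assumptions: the function names and the CLAUDE_BIN override are hypothetical (the override exists only so the wrappers can be dry-run against a stub), and the flags are copied from the table rather than verified against a particular CC release.

```shell
#!/usr/bin/env bash

# Hypothetical override so the wrappers can be exercised with a stub;
# defaults to the real `claude` binary.
CLAUDE_BIN="${CLAUDE_BIN:-claude}"

# Grading: one turn, plain-text output.
bare_grade() {
  "$CLAUDE_BIN" -p "$1" --bare --max-turns 1 --output-format text
}

# Trigger classification: JSON output constrained by a schema string ($2).
bare_trigger() {
  "$CLAUDE_BIN" -p "$1" --bare --json-schema "$2" --output-format json
}

# Optimize: the prompt arrives on stdin instead of as an argument.
bare_optimize() {
  printf '%s\n' "$1" | "$CLAUDE_BIN" -p --bare --max-turns 1 --output-format text
}

# Force-skill: skill content ($2) is appended to the system prompt.
# -p is the short form of --print, so it is passed only once.
bare_force_skill() {
  "$CLAUDE_BIN" -p "$1" --bare --append-system-prompt "$2"
}
```

Quoting "$1" and "$2" keeps prompts with spaces or newlines as single arguments, which matters when prompts are read from eval fixture files.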

Invocation Patterns

Load detailed patterns and examples:

Read("${CLAUDE_SKILL_DIR}/references/invocation-patterns.md")

Grading Schemas

JSON schemas for structured eval output:

Read("${CLAUDE_SKILL_DIR}/references/grading-schemas.md")

Pipeline Integration

OrchestKit's eval scripts (npm run eval:skill) auto-detect bare mode:

bash
# eval-common.sh detects ANTHROPIC_API_KEY → sets BARE_MODE=true
# Scripts add --bare to all non-plugin calls automatically

Bare calls: trigger classification, force-skill, baseline, and all grading.
Never bare: run_with_skill (needs plugin context for routing tests).
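The detection logic described above might look roughly like the following. This is a hypothetical reconstruction for illustration, not the actual contents of eval-common.sh; only the BARE_MODE variable, the ANTHROPIC_API_KEY check, and the run_with_skill exception come from the description, and the function names are assumptions.

```shell
#!/usr/bin/env bash

# Set BARE_MODE based on whether an API key is available.
detect_bare_mode() {
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    BARE_MODE=true
  else
    BARE_MODE=false
  fi
}

# Print "--bare" for call types that may run bare, nothing otherwise.
# run_with_skill is excluded because it needs plugin context.
bare_flag_for() {
  local call_type="$1"
  if [ "${BARE_MODE:-false}" = true ] && [ "$call_type" != "run_with_skill" ]; then
    printf '%s\n' "--bare"
  fi
}
```

A caller could then splice the result into an invocation unquoted, for example `claude -p "$prompt" $(bare_flag_for grading) --max-turns 1`, so that an empty result adds no argument.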

Performance

Scenario              Without --bare      With --bare  Savings
Single grading call   ~3-5s startup       ~0.5-1s      2-4x
Trigger (per prompt)  ~3-5s               ~0.5-1s      2-4x
Full eval (50 calls)  ~150-250s overhead  ~25-50s      3-5x

Rules

Read("${CLAUDE_SKILL_DIR}/rules/_sections.md")

Troubleshooting

Read("${CLAUDE_SKILL_DIR}/references/troubleshooting.md")

Related

  • eval:skill npm script — unified skill evaluation runner
  • eval:trigger — trigger accuracy testing
  • eval:quality — A/B quality comparison
  • optimize-description.sh — iterative description improvement
  • Version compatibility: doctor/references/version-compatibility.md
