skillgrade-graders

Authors deterministic and LLM rubric graders for skillgrade evaluations. Use when creating scoring scripts, writing evaluation rubrics, or combining multiple graders with weighted scoring. Don't use for setting up eval pipelines, configuring eval.yaml defaults, or general test writing.

Install this agent skill in your project:

npx add-skill https://github.com/mgechev/skillgrade/tree/main/skills/skillgrade-graders

SKILL.md

Skillgrade Grader Authoring

Procedures

Step 1: Identify the Grading Strategy

Determine whether the task requires objective verification (deterministic) or qualitative assessment (LLM rubric).
For most tasks, combine both: deterministic graders verify outcomes (weight 0.7), LLM rubrics assess approach quality (weight 0.3).

Step 2: Write a Deterministic Grader

Create a script in the skill's graders/ directory (bash or TypeScript).

The script must output a JSON object to stdout with the following structure:

```json
{
  "score": 0.67,
  "details": "2/3 checks passed",
  "checks": [
    {"name": "check-name", "passed": true, "message": "Description"}
  ]
}
```

score (0.0–1.0) and details are required. checks is optional but recommended.
Read references/grader-output-schema.md for the full output specification.
Use awk for arithmetic in bash scripts, because bc is not available in the node:20-slim image.
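The output structure above can be sketched in TypeScript as follows. This is a minimal illustration, not skillgrade's own code: the `Check` and `GraderResult` types, the `buildResult` helper, and the three example checks are all hypothetical names chosen here, and a real grader would inspect the agent's actual output instead of hard-coded values.

```typescript
// Shape of the JSON a deterministic grader prints (field names from the structure above).
interface Check {
  name: string;
  passed: boolean;
  message: string;
}

interface GraderResult {
  score: number;    // required, between 0.0 and 1.0
  details: string;  // required
  checks?: Check[]; // optional but recommended
}

// Hypothetical helper: the score is the fraction of checks that passed,
// rounded to two decimal places.
function buildResult(checks: Check[]): GraderResult {
  const passed = checks.filter((c) => c.passed).length;
  const total = checks.length;
  // Guard against division by zero when no checks ran.
  const raw = total === 0 ? 0 : passed / total;
  return {
    score: Math.round(raw * 100) / 100,
    details: `${passed}/${total} checks passed`,
    checks,
  };
}

// Hypothetical checks, hard-coded for illustration only.
const exampleChecks: Check[] = [
  { name: "file-exists", passed: true, message: "output file was created" },
  { name: "valid-format", passed: true, message: "output parses as expected" },
  { name: "no-lint-errors", passed: false, message: "lint errors remain" },
];

const result = buildResult(exampleChecks);

// Diagnostics go to stderr; only the final JSON object goes to stdout.
console.error("debug: finished running checks");
console.log(JSON.stringify(result));
```

With two of three checks passing, the sketch reports a score of 0.67 and the details string "2/3 checks passed", matching the sample object above.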

Reference the grader in eval.yaml:

```yaml
- type: deterministic
  run: bash graders/check.sh
  weight: 0.7
```

Step 3: Write an LLM Rubric Grader

Draft a rubric with explicit scoring criteria and point allocations.

Structure the rubric into weighted sections that sum to 1.0:

Workflow Compliance (0-0.5):
- Did the agent follow the mandatory workflow steps?
Efficiency (0-0.5):
- Completed in ≤5 commands without trial-and-error?
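When a rubric grows beyond two sections, it is easy for the point ranges to drift away from a total of 1.0. The sketch below shows one way to sanity-check the section maxima before committing a rubric. The `RubricSection` type and `sectionsSumToOne` helper are hypothetical and are not part of skillgrade.

```typescript
// Hypothetical description of one rubric section and its maximum points.
interface RubricSection {
  name: string;
  maxPoints: number;
}

// Returns true when the section maxima add up to 1.0, within a small
// tolerance that absorbs floating-point rounding.
function sectionsSumToOne(sections: RubricSection[], tolerance = 1e-9): boolean {
  const total = sections.reduce((sum, s) => sum + s.maxPoints, 0);
  return Math.abs(total - 1.0) <= tolerance;
}

// The two sections from the rubric above.
const rubricSections: RubricSection[] = [
  { name: "Workflow Compliance", maxPoints: 0.5 },
  { name: "Efficiency", maxPoints: 0.5 },
];

console.log(sectionsSumToOne(rubricSections)); // prints true
```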

Reference the rubric in eval.yaml:

```yaml
- type: llm_rubric
  rubric: |
    [rubric text or file path]
  weight: 0.3
  model: gemini-2.0-flash  # optional, auto-detected from API key
```

For long rubrics, store in a separate file and reference by path: rubric: rubrics/quality.md.

Step 4: Combine Multiple Graders

Assign weights to each grader based on importance. Weights are normalized automatically.
Final reward is calculated as: Σ (grader_score × weight) / Σ weight.

Example configuration:

```yaml
graders:
  - type: deterministic
    run: bash graders/check.sh
    weight: 0.7
  - type: llm_rubric
    rubric: rubrics/quality.md
    weight: 0.3
```
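To make the reward formula concrete, here is a small worked example of Σ (grader_score × weight) / Σ weight. It is a sketch of the arithmetic only: the `GraderScore` type, the `finalReward` function, and the sample scores (0.67 and 0.9) are assumptions made for illustration and do not come from skillgrade's source.

```typescript
// One grader's score paired with its configured weight (hypothetical type).
interface GraderScore {
  score: number;
  weight: number;
}

// Weighted average: sum(score * weight) / sum(weight).
// Dividing by the total weight is what makes the weights self-normalizing.
function finalReward(graders: GraderScore[]): number {
  const totalWeight = graders.reduce((sum, g) => sum + g.weight, 0);
  if (totalWeight === 0) return 0; // no weighted graders, nothing to average
  const weightedSum = graders.reduce((sum, g) => sum + g.score * g.weight, 0);
  return weightedSum / totalWeight;
}

// Assumed scores: deterministic grader 0.67, LLM rubric grader 0.9.
const reward = finalReward([
  { score: 0.67, weight: 0.7 },
  { score: 0.9, weight: 0.3 },
]);

// (0.67 * 0.7 + 0.9 * 0.3) / (0.7 + 0.3) = (0.469 + 0.27) / 1.0
console.log(reward.toFixed(3)); // prints "0.739"

// Weights of 7 and 3 yield the same reward, because only the ratio matters.
const sameReward = finalReward([
  { score: 0.67, weight: 7 },
  { score: 0.9, weight: 3 },
]);
console.log(sameReward.toFixed(3)); // prints "0.739"
```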

Step 5: Validate Graders

Create a reference solution script that produces the expected output.
Run skillgrade --validate to verify graders score the reference solution correctly.
Test only deterministic graders: skillgrade --grader=deterministic (skips LLM calls, faster iteration).
Test only LLM rubric graders: skillgrade --grader=llm_rubric.
Run a specific eval with a specific grader type: skillgrade --eval=my-eval --grader=deterministic.
If a grader returns unexpected scores, inspect the script output and adjust scoring logic.

Error Handling

If a deterministic grader outputs non-JSON, ensure that every echo/console.log statement other than the final JSON result writes to stderr (for example, echo "..." >&2 in bash or console.error in TypeScript).
If an LLM rubric grader returns 0.00 with "No API key," set GEMINI_API_KEY or ANTHROPIC_API_KEY in the environment.
If scores are inconsistent across trials, reduce rubric ambiguity by adding concrete examples of passing and failing behavior.
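The non-JSON failure is easy to reproduce locally. The sketch below parses a grader's stdout the way a strict consumer might and shows how a single stray log line breaks it. `parseGraderOutput` is a hypothetical helper written for this illustration; it does not reflect how skillgrade itself parses grader output.

```typescript
// Result of attempting to parse grader stdout (hypothetical type).
type ParseOutcome =
  | { ok: true; score: number; details: string }
  | { ok: false; error: string };

// Treats the whole of stdout as one JSON object and validates the two
// required fields: score (0.0 to 1.0) and details.
function parseGraderOutput(stdout: string): ParseOutcome {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return { ok: false, error: "stdout is not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, error: "expected a JSON object" };
  }
  const { score, details } = parsed as { score?: unknown; details?: unknown };
  if (typeof score !== "number" || score < 0 || score > 1) {
    return { ok: false, error: "score must be a number between 0.0 and 1.0" };
  }
  if (typeof details !== "string") {
    return { ok: false, error: "details must be a string" };
  }
  return { ok: true, score, details };
}

const cleanOutput = '{"score": 0.67, "details": "2/3 checks passed"}';
// The same JSON, preceded by a log line that should have gone to stderr.
const noisyOutput = "Running checks...\n" + cleanOutput;

console.log(parseGraderOutput(cleanOutput).ok); // prints true
console.log(parseGraderOutput(noisyOutput).ok); // prints false
```

Piping a grader's stdout through a check like this before running a full evaluation can catch stray log lines early.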

Maintainer

mgechev Core maintainer

Source details

Full Name: mgechev/skillgrade
Branch: main
Path in repo: skills/skillgrade-graders
License: MIT License
Topics: agent claude-code gemini-cli skill codex eval
