Agent skill
skillgrade-graders
Authors deterministic and LLM rubric graders for skillgrade evaluations. Use when creating scoring scripts, writing evaluation rubrics, or combining multiple graders with weighted scoring. Don't use for setting up eval pipelines, configuring eval.yaml defaults, or general test writing.
Install this agent skill in your project
npx add-skill https://github.com/mgechev/skillgrade/tree/main/skills/skillgrade-graders
SKILL.md
Skillgrade Grader Authoring
Procedures
Step 1: Identify the Grading Strategy
- Determine whether the task requires objective verification (deterministic) or qualitative assessment (LLM rubric).
- For most tasks, combine both: deterministic graders verify outcomes (weight 0.7), LLM rubrics assess approach quality (weight 0.3).
Step 2: Write a Deterministic Grader
- Create a script in the skill's graders/ directory (bash or TypeScript).
- The script must output a JSON object to stdout with the following structure:
json
{
  "score": 0.67,
  "details": "2/3 checks passed",
  "checks": [
    {"name": "check-name", "passed": true, "message": "Description"}
  ]
}
- score (0.0–1.0) and details are required. checks is optional but recommended.
- Read references/grader-output-schema.md for the full output specification.
- Use awk for arithmetic in bash scripts; bc is not available in node:20-slim.
- Reference the grader in eval.yaml:
yaml
- type: deterministic
  run: bash graders/check.sh
  weight: 0.7
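The JSON structure from this step can be produced by a small TypeScript grader. This is a minimal sketch, not skillgrade's own code: Check, GraderResult, and buildResult are hypothetical names, the two-decimal rounding is an assumption based on the 0.67 example, and the hard-coded checks stand in for real verification logic.

```typescript
// Hypothetical types mirroring the JSON structure described in this step.
interface Check {
  name: string;
  passed: boolean;
  message: string;
}

interface GraderResult {
  score: number; // 0.0–1.0, required
  details: string; // required
  checks?: Check[]; // optional but recommended
}

// Score is the fraction of passed checks, rounded to two decimals (assumption).
function buildResult(checks: Check[]): GraderResult {
  const passed = checks.filter((c) => c.passed).length;
  const score =
    checks.length === 0 ? 0 : Math.round((passed / checks.length) * 100) / 100;
  return { score, details: `${passed}/${checks.length} checks passed`, checks };
}

// Placeholder checks; a real grader would inspect the agent's output here.
const graderResult = buildResult([
  { name: "file-exists", passed: true, message: "Output file was created" },
  { name: "valid-format", passed: true, message: "Output parses correctly" },
  { name: "no-warnings", passed: false, message: "Warnings were emitted" },
]);

// Only the final JSON result is written to stdout.
console.log(JSON.stringify(graderResult));
```

Keeping the scoring in one function makes it easier to add or remove checks without touching the output format.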
Step 3: Write an LLM Rubric Grader
- Draft a rubric with explicit scoring criteria and point allocations.
- Structure the rubric into weighted sections that sum to 1.0:
    Workflow Compliance (0-0.5):
    - Did the agent follow the mandatory workflow steps?
    Efficiency (0-0.5):
    - Completed in ≤5 commands without trial-and-error?
- Reference the rubric in eval.yaml:
yaml
- type: llm_rubric
  rubric: |
    [rubric text or file path]
  weight: 0.3
  model: gemini-2.0-flash  # optional, auto-detected from API key
- For long rubrics, store in a separate file and reference by path: rubric: rubrics/quality.md.
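One way to confirm that rubric sections sum to 1.0 is to add up the upper bound of each section header before committing the rubric. The sketch below assumes headers follow the "Name (0-X):" shape used in this step. rubricMaxTotal is a hypothetical helper, not part of skillgrade.

```typescript
// Hypothetical helper: sums the upper bound of every "Name (0-X):" header line.
function rubricMaxTotal(rubric: string): number {
  // Captures the lower and upper bound, e.g. "0" and "0.5" in "(0-0.5):".
  const header = /^\s*.+?\((\d*\.?\d+)\s*[-–]\s*(\d*\.?\d+)\)\s*:/;
  let total = 0;
  for (const line of rubric.split("\n")) {
    const match = header.exec(line);
    if (match) {
      total += parseFloat(match[2] ?? "0");
    }
  }
  return total;
}

const rubricText = [
  "Workflow Compliance (0-0.5):",
  "- Did the agent follow the mandatory workflow steps?",
  "Efficiency (0-0.5):",
  "- Completed in ≤5 commands without trial-and-error?",
].join("\n");

// Compare with a small tolerance because the bounds are floating-point values.
const rubricTotal = rubricMaxTotal(rubricText);
const sumsToOne = Math.abs(rubricTotal - 1.0) < 1e-9;
console.log(`rubric maxima total ${rubricTotal}, sums to 1.0: ${sumsToOne}`);
```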
Step 4: Combine Multiple Graders
- Assign weights to each grader based on importance. Weights are normalized automatically.
- Final reward is calculated as: Σ (grader_score × weight) / Σ weight.
- Example configuration:
yaml
graders:
  - type: deterministic
    run: bash graders/check.sh
    weight: 0.7
  - type: llm_rubric
    rubric: rubrics/quality.md
    weight: 0.3
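The weighted formula from this step can be worked through in a few lines of TypeScript. This sketch only illustrates the arithmetic. GraderScore and finalReward are hypothetical names, the sample scores are made up for the example, and returning 0 when all weights are zero is an assumption.

```typescript
// Hypothetical shape: one grader's score paired with its configured weight.
interface GraderScore {
  score: number;
  weight: number;
}

// Σ (grader_score × weight) / Σ weight
function finalReward(graders: GraderScore[]): number {
  const totalWeight = graders.reduce((sum, g) => sum + g.weight, 0);
  if (totalWeight === 0) return 0; // assumption: no weight means no reward
  const weightedSum = graders.reduce((sum, g) => sum + g.score * g.weight, 0);
  return weightedSum / totalWeight;
}

// Illustrative scores for the 0.7 / 0.3 configuration.
const reward = finalReward([
  { score: 0.67, weight: 0.7 },
  { score: 0.9, weight: 0.3 },
]);

// The same ratio expressed as 7 and 3: dividing by Σ weight normalizes it.
const rewardSameRatio = finalReward([
  { score: 0.67, weight: 7 },
  { score: 0.9, weight: 3 },
]);

console.log(reward.toFixed(3), rewardSameRatio.toFixed(3));
```

Both calls give approximately 0.739, because only the ratio between the weights matters.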
Step 5: Validate Graders
- Create a reference solution script that produces the expected output.
- Run skillgrade --validate to verify graders score the reference solution correctly.
- Test only deterministic graders: skillgrade --grader=deterministic (skips LLM calls, faster iteration).
- Test only LLM rubric graders: skillgrade --grader=llm_rubric.
- Run a specific eval with a specific grader type: skillgrade --eval=my-eval --grader=deterministic.
- If a grader returns unexpected scores, inspect the script output and adjust scoring logic.
Error Handling
- If a deterministic grader outputs non-JSON, ensure all echo/console.log statements except the final JSON result are redirected to stderr.
- If an LLM rubric grader returns 0.00 with "No API key," set GEMINI_API_KEY or ANTHROPIC_API_KEY in the environment.
- If scores are inconsistent across trials, reduce rubric ambiguity by adding concrete examples of passing and failing behavior.
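For the non-JSON case, a TypeScript grader can send diagnostics through console.error, which Node.js writes to stderr, and reserve console.log for the final result. The sketch below also includes a small local check that a captured stdout string parses as a grader result. Both log and isValidGraderOutput are hypothetical helpers, not skillgrade APIs.

```typescript
// Diagnostics go to stderr so stdout carries only the final JSON.
function log(message: string): void {
  console.error(`[grader] ${message}`);
}

// Hypothetical pre-flight check: does this stdout text parse as a grader result
// with a numeric score in [0, 1] and a string details field?
function isValidGraderOutput(stdout: string): boolean {
  try {
    const parsed: unknown = JSON.parse(stdout);
    if (typeof parsed !== "object" || parsed === null) return false;
    const { score, details } = parsed as { score?: unknown; details?: unknown };
    return (
      typeof score === "number" &&
      score >= 0 &&
      score <= 1 &&
      typeof details === "string"
    );
  } catch {
    // JSON.parse throws when stray log lines are mixed into stdout.
    return false;
  }
}

log("running checks");
const finalJson = JSON.stringify({ score: 1, details: "3/3 checks passed" });
log(`output valid: ${isValidGraderOutput(finalJson)}`);

// The only write to stdout.
console.log(finalJson);
```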