Agent skill

funsloth-check

Validate datasets for Unsloth fine-tuning. Use when the user wants to check a dataset, analyze tokens, calculate Chinchilla optimality, or prepare data for training.

Install this agent skill in your project:

npx add-skill https://github.com/chrisvoncsefalvay/funsloth/tree/main/skills/funsloth-check

SKILL.md

Dataset Validation for Unsloth Fine-tuning

Validate datasets before fine-tuning with Unsloth.

Quick Start

For automated validation, use the script:

```bash
python scripts/validate_dataset.py --dataset "dataset-id" --model llama-3.1-8b --lora-rank 16
```

Workflow

1. Get Dataset Source

Ask for a Hugging Face dataset ID (e.g., mlabonne/FineTome-100k) or a local path (e.g., ./data.jsonl).

2. Load and Detect Format

Auto-detect format from structure. See DATA_FORMATS.md for details.

Format	Detection	Key Fields
Raw	`text` only	`text`
Alpaca	`instruction` + `output`	`instruction`, `output`
ShareGPT	`conversations` array	`from`, `value`
ChatML	`messages` array	`role`, `content`
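The detection rules in the table can be sketched as a small function. This is a minimal illustration, not the logic of the bundled script: the function name `detect_format`, the returned labels, and the order of the checks are assumptions made for the example.

```python
from typing import Any, Mapping


def detect_format(record: Mapping[str, Any]) -> str:
    """Guess the dataset format from one record's top-level keys.

    Hypothetical helper: the labels and check order are illustrative.
    """
    # Conversational formats are checked first, since their container
    # field (`messages` or `conversations`) is the most distinctive.
    if isinstance(record.get("messages"), list):
        return "chatml"
    if isinstance(record.get("conversations"), list):
        return "sharegpt"
    # Alpaca needs both `instruction` and `output`.
    if "instruction" in record and "output" in record:
        return "alpaca"
    # Raw is the fallback when only `text` identifies the record.
    if "text" in record:
        return "raw"
    return "unknown"
```

Checking a handful of records rather than just the first one is probably safer in practice, since mixed or partially malformed datasets would otherwise be misclassified.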

3. Validate Schema

Check required fields exist. Report issues with fix suggestions.
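One way to express this check, using the key fields listed in the format table of step 2, is sketched below. The helper name `validate_record`, the format labels, and the wording of the issue messages are assumptions for illustration; the bundled script may report problems differently.

```python
from typing import Any, List, Mapping

# Required keys per format, taken from the "Key Fields" column in step 2.
TOP_LEVEL = {"raw": ["text"], "alpaca": ["instruction", "output"]}
TURN_BASED = {
    "sharegpt": ("conversations", ["from", "value"]),
    "chatml": ("messages", ["role", "content"]),
}


def validate_record(record: Mapping[str, Any], format_type: str) -> List[str]:
    """Return a list of human-readable issues; an empty list means valid."""
    issues: List[str] = []
    if format_type in TOP_LEVEL:
        for key in TOP_LEVEL[format_type]:
            if key not in record:
                issues.append(
                    f"missing field '{key}': add it or map an existing column to it"
                )
    elif format_type in TURN_BASED:
        container, turn_keys = TURN_BASED[format_type]
        turns = record.get(container)
        if not isinstance(turns, list) or not turns:
            issues.append(f"'{container}' must be a non-empty list of turns")
        else:
            # Every turn must be a dict carrying all required keys.
            for i, turn in enumerate(turns):
                for key in turn_keys:
                    if not isinstance(turn, dict) or key not in turn:
                        issues.append(f"{container}[{i}] is missing '{key}'")
    else:
        issues.append(f"unknown format '{format_type}'")
    return issues
```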

4. Show Samples

Display 2-3 examples for visual verification.

5. Token Analysis

Report statistics: total tokens, min/max/mean/median sequence length.

Flag concerns:

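Sequences > 4096 tokens (may exceed the training sequence length and get truncated)
Sequences < 10 tokens (likely too short to be useful training examples)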
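The statistics and flags above can be computed from a list of per-example token counts. The sketch below assumes those counts have already been produced by the target model's tokenizer; the function name `token_report` and the dictionary keys are hypothetical, and the thresholds default to the 4096 and 10 token limits named in this step.

```python
import statistics
from typing import Dict, Sequence


def token_report(
    lengths: Sequence[int], max_len: int = 4096, min_len: int = 10
) -> Dict[str, float]:
    """Summarize per-example token counts and count flagged sequences."""
    if not lengths:
        raise ValueError("lengths must contain at least one sequence")
    return {
        "total_tokens": sum(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        # Flag counts use strict comparisons, matching "> 4096" and "< 10".
        "too_long": sum(1 for n in lengths if n > max_len),
        "too_short": sum(1 for n in lengths if n < min_len),
    }
```

Keeping tokenization outside the function means the same report works for any tokenizer, at the cost of requiring the caller to tokenize with the model that will actually be fine-tuned.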

6. Chinchilla Analysis

Ask for the target model and LoRA rank, then calculate the Chinchilla fraction and interpret it:

Chinchilla Fraction	Interpretation
< 0.5x	Dataset may be too small
0.5x - 2.0x	Good range
> 2.0x	Large dataset, may take longer
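The interpretation table maps directly onto a threshold check, sketched below. This document does not state how the optimal token count is derived from the model and LoRA rank, so the sketch takes it as an input: `optimal_tokens` is a hypothetical parameter the caller must supply, and the function names and returned labels are likewise assumptions.

```python
def chinchilla_fraction(total_tokens: int, optimal_tokens: float) -> float:
    """Ratio of dataset tokens to a caller-supplied optimal token count.

    `optimal_tokens` is an assumed input; its derivation from the target
    model and LoRA rank is not specified here.
    """
    if optimal_tokens <= 0:
        raise ValueError("optimal_tokens must be positive")
    return total_tokens / optimal_tokens


def interpret_fraction(fraction: float) -> str:
    """Apply the thresholds from the interpretation table."""
    if fraction < 0.5:
        return "too_small"  # dataset may be too small
    if fraction <= 2.0:
        return "good"  # 0.5x to 2.0x
    return "large"  # large dataset, may take longer
```

For instance, `interpret_fraction(1.2)` falls in the good range, matching the `chinchilla_fraction: 1.2` value used in the handoff example.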

7. Recommendations

Based on analysis, suggest:

standardize_sharegpt() for ShareGPT data
Sequence length adjustments
Learning rate for small datasets

8. Optional: HF Upload

Offer to upload local datasets to Hub.

9. Handoff

Pass context to funsloth-train:

```yaml
dataset_id: "mlabonne/FineTome-100k"
format_type: "sharegpt"
total_tokens: 15000000
target_model: "llama-3.1-8b"
use_lora: true
lora_rank: 16
chinchilla_fraction: 1.2
```

Bundled Resources

scripts/validate_dataset.py - Automated validation script
DATA_FORMATS.md - Dataset format reference

Maintainer

chrisvoncsefalvay (core maintainer)

Source details

Full Name: chrisvoncsefalvay/funsloth
Branch: main
Path in repo: skills/funsloth-check
