Agent skill
funsloth-local
Training manager for local GPU training - validate CUDA, manage GPU selection, monitor progress, handle checkpoints
Install this agent skill to your Project
npx add-skill https://github.com/chrisvoncsefalvay/funsloth/tree/main/skills/funsloth-local
SKILL.md
Local GPU Training Manager
Run Unsloth training on your local GPU.
Prerequisites Check
1. Verify CUDA
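import torch

cuda_ok = torch.cuda.is_available()
print(f"CUDA available: {cuda_ok}")
if cuda_ok:  # The device queries below raise an error when no CUDA device is present
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")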
If CUDA is not available:
- Check NVIDIA drivers: nvidia-smi
- Check CUDA: nvcc --version
- Reinstall PyTorch: pip install torch --index-url https://download.pytorch.org/whl/cu121
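The first two checks can also be run from Python, which is handy when a script should report what is missing before training starts. This is a minimal sketch: the helper names `check_tool` and `run_cuda_checks` are hypothetical and not part of the skill, and it only tests whether each command exists and exits cleanly.

```python
import shutil
import subprocess


def check_tool(command):
    """Run a diagnostic command; return (ok, first line of output or a reason)."""
    if shutil.which(command[0]) is None:
        return False, "not found on PATH"
    result = subprocess.run(command, capture_output=True, text=True)
    lines = (result.stdout or result.stderr).strip().splitlines()
    first_line = lines[0] if lines else ""
    return result.returncode == 0, first_line


def run_cuda_checks():
    """Check the driver tool and the CUDA compiler named in the list above."""
    checks = {
        "NVIDIA driver": ["nvidia-smi"],
        "CUDA toolkit": ["nvcc", "--version"],
    }
    return {name: check_tool(cmd) for name, cmd in checks.items()}


for name, (ok, detail) in run_cuda_checks().items():
    print(f"{name}: {'OK' if ok else 'PROBLEM'} ({detail})")
```

A missing nvcc is not necessarily fatal, since PyTorch pip wheels bundle their own CUDA runtime; a missing or failing nvidia-smi usually points to a driver problem.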
2. Check VRAM
See references/HARDWARE_GUIDE.md for requirements:
| VRAM | Recommended Setup |
|---|---|
| 8GB | 7B, 4-bit, batch=1, LoRA r=8 |
| 12GB | 7B, 4-bit, batch=2, LoRA r=16 |
| 16GB | 7-13B, 4-bit, batch=2, LoRA r=16-32 |
| 24GB | 7-14B, 4-bit, batch=4, LoRA r=32 |
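As a rough illustration, the table can be turned into a lookup that maps detected VRAM to a starting configuration. The function name `recommend_setup` is hypothetical, and treating each row as a lower bound (pick the largest tier that fits) is an assumption rather than something the table states.

```python
# Rows copied from the table above: (minimum VRAM in GB, recommended setup).
VRAM_TABLE = [
    (8, "7B, 4-bit, batch=1, LoRA r=8"),
    (12, "7B, 4-bit, batch=2, LoRA r=16"),
    (16, "7-13B, 4-bit, batch=2, LoRA r=16-32"),
    (24, "7-14B, 4-bit, batch=4, LoRA r=32"),
]


def recommend_setup(vram_gb):
    """Return the setup for the largest tier that fits, or None below 8 GB."""
    best = None
    for min_gb, setup in VRAM_TABLE:
        if vram_gb >= min_gb:
            best = setup
    return best


print(recommend_setup(12))
```

The VRAM figure printed by the CUDA check in step 1 can be passed in directly as `vram_gb`.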
3. Check Dependencies
pip install unsloth torch transformers trl peft datasets accelerate bitsandbytes
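To confirm the install worked without importing every package (importing unsloth or torch can be slow), the installed versions can be read from package metadata. A small sketch using only the standard library; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names taken from the pip command above.
REQUIRED = ["unsloth", "torch", "transformers", "trl", "peft",
            "datasets", "accelerate", "bitsandbytes"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'MISSING'}")
```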
Docker Option
Use the official Unsloth Docker image for a pre-configured environment (supports all GPUs including Blackwell/50-series):
docker run -d \
-e JUPYTER_PASSWORD="unsloth" \
-p 8888:8888 \
-v $(pwd)/work:/workspace/work \
--gpus all \
unsloth/unsloth
Access Jupyter at http://localhost:8888. Example notebooks are in /workspace/unsloth-notebooks/.
Environment variables:
- JUPYTER_PASSWORD - Jupyter auth (default: unsloth)
- JUPYTER_PORT - Port (default: 8888)
- USER_PASSWORD - User/sudo password (default: unsloth)
Run Training
Option 1: Notebook
jupyter notebook notebooks/sft_template.ipynb
Option 2: Script
# Edit configuration in script, then run
python scripts/train_sft.py
GPU Selection (Multi-GPU)
import os
# Set this before importing torch or unsloth; it has no effect once CUDA is initialized
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Use first GPU
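When several GPUs are installed, one possible way to choose the index is to pick the device with the most free memory. The sketch below parses the CSV output of `nvidia-smi --query-gpu=index,memory.free`; the helper name `pick_freest_gpu` is hypothetical, and the "most free memory" policy is an assumption, not something the skill prescribes.

```python
import os
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"]


def pick_freest_gpu(csv_text):
    """Return the index (as a string) of the GPU with the most free MiB, or None."""
    best_index, best_free = None, -1
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # Skip lines that are not "index, free MiB"
        if int(fields[1]) > best_free:
            best_index, best_free = fields[0], int(fields[1])
    return best_index


if shutil.which("nvidia-smi"):
    result = subprocess.run(QUERY, capture_output=True, text=True)
    choice = pick_freest_gpu(result.stdout) if result.returncode == 0 else None
    if choice is not None:
        # Must run before torch or unsloth is imported.
        os.environ["CUDA_VISIBLE_DEVICES"] = choice
        print(f"Using GPU {choice}")
```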
Monitor Training
Terminal
# Watch GPU usage
watch -n 1 nvidia-smi
# Or use nvitop (more detailed)
pip install nvitop && nvitop
WandB (Optional)
export WANDB_API_KEY="your-key"
# Add report_to="wandb" in TrainingArguments
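Because WandB is optional, a training script can decide the `report_to` value from the environment instead of hard-coding it. A minimal sketch; the helper name `choose_report_to` is hypothetical, and falling back to "none" (which disables reporting integrations) is an assumed default.

```python
import os


def choose_report_to(env=None):
    """Return "wandb" when an API key is configured, otherwise "none"."""
    env = os.environ if env is None else env
    return "wandb" if env.get("WANDB_API_KEY") else "none"


report_to = choose_report_to()
print(f"report_to={report_to!r}")
# Then pass it on, e.g. TrainingArguments(..., report_to=report_to)
```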
Troubleshooting
OOM Error
Try in order:
- Reduce batch_size (to 1)
- Increase gradient_accumulation_steps (keeps the effective batch size after reducing batch_size)
- Reduce max_seq_length
- Reduce LoRA rank
To release cached GPU memory before retrying:
torch.cuda.empty_cache()
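The first two steps work together: a smaller per-device batch lowers peak memory, and more accumulation steps keep the effective batch size (per-device batch × accumulation steps × GPUs) unchanged. The arithmetic can be sketched as follows; the function name `accumulation_steps` is hypothetical.

```python
import math


def accumulation_steps(target_effective_batch, per_device_batch, num_gpus=1):
    """Gradient accumulation steps needed to reach the target effective batch size."""
    return math.ceil(target_effective_batch / (per_device_batch * num_gpus))


# Example: a run used batch=4 with no accumulation and hit OOM.
# Dropping to batch=1 needs 4 accumulation steps for the same effective batch.
print(accumulation_steps(target_effective_batch=4, per_device_batch=1))
```

The result would go into `gradient_accumulation_steps` alongside the reduced batch size.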
Loss Not Decreasing
- Check learning rate (try higher or lower)
- Verify chat template matches model
- Inspect data format
Training Too Slow
- Enable bf16 if supported
- Use packing=True for short sequences
- Increase logging_steps to log less often
See references/TROUBLESHOOTING.md for more solutions.
Resume from Checkpoint
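# Pass this to trainer.train(); the TrainingArguments field of the same name
# is not used by Trainer directly and is meant for your own scripts
trainer.train(resume_from_checkpoint=True)  # Auto-find latest checkpoint in output_dir
# Or: trainer.train(resume_from_checkpoint="outputs/checkpoint-500")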
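Before resuming, it can help to see which checkpoint counts as the latest. The sketch below assumes the `checkpoint-<step>` directory naming shown above and compares step numbers numerically, since a plain string sort would rank checkpoint-500 above checkpoint-1000. The helper names `pick_latest` and `latest_checkpoint` are hypothetical; transformers also ships a `get_last_checkpoint` helper in `transformers.trainer_utils` for the same purpose.

```python
import re
from pathlib import Path

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def pick_latest(names):
    """Return the checkpoint-<step> name with the highest step, or None."""
    steps = {}
    for name in names:
        match = CHECKPOINT_RE.match(name)
        if match:
            steps[int(match.group(1))] = name
    return steps[max(steps)] if steps else None


def latest_checkpoint(output_dir):
    """Return the path of the newest checkpoint directory, or None if there is none."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    name = pick_latest(p.name for p in root.iterdir() if p.is_dir())
    return root / name if name else None


print(latest_checkpoint("outputs"))
```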
Save Model
The training script automatically saves:
- outputs/lora_adapter/ - LoRA weights
- outputs/merged_16bit/ - Merged model (optional)
Test Inference
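from unsloth import FastLanguageModel

# Load the saved LoRA adapter and switch Unsloth to inference mode
model, tokenizer = FastLanguageModel.from_pretrained("outputs/lora_adapter")
FastLanguageModel.for_inference(model)

messages = [{"role": "user", "content": "Hello!"}]
# add_generation_prompt=True appends the assistant header so the model answers the user turn
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids=inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))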
Handoff
After training, offer the funsloth-upload skill to upload the model to the Hugging Face Hub with a model card.
Tips
- Close other GPU apps before training
- Monitor GPU temperature and keep it under 85°C
- Use a UPS (uninterruptible power supply) for long runs
- Save frequently with save_steps
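The temperature tip can be checked from a script during long runs. This sketch parses `nvidia-smi --query-gpu=index,temperature.gpu` output and reports GPUs at or above the 85°C threshold mentioned above; the helper name `hot_gpus` is hypothetical, and treating exactly 85 as too hot is an assumption.

```python
import shutil
import subprocess

TEMP_LIMIT_C = 85  # Threshold taken from the tip above
QUERY = ["nvidia-smi", "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"]


def hot_gpus(csv_text, limit_c=TEMP_LIMIT_C):
    """Return [(gpu index, temperature in C)] for GPUs at or above the limit."""
    hot = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # Skip lines that are not "index, temperature"
        if int(fields[1]) >= limit_c:
            hot.append((fields[0], int(fields[1])))
    return hot


if shutil.which("nvidia-smi"):
    result = subprocess.run(QUERY, capture_output=True, text=True)
    for index, temp in hot_gpus(result.stdout):
        print(f"GPU {index} is at {temp} C, at or above the {TEMP_LIMIT_C} C limit")
```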
Bundled Resources
- notebooks/sft_template.ipynb - Notebook template
- scripts/train_sft.py - Script template
- references/HARDWARE_GUIDE.md - VRAM requirements
- references/TROUBLESHOOTING.md - Common issues