funsloth-runpod
Training manager for RunPod GPU instances - configure pods, launch training, monitor progress, retrieve checkpoints
Install this agent skill to your project:

```bash
npx add-skill https://github.com/chrisvoncsefalvay/funsloth/tree/main/skills/funsloth-runpod
```
RunPod Training Manager
Run Unsloth training on RunPod GPU instances.
Prerequisites
- RunPod API key: get one at runpod.io/console/user/settings, then confirm it is exported with `echo $RUNPOD_API_KEY`
- RunPod SDK: `pip install runpod`
- Training notebook or script: generated by `funsloth-train`
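A quick preflight check can catch a missing key, SDK, or script before any GPU time is billed. This is a minimal sketch, not one of the bundled scripts; the function name `preflight_problems` is hypothetical, and it assumes the training script is called `train.py` as in step 5.

```python
import importlib.util
import os


def preflight_problems(env=None, script="train.py"):
    """Return a list of problems; an empty list means the prerequisites look OK."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("RUNPOD_API_KEY"):
        problems.append("RUNPOD_API_KEY is not set (create a key in the RunPod console)")
    # find_spec returns None when the package cannot be imported
    if importlib.util.find_spec("runpod") is None:
        problems.append("runpod SDK not installed (pip install runpod)")
    if not os.path.isfile(script):
        problems.append(f"training script not found: {script}")
    return problems


if __name__ == "__main__":
    issues = preflight_problems()
    for issue in issues:
        print("MISSING:", issue)
    print("ready" if not issues else f"{len(issues)} problem(s) found")
```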
Workflow
1. Select GPU
| GPU | VRAM | Cost | Best For |
|---|---|---|---|
| RTX 3090 | 24GB | ~$0.35/hr | Budget 7-14B |
| RTX 4090 | 24GB | ~$0.55/hr | Fast 7-14B |
| A100 40GB | 40GB | ~$1.50/hr | 14-34B |
| A100 80GB | 80GB | ~$2.00/hr | 70B |
| H100 | 80GB | ~$3.50/hr | Fastest |
RunPod's hourly GPU rates are typically lower than Hugging Face Jobs for comparable hardware (see references/PLATFORM_COMPARISON.md).
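Picking a GPU from the table reduces to "cheapest card with enough VRAM", and a cost estimate is rate × hours. The sketch below hard-codes the approximate rates from the table (they change often, so check the console) and assumes you already have a VRAM estimate for your model and quantization. It is an illustration, not the bundled `scripts/estimate_cost.py`; `cheapest_gpu` and `estimate_cost` are hypothetical names.

```python
# Approximate rates copied from the table above: name -> (VRAM in GB, USD per hour).
# Real prices vary by availability; treat these as placeholders.
GPU_OPTIONS = {
    "RTX 3090": (24, 0.35),
    "RTX 4090": (24, 0.55),
    "A100 40GB": (40, 1.50),
    "A100 80GB": (80, 2.00),
    "H100": (80, 3.50),
}


def cheapest_gpu(min_vram_gb):
    """Return the lowest-cost GPU with at least min_vram_gb of VRAM, or None."""
    fitting = [(cost, name) for name, (vram, cost) in GPU_OPTIONS.items() if vram >= min_vram_gb]
    return min(fitting)[1] if fitting else None


def estimate_cost(gpu, hours):
    """Rough USD cost of running one GPU for the given wall-clock hours."""
    return round(GPU_OPTIONS[gpu][1] * hours, 2)
```

For example, a job needing about 48 GB lands on the A100 80GB, and six hours of it is roughly `estimate_cost("A100 80GB", 6)`, i.e. 12.0 USD.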
2. Choose Deployment
- Pod (recommended): persistent instance with SSH access and network storage; suited to training
- Serverless: billed per second but more complex to set up; better suited to inference
3. Configure Network Volume (Recommended)
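```python
import os
import runpod
runpod.api_key = os.environ["RUNPOD_API_KEY"]
volume = runpod.create_network_volume(name="funsloth-training", size_gb=50, region="US")  # verify this helper exists in your SDK version; otherwise create the volume in the console
```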
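A network volume outlives individual pods, so you can resume training, download checkpoints, and share data between pods. It must be in the same data center as the pods that mount it.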
4. Launch Pod
Use the official Unsloth Docker image for a pre-configured environment:
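```python
import os
import runpod

runpod.api_key = os.environ["RUNPOD_API_KEY"]

pod = runpod.create_pod(
    name="funsloth-training",
    image_name="unsloth/unsloth",  # Official image, supports all GPUs incl. Blackwell
    gpu_type_id="{gpu_type}",  # e.g. "NVIDIA GeForce RTX 4090"; list IDs with runpod.get_gpus()
    volume_in_gb=50,
    network_volume_id="{volume_id}",  # must be in the same data center as the pod
    env={
        "HF_TOKEN": "{token}",
        "WANDB_API_KEY": "{key}",
        "JUPYTER_PASSWORD": "unsloth",  # change this: Jupyter is exposed over HTTP
    },
    ports="8888/http,22/tcp",  # Jupyter Lab + SSH
)
print(pod["id"])  # keep this ID for stop_pod / terminate_pod
```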
The Unsloth image includes Jupyter Lab (port 8888) and example notebooks in /workspace/unsloth-notebooks/.
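Rather than pasting tokens into the script, the `env` dict for `create_pod` can be built from your local environment. This is a sketch under two assumptions: `HF_TOKEN` is treated as required (as if you push to the Hub or use gated models), and the other variables are passed only when set. `build_pod_env` is a hypothetical helper.

```python
import os


def build_pod_env(required=("HF_TOKEN",), optional=("WANDB_API_KEY", "JUPYTER_PASSWORD"), source=None):
    """Collect secrets for create_pod(env=...) from the local environment."""
    source = os.environ if source is None else source
    missing = [name for name in required if not source.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    # Required names first, then any optional ones that are actually set
    names = list(required) + [name for name in optional if source.get(name)]
    return {name: source[name] for name in names}
```

Usage would be `runpod.create_pod(..., env=build_pod_env())`, which keeps credentials out of version control.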
5. Upload and Run
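```bash
# From your local machine: upload the script (22/tcp is mapped to a public port, shown in the pod's Connect panel)
scp -P {ssh_port} train.py root@{pod_ip}:/workspace/

# SSH into the pod
ssh -p {ssh_port} root@{pod_ip}

# On the pod: run training inside tmux so it survives disconnects, logging to a file
tmux new -s training
cd /workspace && python -u train.py 2>&1 | tee training.log
# Ctrl+B, then D to detach; reattach later with: tmux attach -t training
```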
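The pod's public IP and the external port mapped to SSH (port 22) can also be read programmatically once the pod is running, which avoids "SSH refused" errors from connecting too early. This sketch assumes the dict returned by `runpod.get_pod(pod_id)` carries `runtime.ports` entries with `ip`, `isIpPublic`, `privatePort`, `publicPort`, and `type` keys, and that `runtime` is empty until the pod starts; verify against your SDK version. `ssh_endpoint` and `wait_for_ssh` are hypothetical helpers.

```python
import time


def ssh_endpoint(pod):
    """Return (ip, port) for public SSH on a pod dict, or None if not ready yet."""
    runtime = pod.get("runtime") or {}
    for port in runtime.get("ports") or []:
        if port.get("privatePort") == 22 and port.get("isIpPublic") and port.get("type") == "tcp":
            return port["ip"], port["publicPort"]
    return None


def wait_for_ssh(get_pod, pod_id, timeout_s=300, poll_s=10):
    """Poll get_pod(pod_id) until a public SSH endpoint appears, or time out."""
    deadline = time.monotonic() + timeout_s
    while True:
        endpoint = ssh_endpoint(get_pod(pod_id) or {})
        if endpoint:
            return endpoint
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pod {pod_id} has no public SSH port after {timeout_s}s")
        time.sleep(poll_s)
```

For example, `ip, port = wait_for_ssh(runpod.get_pod, pod["id"])` followed by `ssh -p <port> root@<ip>`. Passing `get_pod` in as a function keeps the helper testable without a live account.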
6. Monitor
```bash
# SSH monitoring
tail -f /workspace/training.log
nvidia-smi -l 1
```

Dashboard: https://runpod.io/console/pods/{pod_id}
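To see the loss trend without scrolling through progress bars, the log can be scanned for the dicts that the transformers `Trainer` prints at each logging step (for example `{'loss': 1.23, 'grad_norm': 0.5, 'learning_rate': 0.0002, 'epoch': 0.1}`). This sketch assumes training output is captured to `/workspace/training.log` and that log values print as plain Python literals; lines that do not parse are skipped. `loss_history` is a hypothetical helper.

```python
import ast
import re

# A brace-delimited dict containing a 'loss' key ('train_loss' does not match)
LOG_DICT = re.compile(r"\{[^{}]*'loss'[^{}]*\}")


def loss_history(lines):
    """Return (epoch, loss) pairs found in training log lines."""
    history = []
    for line in lines:
        for match in LOG_DICT.findall(line):
            try:
                record = ast.literal_eval(match)  # safe: literals only, no code execution
            except (ValueError, SyntaxError):
                continue
            if isinstance(record, dict) and "loss" in record:
                history.append((record.get("epoch"), record["loss"]))
    return history
```

On the pod, `with open("/workspace/training.log") as f: print(loss_history(f)[-5:])` shows the five most recent logged losses.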
7. Retrieve Checkpoints
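```bash
# On the pod: save to the network volume
cp -r /workspace/outputs /runpod-volume/
# From your local machine: download via SCP
scp -P {ssh_port} -r root@{pod_ip}:/workspace/outputs ./
# Or push to the HF Hub from the pod (authenticates with the pod's HF_TOKEN)
huggingface-cli upload {repo_id} /workspace/outputs
```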
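The copy to the network volume can also happen at the end of the training script itself, so outputs are saved even if nobody is connected when training finishes. A minimal sketch, assuming outputs live in `/workspace/outputs` and the volume is mounted at `/runpod-volume` (the SDK's default mount path); `sync_outputs` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def sync_outputs(src="/workspace/outputs", dst="/runpod-volume/outputs"):
    """Copy training outputs onto the network volume and list what landed there."""
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        raise FileNotFoundError(f"no outputs directory at {src_path}")
    # dirs_exist_ok (Python 3.8+) merges into an existing copy instead of failing
    shutil.copytree(src_path, dst_path, dirs_exist_ok=True)
    return sorted(p.relative_to(dst_path).as_posix() for p in dst_path.rglob("*") if p.is_file())
```

Because the copy merges, files deleted from the source (for example old checkpoints) stay on the volume until removed by hand.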
8. Stop Pod
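```python
pod_id = pod["id"]
runpod.stop_pod(pod_id)  # GPU released; the pod's volume disk is kept so you can resume later
runpod.resume_pod(pod_id, gpu_count=1)  # resume a stopped pod (if the GPU type is still available)
runpod.terminate_pod(pod_id)  # Deletes the pod and its disk; the network volume is kept
```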
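Forgotten pods keep billing, so a small cleanup helper can stop every running pod with the skill's pod name. This sketch assumes `client` is the `runpod` module after setting `runpod.api_key`, and that each dict from `client.get_pods()` has `id`, `name`, and `desiredStatus` keys (with `"RUNNING"` for live pods); confirm these against your SDK version. `stop_pods_named` is a hypothetical helper.

```python
def stop_pods_named(client, name, terminate=False):
    """Stop running pods called `name`; with terminate=True, delete all pods with that name."""
    acted_on = []
    for pod in client.get_pods():
        if pod.get("name") != name:
            continue
        if terminate:
            client.terminate_pod(pod["id"])  # pod disk deleted, network volume kept
        elif pod.get("desiredStatus") == "RUNNING":
            client.stop_pod(pod["id"])  # can be resumed later
        else:
            continue  # already stopped: nothing to do
        acted_on.append(pod["id"])
    return acted_on
```

Typical use is `stop_pods_named(runpod, "funsloth-training")`; passing the module as an argument lets the helper be exercised with a fake client.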
9. Handoff
Offer funsloth-upload for Hub upload with model card.
Best Practices
- Always use network volumes: container storage is lost when a pod is terminated
- Use spot instances for lower costs, at the risk of preemption
- Add your SSH public key in your RunPod account settings before creating pods
- Stop pods when not training; billing continues as long as a pod runs
- Save checkpoints frequently with `save_steps`
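Frequent checkpoints only help if a new pod can find the latest one. The transformers `Trainer` writes a `checkpoint-<step>` folder under `output_dir` every `save_steps` steps, so with `output_dir` on the network volume a replacement pod can resume via `trainer.train(resume_from_checkpoint=latest_checkpoint(...))` (passing `None` starts fresh). transformers also ships `transformers.trainer_utils.get_last_checkpoint` for this; the stdlib sketch below, with the hypothetical name `latest_checkpoint`, just shows the logic: compare step numbers numerically, since `checkpoint-100` sorts before `checkpoint-20` as a string.

```python
import re
from pathlib import Path

# Trainer checkpoint folders are named checkpoint-<global_step>
CHECKPOINT_DIR = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the highest-step checkpoint directory under output_dir, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = CHECKPOINT_DIR.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return str(best_path) if best_path is not None else None
```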
Error Handling
| Error | Resolution |
|---|---|
| Pod creation failed | Try different GPU type or region |
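| SSH refused | Wait 1-2 min for the pod to start; check the IP and the mapped SSH port |
| Out of disk | Increase the volume size or delete old checkpoints |
| Volume not mounting | Make sure the network volume is in the same data center as the pod |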
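For the "Out of disk" and "Volume not mounting" rows, a quick check run on the pod shows whether each path exists and how much space is left. A sketch assuming the container workspace is `/workspace` and the network volume is mounted at `/runpod-volume`; `storage_report` is a hypothetical helper, and `os.path.ismount` can miss bind mounts that share a device, so treat that flag as a hint only.

```python
import os
import shutil


def storage_report(paths=("/workspace", "/runpod-volume")):
    """Return free-space info per path, or None for paths that do not exist."""
    report = {}
    for path in paths:
        if not os.path.isdir(path):
            report[path] = None  # e.g. network volume not attached to this pod
            continue
        usage = shutil.disk_usage(path)
        report[path] = {
            "is_mount_point": os.path.ismount(path),
            "free_gib": round(usage.free / 2**30, 1),
            "used_pct": round(100 * usage.used / usage.total, 1) if usage.total else 0.0,
        }
    return report


if __name__ == "__main__":
    for path, info in storage_report().items():
        print(path, info if info is not None else "MISSING")
```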
Bundled Resources
- scripts/train_sft.py - Training script template
- scripts/estimate_cost.py - Cost estimation
- references/PLATFORM_COMPARISON.md - RunPod vs alternatives
- references/TROUBLESHOOTING.md - Common issues
Recommended Agent Skills
- funsloth-upload: Generate comprehensive model cards and upload fine-tuned models to Hugging Face Hub with professional documentation
- funsloth-train: Generate Unsloth training notebooks and scripts. Use when the user wants to create a training notebook, configure fine-tuning parameters, or set up SFT/DPO/GRPO training.
- funsloth-hfjobs: Training manager for Hugging Face Jobs; launch fine-tuning on HF cloud GPUs with optional WandB monitoring
- funsloth-local: Training manager for local GPU training; validate CUDA, manage GPU selection, monitor progress, handle checkpoints
- funsloth-check: Validate datasets for Unsloth fine-tuning. Use when the user wants to check a dataset, analyze tokens, calculate Chinchilla optimality, or prepare data for training.