Agent skill

funsloth-runpod

Training manager for RunPod GPU instances - configure pods, launch training, monitor progress, retrieve checkpoints

Install this agent skill to your Project

npx add-skill https://github.com/chrisvoncsefalvay/funsloth/tree/main/skills/funsloth-runpod

RunPod Training Manager

Run Unsloth training on RunPod GPU instances.

Prerequisites

RunPod API key: set RUNPOD_API_KEY and confirm with echo $RUNPOD_API_KEY (create a key at runpod.io/console/user/settings)
RunPod SDK: pip install runpod
Training notebook/script: generated by funsloth-train
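
Before any SDK call, it can help to fail early when the key is missing. The following is a minimal sketch, not part of the skill itself: the helper names load_runpod_api_key and configure_runpod are hypothetical, and the only SDK feature assumed is the module-level runpod.api_key attribute.

```python
import os


def load_runpod_api_key(env=os.environ):
    """Return the RunPod API key from the environment, or raise a clear error."""
    key = env.get("RUNPOD_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "RUNPOD_API_KEY is not set; create a key at "
            "runpod.io/console/user/settings and export it first."
        )
    return key


def configure_runpod(env=os.environ):
    """Set the key on the runpod SDK (hypothetical helper).

    The SDK is imported lazily so this file still loads when
    `pip install runpod` has not been run yet.
    """
    import runpod

    runpod.api_key = load_runpod_api_key(env)
    return runpod
```

Calling configure_runpod() once at the top of a launch script returns the configured module, so later steps can use it directly.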

Workflow

1. Select GPU

GPU	VRAM	Cost	Best For
RTX 3090	24GB	~$0.35/hr	Budget 7-14B
RTX 4090	24GB	~$0.55/hr	Fast 7-14B
A100 40GB	40GB	~$1.50/hr	14-34B
A100 80GB	80GB	~$2.00/hr	70B
H100	80GB	~$3.50/hr	Fastest

RunPod is typically cheaper than HF Jobs for comparable GPUs. The hourly rates above are approximate, so check current pricing before launching.
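
The table can be turned into a small lookup for picking the cheapest GPU that fits and estimating a run's cost. This is a sketch that copies the approximate rates from the table; it is not the bundled scripts/estimate_cost.py, and the function names are hypothetical.

```python
# Rows copied from the table above: (name, VRAM in GB, approximate USD per hour).
GPUS = [
    ("RTX 3090", 24, 0.35),
    ("RTX 4090", 24, 0.55),
    ("A100 40GB", 40, 1.50),
    ("A100 80GB", 80, 2.00),
    ("H100", 80, 3.50),
]


def cheapest_gpu(min_vram_gb):
    """Return the name of the cheapest listed GPU with at least min_vram_gb of VRAM."""
    candidates = [gpu for gpu in GPUS if gpu[1] >= min_vram_gb]
    if not candidates:
        raise ValueError(f"No listed GPU has {min_vram_gb} GB of VRAM")
    return min(candidates, key=lambda gpu: gpu[2])[0]


def estimate_cost(gpu_name, hours):
    """Approximate cost in USD for running gpu_name for the given number of hours."""
    rates = {name: price for name, _, price in GPUS}
    return round(rates[gpu_name] * hours, 2)
```

For example, cheapest_gpu(48) picks the A100 80GB, and estimate_cost("A100 40GB", 6) gives roughly 9 USD at the listed rate.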

2. Choose Deployment

Pod (Recommended): Persistent, SSH access, network storage
Serverless: Pay per second, complex setup (better for inference)

3. Configure Network Volume (Recommended)

python

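import os
import runpod

runpod.api_key = os.environ["RUNPOD_API_KEY"]

# Check that your installed runpod SDK version provides create_network_volume;
# if it does not, create the volume in the RunPod console instead.
volume = runpod.create_network_volume(name="funsloth-training", size_gb=50, region="US")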

A network volume lets you resume training, download checkpoints, and share data between pods.

4. Launch Pod

Use the official Unsloth Docker image for a pre-configured environment:

python

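import os
import runpod

runpod.api_key = os.environ["RUNPOD_API_KEY"]

# Replace the {placeholders} with your own values before running.
pod = runpod.create_pod(
    name="funsloth-training",
    image_name="unsloth/unsloth",  # Official image, supports all GPUs incl. Blackwell
    gpu_type_id="{gpu_type}",
    volume_in_gb=50,
    network_volume_id="{volume_id}",
    env={
        "HF_TOKEN": "{token}",
        "WANDB_API_KEY": "{key}",
        "JUPYTER_PASSWORD": "unsloth",  # Change this; Jupyter is exposed on port 8888
    },
    ports="8888/http,22/tcp",  # Jupyter Lab and SSH
)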

The Unsloth image includes Jupyter Lab (port 8888) and example notebooks in /workspace/unsloth-notebooks/.
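
To avoid hard-coding tokens in the launch script, the pod arguments can be assembled from environment variables. This is a sketch under assumptions: build_pod_kwargs is a hypothetical helper, and it only reuses the create_pod keyword names already shown above.

```python
import os


def build_pod_kwargs(gpu_type_id, network_volume_id, env=os.environ):
    """Build keyword arguments for runpod.create_pod, reading secrets from env.

    HF_TOKEN and JUPYTER_PASSWORD are required; WANDB_API_KEY is optional.
    """
    missing = [name for name in ("HF_TOKEN", "JUPYTER_PASSWORD") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    pod_env = {
        "HF_TOKEN": env["HF_TOKEN"],
        "JUPYTER_PASSWORD": env["JUPYTER_PASSWORD"],
    }
    if env.get("WANDB_API_KEY"):
        pod_env["WANDB_API_KEY"] = env["WANDB_API_KEY"]

    return {
        "name": "funsloth-training",
        "image_name": "unsloth/unsloth",
        "gpu_type_id": gpu_type_id,
        "volume_in_gb": 50,
        "network_volume_id": network_volume_id,
        "env": pod_env,
        "ports": "8888/http,22/tcp",
    }
```

With the SDK configured, the call then becomes runpod.create_pod(**build_pod_kwargs(gpu_type_id, volume_id)).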

5. Upload and Run

bash

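# Upload script (run from your local machine)
scp train.py root@{pod_ip}:/workspace/

# SSH into pod (add -p with the SSH port if the pod exposes one other than 22)
ssh root@{pod_ip}

# Run training inside tmux so it survives SSH disconnects
tmux new -s training
cd /workspace && python train.py 2>&1 | tee training.log
# Ctrl+B, D to detach; reattach with: tmux attach -t training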
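
The upload-and-run step can also be scripted from the local machine. The sketch below only builds the scp and ssh argument lists; the function names are hypothetical, and it assumes tmux is available in the pod and that the script lives in /workspace.

```python
import shlex


def scp_upload_cmd(local_path, pod_ip, port=22, remote_dir="/workspace/"):
    """Argument list that copies local_path into the pod (scp takes the port as -P)."""
    return ["scp", "-P", str(port), local_path, f"root@{pod_ip}:{remote_dir}"]


def start_training_cmd(pod_ip, port=22, script="train.py", session="training"):
    """Argument list that starts training in a detached tmux session on the pod."""
    # Log to training.log so the run can be followed later with tail -f.
    inner = f"cd /workspace && python {shlex.quote(script)} 2>&1 | tee training.log"
    remote = f"tmux new-session -d -s {shlex.quote(session)} {shlex.quote(inner)}"
    return ["ssh", "-p", str(port), f"root@{pod_ip}", remote]
```

Each list can be passed to subprocess.run(cmd, check=True). Because the tmux session is created detached, the ssh call returns immediately while training continues on the pod.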

6. Monitor

bash

# SSH monitoring (run each in its own shell or tmux pane; both commands block)
tail -f /workspace/training.log
nvidia-smi -l 1

# Dashboard (open in a browser)
# https://runpod.io/console/pods/{pod_id}
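
For scripted monitoring, nvidia-smi can emit machine-readable CSV instead of the interactive view. This is a minimal sketch meant to run on the pod; the function names and dictionary keys are hypothetical, and it assumes every queried field is numeric (a field reported as [N/A] would raise a ValueError).

```python
import subprocess

# Fields requested from nvidia-smi; memory values are reported in MiB.
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (int(field.strip()) for field in line.split(","))
        stats.append(
            {"util_pct": util, "mem_used_mib": used, "mem_total_mib": total}
        )
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed stats."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)
```

A low util_pct while training is supposedly running is often a sign that the job has stalled or crashed, which is worth checking before the pod accrues more charges.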

7. Retrieve Checkpoints

bash

# Save to network volume
cp -r /workspace/outputs /runpod-volume/

# Download via SCP
scp -r root@{pod_ip}:/workspace/outputs ./

# Or push to HF Hub from pod
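
Pushing to the Hub from the pod can be sketched as follows. It assumes checkpoints follow the checkpoint-<step> directory naming used by the Hugging Face Trainer, that huggingface_hub is installed in the pod, and that HF_TOKEN is set in the environment; the function names and repo_id are hypothetical.

```python
import re
from pathlib import Path

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> directory with the highest step, or None."""
    best = None
    for child in Path(output_dir).iterdir():
        match = CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            step = int(match.group(1))  # compare numerically, not lexically
            if best is None or step > best[0]:
                best = (step, child)
    return best[1] if best else None


def push_checkpoint(checkpoint_dir, repo_id):
    """Upload one checkpoint folder to the Hub (imported lazily, needs HF_TOKEN)."""
    from huggingface_hub import HfApi

    api = HfApi()
    api.create_repo(repo_id, exist_ok=True)
    api.upload_folder(folder_path=str(checkpoint_dir), repo_id=repo_id)
```

Typical use on the pod would be push_checkpoint(latest_checkpoint("/workspace/outputs"), "your-username/your-model"), after checking that latest_checkpoint did not return None.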

8. Stop Pod

python

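# pod_id is the ID of the pod created in step 4 (also shown in the RunPod console).
# Use one of the following:
runpod.stop_pod(pod_id)       # Stops the pod; it can be resumed later
runpod.terminate_pod(pod_id)  # Deletes the pod; the network volume is kept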
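
A forgotten pod keeps billing, so a launch script that waits locally for training to finish may want to guarantee the stop call even when it crashes. This is a sketch with a hypothetical helper name; the stop function is passed in so that runpod.stop_pod or runpod.terminate_pod can be used.

```python
from contextlib import contextmanager


@contextmanager
def stop_pod_on_exit(pod_id, stop_fn):
    """Yield pod_id and always call stop_fn(pod_id) afterwards, even on error."""
    try:
        yield pod_id
    finally:
        stop_fn(pod_id)
```

Usage would look like `with stop_pod_on_exit(pod_id, runpod.stop_pod): wait_for_training()`, where wait_for_training stands in for whatever local polling loop you use.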

9. Handoff

Offer funsloth-upload for Hub upload with model card.

Best Practices

Always use network volumes - pod storage is ephemeral
Use spot instances for lower costs (risk of preemption)
Set up SSH keys before creating pods
Stop pods when not training - charges per minute
Save checkpoints frequently with save_steps
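
How frequently to save depends on how much work you can afford to lose to a preemption. The sketch below derives save_steps from a measured step time; the helper names are hypothetical, while save_strategy, save_steps, save_total_limit and output_dir are standard Hugging Face TrainingArguments fields that the training script is assumed to accept.

```python
def save_steps_for_budget(seconds_per_step, max_lost_minutes):
    """Largest save interval (in steps) that loses at most max_lost_minutes of work."""
    if seconds_per_step <= 0:
        raise ValueError("seconds_per_step must be positive")
    return max(1, int((max_lost_minutes * 60) // seconds_per_step))


def checkpoint_kwargs(seconds_per_step, max_lost_minutes=10, keep_last=3):
    """Checkpoint-related keyword arguments for TrainingArguments."""
    return {
        "output_dir": "/workspace/outputs",
        "save_strategy": "steps",
        "save_steps": save_steps_for_budget(seconds_per_step, max_lost_minutes),
        "save_total_limit": keep_last,  # cap disk usage on the volume
    }
```

At 4 seconds per step and a 10 minute budget this saves every 150 steps.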

Error Handling

Error	Resolution
Pod creation failed	Try different GPU type or region
SSH refused	Wait 1-2 min, check IP
Out of disk	Increase volume or clean up
Volume not mounting	Check same region as pod
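
The first resolution in the table (try a different GPU type) can be automated with a simple fallback loop. This is a sketch: create_with_fallback is a hypothetical helper, the create function is injected (for example runpod.create_pod), and the GPU type IDs must be ones your account can actually request.

```python
def create_with_fallback(create_fn, gpu_type_ids, **pod_kwargs):
    """Try each GPU type in order and return the first pod created successfully."""
    errors = {}
    for gpu_type_id in gpu_type_ids:
        try:
            return create_fn(gpu_type_id=gpu_type_id, **pod_kwargs)
        except Exception as exc:  # broad on purpose: the SDK's error types vary
            errors[gpu_type_id] = exc
    raise RuntimeError(f"Pod creation failed for all GPU types: {errors}")
```

Ordering gpu_type_ids from cheapest to most expensive keeps the fallback from silently choosing a pricier GPU when a cheaper one is available.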

Bundled Resources

scripts/train_sft.py - Training script template
scripts/estimate_cost.py - Cost estimation
references/PLATFORM_COMPARISON.md - RunPod vs alternatives
references/TROUBLESHOOTING.md - Common issues

Maintainer

chrisvoncsefalvay Core maintainer

Source details

Full Name: chrisvoncsefalvay/funsloth
Branch: main
Path in repo: skills/funsloth-runpod
