fine-tuning-expert

Use when fine-tuning LLMs, training custom models, or adapting foundation models for specific tasks. Invoke for configuring LoRA/QLoRA adapters, preparing JSONL training datasets, setting hyperparameters for fine-tuning runs, adapter training, transfer learning, finetuning with Hugging Face PEFT, OpenAI fine-tuning, instruction tuning, RLHF, DPO, or quantizing and deploying fine-tuned models. Trigger terms include: LoRA, QLoRA, PEFT, finetuning, fine-tuning, adapter tuning, LLM training, model training, custom model.

Stars 7,481

Forks 528

Install this agent skill to your Project

npx add-skill https://github.com/Jeffallan/claude-skills/tree/main/skills/fine-tuning-expert

Metadata

Additional technical details for this skill

role: expert
scope: implementation
author: https://github.com/Jeffallan
domain: data-ml
version: 1.1.0
triggers: fine-tuning, fine tuning, finetuning, LoRA, QLoRA, PEFT, adapter tuning, transfer learning, model training, custom model, LLM training, instruction tuning, RLHF, model optimization, quantization
output format: code
related skills: devops-engineer

SKILL.md

Fine-Tuning Expert

Senior ML engineer specializing in LLM fine-tuning, parameter-efficient methods, and production model optimization.

Core Workflow

1. Dataset preparation: validate and format the data, and run quality checks before training starts.
   - Checkpoint: run `python validate_dataset.py --input data.jsonl` and fix all errors before proceeding.
2. Method selection: choose the PEFT technique based on GPU memory and task requirements.
   - Use LoRA for most tasks, QLoRA (4-bit) when GPU memory is constrained, and full fine-tuning only for small models.
3. Training: configure hyperparameters, monitor loss curves, and checkpoint regularly.
   - Checkpoint: validation loss should decrease. A plateau means further training adds little; a rise while training loss keeps falling signals overfitting.
4. Evaluation: benchmark against the base model, and test on a held-out set and on edge cases.
   - Checkpoint: collect perplexity, task-specific metrics (e.g. BLEU/ROUGE for generation tasks), and latency numbers.
5. Deployment: merge adapter weights, quantize, and measure inference throughput before serving.
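
The dataset checkpoint refers to a `validate_dataset.py` script that this page does not show. The following is only a minimal sketch of what such a check might cover (schema, duplicates, length), assuming Alpaca-style records with `instruction` and `output` fields. The helper name `validate_records` and the use of whitespace word counts in place of real tokenizer lengths are illustrative assumptions, not part of the skill.

```python
import json
from collections import Counter

REQUIRED_FIELDS = ("instruction", "output")  # assumed Alpaca-style schema


def validate_records(lines, max_words=2048):
    """Return (clean_records, error_counts) for an iterable of JSONL lines.

    Hypothetical helper: word counts stand in for real tokenizer lengths.
    """
    clean, errors, seen = [], Counter(), set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            errors["invalid_json"] += 1
            continue
        # Every required field must be a non-empty string
        if not isinstance(record, dict) or any(
            not isinstance(record.get(f), str) or not record[f].strip()
            for f in REQUIRED_FIELDS
        ):
            errors["missing_field"] += 1
            continue
        key = (record["instruction"].strip(), record["output"].strip())
        if key in seen:
            errors["duplicate"] += 1
            continue
        n_words = sum(len(record[f].split()) for f in REQUIRED_FIELDS)
        if n_words > max_words:
            errors["too_long"] += 1
            continue
        seen.add(key)
        clean.append(record)
    return clean, dict(errors)


sample = [
    '{"instruction": "Add 2 and 2", "output": "4"}',
    '{"instruction": "Add 2 and 2", "output": "4"}',  # exact duplicate
    '{"instruction": "No answer"}',                    # missing output
    "not json",
]
clean, errors = validate_records(sample)
print(len(clean), errors)
```

In a real run the lines would come from `open("data.jsonl")`, and the length check would use the same tokenizer as training.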

Reference Guide

Load detailed guidance based on context:

| Topic | Reference | Load When |
|---|---|---|
| LoRA/PEFT | `references/lora-peft.md` | Parameter-efficient fine-tuning, adapters |
| Dataset Prep | `references/dataset-preparation.md` | Training data formatting, quality checks |
| Hyperparameters | `references/hyperparameter-tuning.md` | Learning rates, batch sizes, schedulers |
| Evaluation | `references/evaluation-metrics.md` | Benchmarking, metrics, model comparison |
| Deployment | `references/deployment-optimization.md` | Model merging, quantization, serving |

Minimal Working Example — LoRA Fine-Tuning with Hugging Face PEFT

python

from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTConfig, SFTTrainer
import torch

# 1. Load base model and tokenizer
model_id = "meta-llama/Meta-Llama-3-8B"  # gated repo: accept the license on the Hub first
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# 2. Configure LoRA adapter
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,               # rank: increase for more capacity, decrease to save memory
    lora_alpha=32,      # scaling factor; typically 2× rank
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # verify: well under 1% of total params is trainable

# 3. Load and format dataset (Alpaca-style JSONL)
dataset = load_dataset("json", data_files={"train": "train.jsonl", "test": "test.jsonl"})

def format_prompt(example):
    return {"text": f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"}

dataset = dataset.map(format_prompt)

# 4. Training arguments
# SFTConfig extends TrainingArguments with the SFT-specific options that
# recent TRL releases no longer accept on SFTTrainer itself.
training_args = SFTConfig(
    output_dir="./checkpoints",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,     # effective batch size = 16 per device
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,                 # always use warmup
    fp16=False,
    bf16=True,
    logging_steps=10,
    eval_strategy="steps",
    eval_steps=100,
    save_steps=200,                    # must be a multiple of eval_steps for load_best_model_at_end
    load_best_model_at_end=True,
    dataset_text_field="text",
    max_length=2048,                   # older TRL releases name this max_seq_length
)

# 5. Train
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,        # older TRL releases take tokenizer=tokenizer instead
)
trainer.train()

# 6. Save adapter weights only
model.save_pretrained("./lora-adapter")
tokenizer.save_pretrained("./lora-adapter")
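
The figure reported by `print_trainable_parameters()` can be sanity-checked by hand: each adapted linear layer with input width d_in and output width d_out adds r × (d_in + d_out) trainable parameters. The sketch below does that arithmetic for the `r=16`, `q_proj`/`v_proj` configuration in the example. The layer shapes, the layer count, and the total parameter count are assumptions about an 8B Llama-style model; read the real values from `model.config` before relying on them.

```python
def lora_param_count(r, layer_shapes, num_layers):
    """Trainable LoRA parameters: r * (d_in + d_out) per adapted linear layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in layer_shapes)
    return per_layer * num_layers


# Assumed shapes for an 8B Llama-style model (illustrative, check model.config):
# q_proj maps 4096 -> 4096, v_proj maps 4096 -> 1024 with grouped-query attention.
shapes = [(4096, 4096), (4096, 1024)]
trainable = lora_param_count(r=16, layer_shapes=shapes, num_layers=32)
total = 8_030_000_000  # assumed approximate base parameter count

print(f"trainable params: {trainable:,}")
print(f"share of total:   {100 * trainable / total:.3f}%")

# Effective batch size from the training arguments in the example (single GPU assumed)
effective_batch = 4 * 4  # per_device_train_batch_size * gradient_accumulation_steps
print(f"effective batch:  {effective_batch}")
```

Under these assumptions the adapter has roughly 6.8 million trainable parameters, a little under 0.1% of the base model. Adding more target modules or raising `r` scales the count linearly.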

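QLoRA variant: replace the model-loading call in step 1 with the following to load the base model in 4-bit, then apply the LoRA config as before:

python

from transformers import BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
model = prepare_model_for_kbit_training(model)  # call before get_peft_model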
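
The choice between LoRA and QLoRA mostly comes down to how much memory the frozen base weights take. The back-of-the-envelope sketch below covers the weights only: it ignores activations, gradients, optimizer state for the adapter, and the small overhead of quantization constants, and it assumes an 8-billion-parameter base model.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


params = 8_000_000_000  # assumed 8B-parameter base model
for label, bits in [("bf16 (LoRA)", 16), ("4-bit NF4 (QLoRA)", 4)]:
    print(f"{label:>18}: {weight_memory_gib(params, bits):.1f} GiB")
```

Treat the result as a lower bound. Actual peak usage during training depends on sequence length, batch size, and whether gradient checkpointing is enabled.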

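Merge adapter into base model for deployment (reuses `model_id` and `tokenizer` from the training script):

python

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Reload the base model in bf16 rather than 4-bit so the merged weights stay full precision
base = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "./lora-adapter").merge_and_unload()
merged.save_pretrained("./merged-model")
tokenizer.save_pretrained("./merged-model")  # ship the tokenizer with the weights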
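
Conceptually, merging folds the low-rank update into each adapted weight matrix: W' = W + (alpha / r) · B · A. The toy sketch below shows that arithmetic on plain Python lists. It assumes the standard LoRA scaling of alpha / r (variants such as rsLoRA scale differently) and is an illustration of the idea, not PEFT's implementation. The function names are made up for this example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, a, b, alpha, r):
    """Fold a LoRA update into the base weight: W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy shapes: W is 2x3 (d_out x d_in), A is 1x3 (r x d_in), B is 2x1 (d_out x r)
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
a = [[1.0, 2.0, 3.0]]
b = [[0.5], [-0.5]]
merged = merge_lora(w, a, b, alpha=2, r=1)
print(merged)
```

Because B · A must have the same shape as W, an adapter only merges cleanly into the base model and target modules it was trained against.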

Constraints

MUST DO

- Validate dataset quality before training
- Use parameter-efficient methods for large models (>7B)
- Monitor training/validation loss curves
- Document hyperparameters and training config
- Version datasets and model checkpoints
- Always include a learning rate warmup
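
To make the warmup requirement concrete, here is a small sketch of a linear-warmup-then-cosine-decay schedule, the shape selected by `lr_scheduler_type="cosine"` together with a warmup setting. It approximates that shape in plain Python and is not the library's code. The function name and the 1000-step run are assumptions for illustration.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, base_lr=2e-4):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000   # assumed run length
warmup_steps = 30    # 3% of the run, matching warmup_ratio=0.03
for step in (0, 15, 30, 515, 1000):
    print(step, f"{lr_at_step(step, total_steps, warmup_steps):.2e}")
```

The learning rate starts at zero, reaches its peak at the end of warmup, and then decays smoothly. That avoids large, noisy updates while the adapter weights are still near their initial values.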

MUST NOT DO

- Skip data quality validation
- Overfit on small datasets; use regularisation (dropout, weight decay) and early stopping instead
- Merge incompatible adapters (mismatched rank, base model, or target modules)
- Deploy without evaluation against a held-out set and latency benchmark
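
Early stopping, mentioned in the overfitting rule, can be expressed as a simple patience check on the validation-loss history. The sketch below is a plain-Python illustration with a hypothetical `should_stop` helper. With the Hugging Face `Trainer`, the built-in `EarlyStoppingCallback` from `transformers` serves the same purpose and would normally be preferred.

```python
def should_stop(val_losses, patience=3, min_delta=0.0):
    """True when validation loss has not improved for `patience` evaluations."""
    if len(val_losses) <= patience:
        return False  # not enough history yet
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta


history = [2.10, 1.80, 1.65, 1.66, 1.70, 1.74]
print(should_stop(history))      # the last three evals never beat 1.65
print(should_stop(history[:4]))  # the recent window still contains the best loss
```

A nonzero `min_delta` requires each improvement to be meaningful, which guards against stopping decisions driven by evaluation noise.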

Output Templates

When implementing fine-tuning, always provide:

- Dataset preparation script with validation logic (schema checks, token-length histogram, deduplication)
- Training configuration (full TrainingArguments + LoraConfig block, commented)
- Evaluation script reporting perplexity, task-specific metrics, and latency
- Brief design rationale: why this PEFT method, rank, and learning rate were chosen for this task
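
As a starting point for the evaluation script, the sketch below shows two of the requested measurements in isolation: perplexity computed from per-token negative log-likelihoods, and a latency summary for an arbitrary callable. Both helper names are hypothetical, and the inputs are stand-ins. In practice the losses would come from the model's evaluation pass and the callable would wrap a `generate()` call.

```python
import math
import statistics
import time


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def latency_stats(fn, runs=20, warmup=3):
    """Time a zero-argument callable; returns (median_ms, p95_ms)."""
    for _ in range(warmup):
        fn()  # warm caches before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    p95 = samples[min(len(samples) - 1, math.ceil(0.95 * len(samples)) - 1)]
    return statistics.median(samples), p95


# Stand-ins: real values would come from the model's eval loss and generate() call
print(perplexity([2.0, 2.0, 2.0]))
median_ms, p95_ms = latency_stats(lambda: sum(range(10_000)))
print(f"median {median_ms:.3f} ms, p95 {p95_ms:.3f} ms")
```

Run the same measurements on the base model and the fine-tuned model so the numbers can be compared directly.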

Maintainer

Jeffallan Core maintainer

Source details

Full Name: Jeffallan/claude-skills
Branch: main
Path in repo: skills/fine-tuning-expert
License: MIT License
Topics: claude-code claude ai-agents claude-skills claude-marketplace
