Agent skill
ml-pipeline
Designs and implements production-grade ML pipeline infrastructure: configures experiment tracking with MLflow or Weights & Biases, creates Kubeflow or Airflow DAGs for training orchestration, builds feature store schemas with Feast, deploys model registries, and automates retraining and validation workflows. Use when building ML pipelines, orchestrating training workflows, automating model lifecycle, implementing feature stores, managing experiment tracking systems, setting up DVC for data versioning, tuning hyperparameters, or configuring MLOps tooling like Kubeflow, Airflow, MLflow, or Prefect.
Install this agent skill into your project:
npx add-skill https://github.com/Jeffallan/claude-skills/tree/main/skills/ml-pipeline
Metadata
Additional technical details for this skill
- role: expert
- scope: implementation
- author: https://github.com/Jeffallan
- domain: data-ml
- version: 1.1.0
- triggers: ML pipeline, MLflow, Kubeflow, feature engineering, model training, experiment tracking, feature store, hyperparameter tuning, pipeline orchestration, model registry, training workflow, MLOps, model deployment, data pipeline, model versioning
- output format: code
- related skills: devops-engineer, kubernetes-specialist, cloud-architect, python-pro
SKILL.md
ML Pipeline Expert
Senior ML pipeline engineer specializing in production-grade machine learning infrastructure, orchestration systems, and automated training workflows.
Core Workflow
- Design pipeline architecture — Map data flow, identify stages, define interfaces between components
- Validate data schema — Run schema checks and distribution validation before any training begins; halt and report on failures
- Implement feature engineering — Build transformation pipelines, feature stores, and validation checks
- Orchestrate training — Configure distributed training, hyperparameter tuning, and resource allocation
- Track experiments — Log metrics, parameters, and artifacts; enable comparison and reproducibility
- Validate and deploy — Run model evaluation gates; implement A/B testing or shadow deployment before promotion
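The workflow above can be sketched as a chain of stages in which any failure halts the run and names the stage that failed. This is a minimal, framework-free illustration: `run_pipeline`, the stage functions, and the toy majority-class "model" are hypothetical placeholders, not part of any orchestrator's API. In practice each stage would map to a Kubeflow component, an Airflow task, or a Prefect task.

```python
from collections import Counter
from typing import Any, Callable, List, Tuple

# A stage is a (name, function) pair; each function receives the previous stage's output.
Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(stages: List[Stage], payload: Any) -> Any:
    """Run stages in order, halting on the first failure and reporting which stage failed."""
    for name, fn in stages:
        try:
            payload = fn(payload)
        except Exception as exc:
            raise RuntimeError(f"pipeline halted at stage '{name}': {exc}") from exc
    return payload


def validate_schema(rows: List[dict]) -> List[dict]:
    # Hypothetical schema check: every row needs a non-null 'label' and a numeric 'x'.
    for i, row in enumerate(rows):
        if row.get("label") is None or not isinstance(row.get("x"), (int, float)):
            raise ValueError(f"row {i} violates schema: {row}")
    return rows


def engineer_features(rows: List[dict]) -> List[dict]:
    # Hypothetical transformation: add one derived feature per row.
    return [{**row, "x_squared": row["x"] ** 2} for row in rows]


def train(rows: List[dict]) -> dict:
    # Toy "model" that predicts the majority label; stands in for real training.
    majority = Counter(row["label"] for row in rows).most_common(1)[0][0]
    return {"predict": majority, "train_samples": len(rows)}


STAGES: List[Stage] = [
    ("validate_schema", validate_schema),
    ("engineer_features", engineer_features),
    ("train", train),
]
```

For example, running `STAGES` on three rows labelled `"a"`, `"a"`, `"b"` returns `{"predict": "a", "train_samples": 3}`, while a single row with a null label stops the run at `validate_schema` before any training happens.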
Reference Guide
Load detailed guidance based on context:
| Topic | Reference | Load When |
|---|---|---|
| Feature Engineering | references/feature-engineering.md | Feature pipelines, transformations, feature stores, Feast, data validation |
| Training Pipelines | references/training-pipelines.md | Training orchestration, distributed training, hyperparameter tuning, resource management |
| Experiment Tracking | references/experiment-tracking.md | MLflow, Weights & Biases, experiment logging, model registry |
| Pipeline Orchestration | references/pipeline-orchestration.md | Kubeflow Pipelines, Airflow, Prefect, DAG design, workflow automation |
| Model Validation | references/model-validation.md | Evaluation strategies, validation workflows, A/B testing, shadow deployment |
Code Templates
MLflow Experiment Logging (minimal reproducible example)
```python
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Pin random state for reproducibility
SEED = 42
np.random.seed(SEED)

# Placeholder dataset so the template runs end to end; replace with your own data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED, stratify=y
)

mlflow.set_experiment("my-classifier-experiment")

with mlflow.start_run():
    # Log all hyperparameters; never hardcode them silently
    params = {"n_estimators": 100, "max_depth": 5, "random_state": SEED}
    mlflow.log_params(params)

    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)

    # Log metrics
    mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
    mlflow.log_metric("f1", f1_score(y_test, preds, average="weighted"))

    # Log the model artifact and register it in the model registry
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="my-classifier",
    )
```
Kubeflow Pipeline Component (single-step template)
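```python
# KFP SDK 2.x: import from `kfp` directly rather than the older `kfp.v2` namespace
from kfp import dsl
from kfp.dsl import Dataset, Input, Metrics, Model, Output, component


@component(base_image="python:3.10", packages_to_install=["pandas", "scikit-learn"])
def train_model(
    train_data: Input[Dataset],
    model_output: Output[Model],
    metrics_output: Output[Metrics],
    n_estimators: int = 100,
    max_depth: int = 5,
):
    # Imports live inside the function because the component runs in its own container
    import pickle

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    df = pd.read_csv(train_data.path)
    X, y = df.drop("label", axis=1), df["label"]

    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=42
    )
    model.fit(X, y)

    with open(model_output.path, "wb") as f:
        pickle.dump(model, f)
    metrics_output.log_metric("train_samples", len(df))


@dsl.pipeline(name="training-pipeline")
def training_pipeline(data_path: str, n_estimators: int = 100, max_depth: int = 5):
    # train_model's required train_data input must come from an upstream artifact,
    # so import the file at data_path (e.g. a gs:// or s3:// URI) as a Dataset
    import_task = dsl.importer(artifact_uri=data_path, artifact_class=Dataset)
    train_step = train_model(
        train_data=import_task.output,
        n_estimators=n_estimators,
        max_depth=max_depth,
    )
    # Chain additional steps (validate, register, deploy) here
```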
Data Validation Checkpoint (Great Expectations style)
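```python
# Assumes the legacy 0.x Great Expectations API; ge.from_pandas is not available in GX 1.x
import great_expectations as ge


def validate_training_data(df):
    """Run schema and distribution checks. Raise on failure; never skip."""
    gdf = ge.from_pandas(df)
    # Collect each expectation's result; validation results cannot be combined with &=
    results = [
        gdf.expect_column_values_to_not_be_null("label"),
        gdf.expect_column_values_to_be_between("feature_1", min_value=0, max_value=1),
    ]
    failed = [r for r in results if not r.success]
    if failed:
        raise ValueError(f"Data validation failed: {[r.result for r in failed]}")
    return df  # safe to proceed to training
```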
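The template above only checks nulls and value ranges. For the distribution side of these checks, one common technique (swapped in here; it is not part of the template) is the population stability index (PSI) between a reference sample and the incoming batch. A minimal stdlib sketch, assuming fixed bin edges chosen from the reference data and an alert threshold of 0.2, a commonly cited rule of thumb that should be tuned per feature. The function names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_proportions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin; n sorted edges define n + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    edges: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """PSI = sum over bins of (cur - ref) * ln(cur / ref); eps avoids log(0) for empty bins."""
    ref = bin_proportions(reference, edges)
    cur = bin_proportions(current, edges)
    psi = 0.0
    for r, c in zip(ref, cur):
        r, c = max(r, eps), max(c, eps)
        psi += (c - r) * math.log(c / r)
    return psi


def check_drift(reference, current, edges, threshold: float = 0.2) -> float:
    """Raise when PSI exceeds the threshold so training halts instead of continuing silently."""
    psi = population_stability_index(reference, current, edges)
    if psi > threshold:
        raise ValueError(f"Distribution drift detected: PSI={psi:.3f} > {threshold}")
    return psi
```

Identical bin proportions give a PSI of exactly zero; a batch whose values all land in one bin that held half the reference data scores far above 0.2 and raises.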
Constraints
Always:
- Version all data, code, and models explicitly (DVC, Git tags, model registry)
- Pin dependencies and random seeds for reproducible training environments
- Log all hyperparameters, metrics, and artifacts to experiment tracking
- Validate data schema and distribution before training begins
- Use containerized environments; store credentials in secrets managers, never in code
- Implement error handling, retry logic, and pipeline alerting
- Separate training and inference code clearly
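For the error-handling constraint, a retry wrapper with exponential backoff and a failure hook is one common pattern. A minimal sketch: `retry`, its parameters, and the `on_failure` callback (where a Slack, PagerDuty, or email alert would be sent) are illustrative assumptions. Airflow operators (`retries`, `retry_delay`) and Prefect tasks (`retries`) also provide built-in retries, which are usually preferable when the step runs under one of them.

```python
import time
from functools import wraps
from typing import Callable, Optional


def retry(
    max_attempts: int = 3,
    base_delay: float = 1.0,
    on_failure: Optional[Callable[[str, Exception], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
):
    """Retry a step with exponential backoff; call on_failure once retries are exhausted."""

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    if attempt == max_attempts:
                        if on_failure is not None:
                            on_failure(fn.__name__, exc)  # e.g. send an alert
                        raise  # never swallow the failure silently
                    sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...

        return wrapper

    return decorator
```

Injecting `sleep` keeps the wrapper testable without real waits. With the defaults, a step that fails twice and then succeeds waits 1 s and then 2 s before its third attempt.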
Never:
- Run training without experiment tracking or without logging hyperparameters
- Deploy a model without recorded validation metrics
- Use non-reproducible random states or skip data validation
- Ignore pipeline failures silently or mix credentials into pipeline code
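Avoiding non-reproducible random states usually means seeding every generator a run touches. A minimal sketch, assuming the run uses Python's `random` module and, if installed, NumPy; frameworks such as PyTorch or TensorFlow need their own seeding calls. `set_global_seed` and `derive_seed` are hypothetical helpers. The second derives stable per-trial seeds (for example, for parallel hyperparameter-tuning trials) using SHA-256 rather than the built-in `hash()`, which is salted per process for strings.

```python
import hashlib
import os
import random


def set_global_seed(seed: int) -> None:
    """Seed Python's and (if available) NumPy's global random generators."""
    random.seed(seed)
    # Only affects hash randomization in subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing else to seed here


def derive_seed(base_seed: int, name: str) -> int:
    """Deterministic 32-bit seed from a base seed and a trial name, stable across machines."""
    digest = hashlib.sha256(f"{base_seed}:{name}".encode()).digest()
    return int.from_bytes(digest[:4], "big")
```

Logging both the base seed and each derived seed as run parameters lets any single trial be replayed on its own.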
Output Format
When implementing a pipeline, provide:
- Complete pipeline definition (Kubeflow DAG, Airflow DAG, or equivalent) — use the templates above as starting structure
- Feature engineering code with inline data validation calls
- Training script with MLflow (or equivalent) experiment logging
- Model evaluation code with explicit pass/fail thresholds
- Deployment configuration and rollback strategy
- Brief explanation of architecture decisions and reproducibility measures
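For the "explicit pass/fail thresholds" item, an evaluation gate can compare a candidate's metrics against absolute minimums and, optionally, against the current production model. A minimal sketch: the metric names, threshold values, and the `max_regression` tolerance are illustrative assumptions, and the metric values would come from your evaluation step (for example, the accuracy and F1 logged to MLflow in the template above).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Threshold:
    metric: str
    minimum: float  # candidate must score at least this (higher is better)


def evaluation_gate(
    candidate: Dict[str, float],
    thresholds: Iterable[Threshold],
    baseline: Optional[Dict[str, float]] = None,
    max_regression: float = 0.0,
) -> Tuple[bool, List[str]]:
    """Return (passed, reasons). Fail on missing metrics, values below the minimum,
    or a drop of more than max_regression versus the baseline (production) model."""
    failures: List[str] = []
    for t in thresholds:
        value = candidate.get(t.metric)
        if value is None:
            failures.append(f"{t.metric}: not reported")
            continue
        if value < t.minimum:
            failures.append(f"{t.metric}: {value:.4f} < minimum {t.minimum:.4f}")
        if baseline is not None and t.metric in baseline:
            floor = baseline[t.metric] - max_regression
            if value < floor:
                failures.append(f"{t.metric}: {value:.4f} below baseline floor {floor:.4f}")
    return (not failures, failures)
```

A deployment step would then promote the model only when `passed` is true, otherwise keep the current production model and log `reasons` with the run, which doubles as the record required before any deployment.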
Knowledge Reference
MLflow, Kubeflow Pipelines, Apache Airflow, Prefect, Feast, Weights & Biases, Neptune, DVC, Great Expectations, Ray, Horovod, Kubernetes, Docker, S3/GCS/Azure Blob, model registry patterns, feature store architecture, distributed training, hyperparameter optimization