Agent skill
ff-ml-modeling
Expert guidance on machine learning and feature engineering for fantasy football player projection models. Use this skill when building predictive models, engineering features from player statistics, selecting appropriate ML algorithms, or addressing sports-specific ML challenges. Covers feature engineering patterns, model selection frameworks, validation strategies, and interpretability techniques for fantasy football analytics.
Install this agent skill to your Project
npx add-skill https://github.com/zazu-22/ff_data_analytics/tree/main/.claude/skills/ff-analytics-team/ff-ml-modeling
SKILL.md
Machine Learning & Feature Engineering for Fantasy Football
Overview
Provide expert guidance on building ML-based player projection models using research-backed feature engineering patterns, appropriate model selection, and sports-specific validation strategies. Apply domain expertise to help design features, choose models, avoid common pitfalls, and create interpretable predictions.
When to Use This Skill
Trigger this skill for queries involving:
- Feature engineering: "What features should I include?" "How do I create age curve features?" "What are good opportunity metrics?"
- Model selection: "Which ML model should I use?" "Random Forest or XGBoost?" "When to use regularized regression?"
- Validation strategies: "How do I validate sports models?" "What's wrong with standard cross-validation?" "How to avoid data leakage?"
- Sports-specific challenges: "How to handle small sample sizes?" "How to model position differences?" "Handling regime changes?"
- Feature selection: "How to reduce 109 stats to key features?" "Lasso vs Ridge?" "How to handle multicollinearity?"
- Model interpretability: "How to explain predictions?" "What features matter most?" "SHAP values for fantasy?"
Note: For dynasty strategy questions (player valuation, trade analysis, roster construction), use ff-dynasty-strategy. For statistical methods (regression types, simulations, GAMs), use ff-statistical-methods.
Core Capabilities
1. Feature Engineering
Core Principle: Feature engineering is more important than model selection for sports predictions.
Key Feature Categories:
Age Curves
- Marcel system: 3-year weighted average + age adjustment + regression to mean
- Position-specific peaks: RB 23-26, WR 26-28, QB 28-33, TE 26-29
- Implementation:
age_factor = 1 - (age - peak_age) * 0.003for decline phase
Opportunity Metrics
- Target share, snap share, weighted opportunities (carries + targets×1.5)
- Points per opportunity (efficiency measure)
- Volume is king: opportunity metrics predict better than TDs
Efficiency Statistics
- Yards per route run (YPRR), yards per carry (YPC)
- Yards after contact (YAC), catch rate
- Warning: Noisy with small samples, use rolling averages
Interaction Terms
- QB quality × target share (receiver production context)
- Opponent strength adjustments
- Game script (leading = rushing, trailing = passing)
- ~40% of team performance from synergy effects
Rolling Averages
- Last 3 games, last 5 games, season-long
- Trend features: recent form vs established baseline
- Lag features: last game, same opponent last season
Reference: references/feature_engineering.md for formulas, implementation patterns, and common mistakes.
2. Model Selection
Decision Framework:
Primary Goal?
├─ Interpretability → Linear/Ridge/Lasso Regression
└─ Performance
├─ Small (<1000) → Ridge/Lasso/Elastic Net
├─ Medium (1K-10K) → Random Forest or XGBoost
└─ Large (>10K) → XGBoost/LightGBM or Ensemble
Model Types:
Linear Regression: Baseline, interpretability, small samples
Regularized Regression: High-dimensional data, multicollinearity, automatic feature selection (Lasso)
Random Forest: Medium data, robustness, feature importance
XGBoost/LightGBM: Best single-model performance, handles missing values
Ensemble: Combine Ridge + RF + XGBoost (weighted 1:2:2), often 2-5% improvement
Position-Specific Modeling: Train separate models per position (RB features ≠ WR features)
Reference: references/model_selection.md for detailed comparisons, hyperparameters, and implementation.
3. Validation Strategies
Critical Rule: NEVER use standard cross-validation with shuffle=True
❌ Wrong: KFold(n_splits=5, shuffle=True) → Data leakage!
✅ Correct: TimeSeriesSplit(n_splits=5) → Train on past, test on future
Time-Series Split: Always predict future from past data
Appropriate Metrics:
- MAE (Mean Absolute Error): Most interpretable
- RMSE: Penalizes large errors
- R²: Proportion of variance explained
Nested Cross-Validation: Outer loop for evaluation, inner loop for hyperparameter tuning
Reference: references/validation_strategies.md for detailed workflows and common mistakes.
4. Sports-Specific Challenges
Small Sample Sizes: NFL = 17 games/season → Use regularization (Ridge/Lasso)
Position-Specific Modeling: Separate models per position with different feature sets
Regime Changes: Weight recent seasons heavier, use sliding window validation
Data Leakage Prevention: Only use data available at prediction time, time-series validation
Reference: references/model_selection.md sections on sports-specific considerations.
Workflow: Building a Player Projection Model
Step 1: Feature Engineering
- Start with raw stats (yards, TDs, targets, snaps)
- Create opportunity metrics (target share, snap %)
- Add efficiency features (YPRR, YPC)
- Generate rolling averages (3-game, 5-game)
- Include age curves and interaction terms
- Use
assets/player_projection_model_template.pyas starting point
Step 2: Feature Selection
- Check correlation (remove highly correlated features)
- Use Lasso for automatic selection
- SHAP values for importance
- Domain knowledge: prioritize opportunity > efficiency > TDs
Step 3: Model Selection
- Establish baseline (linear regression or Marcel)
- Try regularized model (Elastic Net)
- Test tree-based (Random Forest, then XGBoost)
- Position-specific models
- Ensemble top 2-3 models
Step 4: Validation
- Hold out most recent season as final test
- TimeSeriesSplit on training data
- Nested CV for hyperparameter tuning
- Evaluate MAE by position
Step 5: Interpretability
- SHAP values for feature importance
- Partial dependence plots for age curves
- Validate on new season
Identifying Data Requirements
For Player Projection Models:
- Historical performance (3+ years for aging curves)
- Opportunity metrics (targets, snaps, routes run, carries)
- Efficiency stats (YPRR, YPC, catch rate)
- Contextual data (opponent strength, QB quality, game script)
- Position and age
For Feature Engineering:
- Player-level: Stats, age, position, career year
- Team-level: Total targets, snaps, carries (for share calculations)
- Game-level: Score differential, home/away, opponent defense rank
- Season-level: Rule changes, schedule strength
Integrating with Other Skills
Complement with ff-dynasty-strategy when:
- Need domain knowledge for feature selection (aging curves, TD regression)
- Interpreting model outputs (sell-high candidates)
- Understanding position-specific patterns
Complement with ff-statistical-methods when:
- Choosing regression type (OLS vs Lasso vs GAMs)
- Running Monte Carlo simulations using predictions
- Performing variance analysis
Best Practices
Feature Engineering Over Model Complexity - Well-engineered features make simple models outperform complex ones
Always Use Time-Series Validation - Standard CV inflates performance 15-20%
Position-Specific Models - RB features ≠ WR features ≠ QB features
Regularization for Small Samples - NFL has limited games (17/season)
Prioritize Interpretability - SHAP values for explainability, start simple
References
references/feature_engineering.md- Age curves, opportunity metrics, efficiency stats, interaction termsreferences/model_selection.md- Decision framework, model types, hyperparametersreferences/validation_strategies.md- Time-series splits, nested CV, metrics
Assets
assets/player_projection_model_template.py- Python template for building player projection models
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
notebook-creator
Create Jupyter notebooks for FF Analytics following project conventions. Use this skill when the user requests analysis notebooks for player evaluation, roster health, trade scenarios, market trends, or projection quality analysis. Guides through notebook structure, DuckDB connections, mart queries, visualization standards, and freshness banner patterns.
adr-creator
Create Architecture Decision Records (ADRs) documenting significant technical decisions for the FF Analytics platform. This skill should be used when making architectural choices, evaluating alternatives for data models or infrastructure, documenting trade-offs, or when the user asks "should we use X or Y approach?" Guides through the ADR creation workflow from context gathering to documentation.
sprint-planner
Design and structure development sprints with atomic, LLM-ready task files. This skill should be used when the user wants to plan a new development sprint, break down complex projects into executable tasks, or create a sprint execution framework. Produces sprint plans, atomic task specifications, and corresponding executor skills for LLM coding agents.
dbt-model-builder
Create dbt models following FF Analytics Kimball patterns and 2×2 stat model. This skill should be used when creating staging models, core facts/dimensions, or analytical marts. Guides through model creation with proper grain, tests, External Parquet configuration, and per-model YAML documentation using dbt 1.10+ syntax.
data-quality-test-generator
Generate comprehensive dbt test suites following FF Analytics data quality standards and dbt 1.10+ syntax. This skill should be used when creating tests for new dbt models, adding tests to existing models, standardizing test coverage, or implementing data quality gates. Covers grain uniqueness, FK relationships, enum validation, and freshness tests.
strategic-planner
Design comprehensive technical specifications and strategic plans for data architecture and analytics projects. This skill should be used when planning major features, creating SPEC documents, assessing product requirements, breaking down complex projects into phases, or documenting architectural strategies like SPEC-1. Guides through requirements gathering, MoSCoW prioritization, phase planning, and open items tracking.
Didn't find tool you were looking for?