Agent skill
policyengine-simulation-mechanics
ALWAYS LOAD THIS SKILL before writing any policyengine.py microsimulation code. Contains correct import paths, environment setup, dataset loading, and analysis patterns. Triggers: "write a script", "policyengine.py", "microsimulation script", "run a simulation", "load the dataset", "FRS", "EFRS", "enhanced FRS", "CPS", "enhanced CPS", "by income decile", "by tenure", "by region", "energy spending", "domestic energy", "household net income", "output_dataset", "ensure_datasets", "uk_datasets", "us_datasets", "import datasets", "from policyengine", "Simulation(dataset=", "uk_latest", "us_latest", "plotly", "analysis script", "decile breakdown", "percentile", "groupby", "weighted", "mean", "median", "p25", "p75", "tenure type", "income band", "policy reform script".
Install this agent skill to your Project
npx add-skill https://github.com/PolicyEngine/policyengine-claude/tree/main/skills/tools-and-apis/policyengine-simulation-mechanics-skill
SKILL.md
PolicyEngine Simulation Mechanics
This skill covers advanced patterns for working with policyengine.py simulations, including caching, result access, and entity mapping.
CRITICAL: Environment Setup
Before writing any code, check the environment. The policyengine.py package must be installed in the project's .venv.
# Always run from the policyengine.py repo root:
cd /path/to/policyengine.py
uv run python script.py
# Or activate first:
source .venv/bin/activate
python script.py
# NEVER use bare `pip install` — always:
uv pip install -e ".[uk]" # for UK work
uv pip install -e ".[us]" # for US work
If from policyengine.core import Simulation fails:
cd /path/to/policyengine.py
uv pip install -e ".[uk]"
# Then re-run with: uv run python script.py
CRITICAL: Correct Import Paths
Only these imports exist — do not guess others:
# Core simulation
from policyengine.core import Simulation
# UK model
from policyengine.tax_benefit_models.uk import (
uk_latest, # The model version (pass as tax_benefit_model_version=)
uk_model, # The model itself
PolicyEngineUKDataset,
UKYearData,
create_datasets, # Create & cache datasets from HF source
load_datasets, # Load cached datasets from disk
ensure_datasets, # Create if missing, load if present (recommended)
)
# US model
from policyengine.tax_benefit_models.us import (
us_latest,
PolicyEngineUSDataset,
ensure_datasets,
)
# Outputs
from policyengine.outputs.aggregate import Aggregate, AggregateType
from policyengine.outputs.change_aggregate import ChangeAggregate, ChangeAggregateType
# Plotting
from policyengine.utils.plotting import COLORS, format_fig
There is NO:
policyengine.core.dataset_registrypolicyengine.datasetspolicyengine.core.dataset_version.DatasetVersion.list()
UK Datasets
Loading UK datasets
Use ensure_datasets() — it returns a dict[str, PolicyEngineUKDataset], building files in ./data/ on first run and loading from disk on subsequent runs.
WARNING: from policyengine.tax_benefit_models.uk import datasets gives you the Python submodule, not a dict. Never index it like a dict.
from policyengine.tax_benefit_models.uk import ensure_datasets
uk = ensure_datasets(
datasets=[
"hf://policyengine/policyengine-uk-data/frs_2023_24.h5",
"hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5",
],
years=[2026],
data_folder="./data",
)
efrs = uk["enhanced_frs_2023_24_2026"]
frs = uk["frs_2023_24_2026"]
Dict key format: "{stem}_{year}" e.g. "enhanced_frs_2023_24_2026"
To force regeneration: delete ./data/ and call ensure_datasets() again.
Loading US datasets
from policyengine.tax_benefit_models.us import ensure_datasets
us = ensure_datasets(
datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"],
years=[2026],
data_folder="./data",
)
ecps = us["enhanced_cps_2024_2026"]
Default US dataset: enhanced_cps_2024.h5 (Enhanced CPS), years 2024–2028.
Inspecting available variables
Always inspect the dataset to find available variable names — never guess:
from policyengine.tax_benefit_models.uk import ensure_datasets
uk = ensure_datasets(years=[2026], data_folder="./data")
d = uk["enhanced_frs_2023_24_2026"]
# Input variables (present in raw data)
print("household:", list(d.data.household.columns))
print("person: ", list(d.data.person.columns))
print("benunit: ", list(d.data.benunit.columns))
Input variables are what's in the raw survey data — demographics, reported incomes, consumption, wealth, flags.
Computed variables (household_net_income, income_tax, universal_credit, etc.) are not in the raw dataset — they are calculated by the simulation. To see what's available after running:
from policyengine.core import Simulation
from policyengine.tax_benefit_models.uk import uk_latest
sim = Simulation(dataset=d, tax_benefit_model_version=uk_latest)
sim.run()
print("household (post-sim):", list(sim.output_dataset.data.household.columns))
print("person (post-sim): ", list(sim.output_dataset.data.person.columns))
The computed variables available are defined by uk_latest.entity_variables — inspect this to see the full list without running a simulation:
from policyengine.tax_benefit_models.uk import uk_latest
print(uk_latest.entity_variables) # dict: entity → [variable names]
For Analysts: Core Concepts
When running simulations with policyengine.py (the microsimulation package, not the API client), you work with three key components:
Simulation.ensure()- Smart caching to avoid redundant computationsimulation.output_dataset.data- Accessing calculated resultsmap_to_entity()- Converting data between entity levels (person ↔ household)
Note: This is for microsimulation with policyengine.py, not the policyengine Python API client (which uses Simulation(situation=...)).
Simulation Lifecycle
The Four Methods
from policyengine.core import Simulation
from policyengine.tax_benefit_models.uk import uk_latest
simulation = Simulation(
dataset=dataset,
tax_benefit_model_version=uk_latest,
)
# Method 1: Always run (no caching)
simulation.run()
# Method 2: Run only if needed (recommended)
simulation.ensure()
# Method 3: Save results to disk
simulation.save()
# Method 4: Load results from disk
simulation.load()
When to Use Each
run(): Use when you need fresh results or parameters changedensure(): Use for iterative development (checks cache → disk → run)save(): Use to persist large simulation resultsload(): Use to resume from previous session
How ensure() Works
def ensure(self):
# 1. Check in-memory LRU cache (100 simulations)
cached = _cache.get(self.id)
if cached:
self.output_dataset = cached.output_dataset
return
# 2. Try loading from disk
try:
self.tax_benefit_model_version.load(self)
except Exception:
# 3. Only run if both cache and disk fail
self.run()
self.save()
# 4. Add to cache for next ensure() call
_cache.add(self.id, self)
Performance impact:
- First call: Full simulation runtime (seconds to minutes)
- Same session: Instant (in-memory cache)
- New session: Fast (disk load, no recomputation)
Example: Reusing Baseline Across Reforms
# Run baseline once
baseline = Simulation(dataset=dataset, tax_benefit_model_version=uk_latest)
baseline.ensure() # First call: runs simulation
baseline.save() # Persist to disk
# Test multiple reforms
for reform in [reform1, reform2, reform3]:
baseline.ensure() # Instant from cache!
reform_sim = Simulation(
dataset=dataset,
tax_benefit_model_version=uk_latest,
policy=reform
)
reform_sim.run() # Only reform needs to run
# Compare results...
Accessing Results: output_dataset.data
After running a simulation, all calculated variables are in simulation.output_dataset.data.
Structure (UK Example)
simulation.run()
# Access output container
output = simulation.output_dataset.data
# Entity-level MicroDataFrames
output.person # Person-level results
output.benunit # Benefit unit results
output.household # Household-level results
US Entity Structure
# US has more entities
output.person
output.tax_unit # Federal tax filing unit
output.spm_unit # Supplemental Poverty Measure unit
output.family # Census family definition
output.marital_unit # Married couple or single
output.household
Available Variables
Each dataframe contains input variables + calculated variables:
# Person-level (UK)
print(output.person.columns)
# ['person_id', 'person_household_id', 'age', 'employment_income',
# 'income_tax', 'national_insurance', 'net_income', ...]
# Household-level (UK)
print(output.household.columns)
# ['household_id', 'region', 'rent', 'household_net_income',
# 'household_benefits', 'household_tax', ...]
# Benunit-level (UK)
print(output.benunit.columns)
# ['benunit_id', 'universal_credit', 'child_benefit',
# 'working_tax_credit', 'child_tax_credit', ...]
Direct Data Access
# Get specific columns
incomes = output.household[["household_id", "household_net_income"]]
# Filter data
high_earners = output.person[output.person["employment_income"] > 100000]
# Calculate statistics (automatically weighted!)
mean_income = output.household["household_net_income"].mean()
total_tax = output.household["household_tax"].sum()
# Access individual values
first_hh_income = output.household["household_net_income"].iloc[0]
MicroDataFrame Automatic Weighting
All operations respect survey weights automatically:
# These are all weighted calculations
total_population = output.person["person_weight"].sum()
mean_income = output.household["household_net_income"].mean()
poverty_rate = output.household["in_absolute_poverty_bhc"].mean()
# Groupby operations are weighted
by_region = output.household.groupby("region")["household_net_income"].mean()
Entity Mapping with map_to_entity()
Convert data between entity levels (e.g., sum person income to household, or broadcast household rent to persons).
Method Signature
output.map_to_entity(
source_entity: str, # Entity to map from
target_entity: str, # Entity to map to
columns: list[str] = None, # Columns to map (None = all)
values: np.ndarray = None, # Custom values instead
how: str = "sum" # Aggregation method
)
Aggregation Methods
Person → Group (aggregation):
how="sum"(default): Sum values within each grouphow="first": Take first value in each grouphow="mean": Average valueshow="max": Maximum valuehow="min": Minimum value
Group → Person (expansion):
how="project"(default): Broadcast group value to all membershow="divide": Split group value equally among members
Example 1: Sum Person Income to Household
# Sum employment income across all people in each household
household_employment = output.map_to_entity(
source_entity="person",
target_entity="household",
columns=["employment_income"],
how="sum"
)
# Result is MicroDataFrame at household level
print(household_employment.columns)
# ['household_id', 'employment_income'] # Now household total
Example 2: Broadcast Household Rent to Persons
# Give each person their household's rent value
person_rent = output.map_to_entity(
source_entity="household",
target_entity="person",
columns=["rent"],
how="project"
)
# Each person now has their household's rent
print(person_rent.columns)
# ['person_id', 'rent']
Example 3: Divide Household Value Per Person
# Split household savings equally among members
person_savings_share = output.map_to_entity(
source_entity="household",
target_entity="person",
columns=["total_savings"],
how="divide"
)
# If household has £12,000 savings and 3 people, each gets £4,000
Example 4: Map Custom Values
import numpy as np
# Calculate custom person-level values
custom_tax = np.where(
output.person["employment_income"] > 50000,
output.person["income_tax"] * 1.1, # 10% increase for high earners
output.person["income_tax"]
)
# Aggregate to household level
household_custom_tax = output.map_to_entity(
source_entity="person",
target_entity="household",
values=custom_tax,
how="sum"
)
Example 5: Multi-Column Mapping
# Map multiple income sources to household level
household_incomes = output.map_to_entity(
source_entity="person",
target_entity="household",
columns=[
"employment_income",
"self_employment_income",
"pension_income",
"savings_interest_income"
],
how="sum"
)
# Result has all columns at household level
Example 6: Cross-Entity Mapping (Group to Group)
# UK: Map benunit benefits to household level
# (Multiple benunits can exist in one household)
household_uc = output.map_to_entity(
source_entity="benunit",
target_entity="household",
columns=["universal_credit", "child_benefit"],
how="sum"
)
Automatic Mapping in Aggregate Classes
The Aggregate and ChangeAggregate classes automatically handle entity mapping when the variable and target entity don't match:
from policyengine.outputs.aggregate import Aggregate, AggregateType
# income_tax is person-level, but we want household-level sum
total_tax = Aggregate(
simulation=simulation,
variable="income_tax", # Person-level
entity="household", # Household-level aggregation
aggregate_type=AggregateType.SUM,
)
total_tax.run()
# Automatically maps income_tax from person to household using sum()
Common Patterns
Pattern 1: Compare Baseline vs Reform
# Run both simulations
baseline = Simulation(dataset=dataset, tax_benefit_model_version=uk_latest)
baseline.ensure()
reform = Simulation(
dataset=dataset,
tax_benefit_model_version=uk_latest,
policy=reform_policy
)
reform.ensure()
# Get outputs
baseline_out = baseline.output_dataset.data
reform_out = reform.output_dataset.data
# Calculate differences
baseline_income = baseline_out.household["household_net_income"]
reform_income = reform_out.household["household_net_income"]
difference = reform_income - baseline_income
# Count winners/losers (weighted)
winners = (difference > 0).sum()
losers = (difference < 0).sum()
unchanged = (difference == 0).sum()
Pattern 2: Calculate Custom Derived Variable
# Calculate marginal tax rate at person level
person_data = output.person.copy()
person_data["mtr"] = (
(person_data["income_tax"] + person_data["national_insurance"])
/ person_data["employment_income"].clip(lower=1)
) * 100
# Map to household level (max MTR in household)
household_mtr = output.map_to_entity(
source_entity="person",
target_entity="household",
values=person_data["mtr"].values,
how="max"
)
Pattern 3: Extract Subset for Analysis
# Get London households with children
london_hh = output.household[output.household["region"] == "LONDON"]
households_with_children = output.person.groupby("person_household_id")["age"].apply(
lambda ages: (ages < 18).any()
)
# Combine filters
london_ids = set(london_hh["household_id"])
hh_with_kids_ids = set(households_with_children[households_with_children].index)
target_ids = london_ids & hh_with_kids_ids
# Extract subset
subset_hh = output.household[output.household["household_id"].isin(target_ids)]
subset_persons = output.person[output.person["person_household_id"].isin(target_ids)]
Pattern 4: Reuse Baseline Across Multiple Reforms
# Run baseline once
baseline = Simulation(dataset=dataset, tax_benefit_model_version=uk_latest)
baseline.ensure()
baseline.save()
# Test multiple reforms efficiently
reforms = [reform1, reform2, reform3]
results = {}
for reform in reforms:
baseline.ensure() # Instant from cache
reform_sim = Simulation(
dataset=dataset,
tax_benefit_model_version=uk_latest,
policy=reform
)
reform_sim.run()
# Calculate impact
from policyengine.outputs.change_aggregate import ChangeAggregate, ChangeAggregateType
revenue = ChangeAggregate(
baseline_simulation=baseline,
reform_simulation=reform_sim,
variable="household_tax",
aggregate_type=ChangeAggregateType.SUM,
)
revenue.run()
results[reform.name] = revenue.result
Direct Data Analysis (without Aggregate)
For custom analyses (decile breakdowns, percentiles, groupby), work directly with output_dataset.data after running the simulation. This is often simpler than using Aggregate.
Full working example: energy spending by income decile and tenure type
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from policyengine.core import Simulation
from policyengine.tax_benefit_models.uk import PolicyEngineUKDataset, uk_latest
# Load dataset
dataset = PolicyEngineUKDataset(
name="Enhanced FRS 2026",
description="EFRS 2026",
filepath="./data/enhanced_frs_2023_24_year_2026.h5",
year=2026,
)
dataset.load()
# Run simulation to compute income variables
simulation = Simulation(dataset=dataset, tax_benefit_model_version=uk_latest)
simulation.ensure() # caches to disk after first run
# Access results as DataFrames
hh = simulation.output_dataset.data.household
# Assign income decile (weighted quantile)
hh["income_decile"] = pd.qcut(
hh["household_net_income"],
q=10,
labels=[f"D{i}" for i in range(1, 11)],
)
# Group and calculate stats
stats = (
hh.groupby(["income_decile", "tenure_type"])["domestic_energy_consumption"]
.agg(
mean="mean",
p25=lambda x: np.percentile(x, 25),
p75=lambda x: np.percentile(x, 75),
)
.reset_index()
)
Key points:
simulation.output_dataset.data.householdis aMicroDataFramewith weightsdomestic_energy_consumptionis household-level (annual £)tenure_typevalues:OWNED_OUTRIGHT,OWNED_WITH_MORTGAGE,RENT_FROM_COUNCIL,RENT_PRIVATELY,RENT_FROM_HA- Income deciles must be computed from simulation output (not raw data)
Performance Tips
- Use
ensure()for iterative work: Can save minutes when re-running analyses - Filter before mapping: Reduces computation on large datasets
- Use
Aggregateclasses: Optimised implementations for common operations - Batch similar calculations: Run multiple aggregates in sequence
- Cache intermediate results: Store derived calculations
# Good: Filter then map
high_earners = output.person[output.person["employment_income"] > 100000]
high_earner_hh_income = output.map_to_entity(
source_entity="person",
target_entity="household",
values=high_earners["employment_income"].values,
how="sum"
)
# Less efficient: Map then filter
all_hh_income = output.map_to_entity(
source_entity="person",
target_entity="household",
columns=["employment_income"],
how="sum"
)
high_earner_hh = all_hh_income[all_hh_income["employment_income"] > 100000]
For Contributors: Implementation
Current implementation:
# Simulation lifecycle
cat policyengine.py/src/policyengine/core/simulation.py
# Entity mapping logic
cat policyengine.py/src/policyengine/core/dataset.py
# Cache implementation
cat policyengine.py/src/policyengine/core/cache.py
Key patterns:
- Simulation caching: LRU cache with max 100 entries, keyed by UUID
- Entity mapping: Automatic detection of mapping direction (person→group or group→person)
- MicroDataFrame: All entity data uses weighted DataFrames from microdf package
Related skills:
policyengine-core-skill- Understanding simulation engine architecturemicrodf-skill- Working with weighted DataFramespolicyengine-python-client-skill- Basic simulation usage
Debugging Tips
Verify Simulation Ran
assert simulation.output_dataset is not None, "Simulation hasn't run"
# Check for expected variables
expected = ["household_net_income", "household_tax"]
actual = simulation.output_dataset.data.household.columns
assert all(v in actual for v in expected), "Missing variables"
Check Entity Linkages
# Verify person-household mapping is valid
person_hh_ids = set(output.person["person_household_id"])
household_ids = set(output.household["household_id"])
assert person_hh_ids.issubset(household_ids), "Invalid linkage"
Verify Weights
# Check weights sum correctly
total_persons = output.person["person_weight"].sum()
print(f"Weighted population: {total_persons:,.0f}")
# Check for missing weights
assert not output.person["person_weight"].isna().any(), "Missing weights"
Related Documentation
In policyengine.py repo:
.claude/policyengine-guide.md- High-level patterns.claude/quick-reference.md- Syntax cheat sheet.claude/working-with-simulations.md- Detailed simulation guideexamples/- Full working examplesdocs/core-concepts.md- Architecture documentation
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
policyengine-healthcare
Healthcare program modeling in PolicyEngine-US — Medicaid, ACA marketplace, CHIP, and Medicare. Covers encoding rules, running analyses, and navigating the unique complexity of US healthcare programs. Triggers: "healthcare", "health insurance", "Medicaid", "ACA", "CHIP", "Medicare", "marketplace", "premium tax credit", "APTC", "PTC", "SLCSP", "benchmark plan", "rating area", "age curve", "family tier", "coverage gap", "Medicaid expansion", "MAGI", "medicaid_magi", "aca_magi", "medicaid_income_level", "medicaid_category", "enrollment", "takeup", "take-up", "per capita", "CSR", "cost sharing", "insurance premium", "second lowest silver", "required contribution percentage", "42 CFR", "IRC 36B", "categorical eligibility", "expansion adult", "healthcare reform", "healthcare analysis", "health policy".
policyengine-us
ALWAYS LOAD THIS SKILL FIRST before writing any PolicyEngine-US code. Contains the correct API patterns for household calculations and population simulations using the new policyengine package. Covers US federal and state taxes/benefits. Triggers: "what would", "how much would a", "benefit be", "eligible for", "qualify for", "single parent", "married couple", "family of", "household of", "if they earn", "earning $", "making $", "calculate benefits", "calculate taxes", "benefit for a", "what would I get", "what is the maximum", "what is the rate", "poverty line", "income limit", "benefit amount", "maximum benefit", "compare states", "TANF", "SNAP", "EITC", "CTC", "SSI", "WIC", "Section 8", "Medicaid", "ACA", "child tax credit", "earned income", "supplemental security", "housing voucher", "microsimulation", "population", "reform", "policy impact", "budgetary", "decile".
policyengine-uk
ALWAYS LOAD THIS SKILL FIRST before writing any PolicyEngine-UK code. Contains the correct API patterns for household calculations and population simulations using the new policyengine package (not policyengine_uk directly). Triggers: "what would", "how much would a", "benefit be", "eligible for", "qualify for", "single parent", "married couple", "family of", "household of", "if they earn", "with income of", "earning £", "making £", "calculate benefits", "calculate taxes", "benefit for a", "tax for a", "what would I get", "what would they get", "what is the rate", "what is the threshold", "personal allowance", "maximum benefit", "income limit", "benefit amount", "how much is", "Universal Credit", "child benefit", "pension credit", "housing benefit", "council tax", "income tax", "national insurance", "JSA", "ESA", "PIP", "disability living allowance", "working tax credit", "child tax credit", "Scotland", "Wales", "UK", "microsimulation", "population", "reform", "policy impact", "budgetary", "decile".
policyengine-canada
ALWAYS LOAD THIS SKILL FIRST before writing any PolicyEngine-Canada code. Contains Canadian federal and provincial tax/benefit rules for household calculations. IMPORTANT: PolicyEngine-Canada does NOT have representative population microdata. Do NOT attempt microsimulation or population-level estimates for Canada. Only provide household-level analysis (single-family impacts, eligibility, benefit amounts). Triggers: "what would", "how much would a", "benefit be", "eligible for", "qualify for", "single parent", "married couple", "family of", "household of", "if they earn", "earning $", "making $", "calculate benefits", "calculate taxes", "benefit for a", "what would I get", "what is the maximum", "what is the rate", "income limit", "benefit amount", "maximum benefit", "compare provinces", "CCB", "Canada Child Benefit", "GST credit", "HST credit", "GST/HST", "OAS", "Old Age Security", "GIS", "Guaranteed Income Supplement", "CWB", "Canada Workers Benefit", "EI", "Employment Insurance", "CPP", "Canada Pension Plan", "RRSP", "TFSA", "Ontario Child Benefit", "OCB", "Ontario Trillium Benefit", "OTB", "BC Climate Action", "Alberta Child Benefit", "Quebec", "CRA", "Canada Revenue Agency", "Canadian", "Canada", "Ontario", "British Columbia", "Alberta", "Saskatchewan", "Manitoba", "Nova Scotia", "New Brunswick", "PEI", "Newfoundland", "Yukon", "NWT", "Nunavut", "provincial tax", "federal tax Canada".
policyengine-ui-kit-consumer
This skill should be used when setting up a new project that uses @policyengine/ui-kit, debugging CSS or styling issues in a consumer app, or when Tailwind utility classes are not being generated. Also use when creating globals.css, configuring PostCSS, or troubleshooting "no styles", "no spacing", or "no layout" problems. Triggers: "ui-kit import", "globals.css setup", "Tailwind not working", "styles not applying", "utility classes missing", "setup ui-kit", "PostCSS config", "no styling", "CSS broken", "import ui-kit", "theme.css", "no layout", "no spacing", "@tailwindcss/postcss"
policyengine-tailwind-shadcn
Tailwind CSS v4 + shadcn/ui integration patterns for PolicyEngine frontend projects. Covers @theme namespaces, CSS variable conventions, SVG var() usage, and common mistakes. Triggers: "Tailwind v4", "@theme", "shadcn", "CSS variables", "design tokens CSS", "theme.css", "@theme inline"
Didn't find tool you were looking for?