healthsim-trialsim
Generate realistic clinical trial synthetic data including study definitions, sites, subjects, visits, adverse events, efficacy assessments, and disposition. Use when user requests: clinical trial data, CDISC/SDTM/ADaM datasets, trial cohorts (Phase I/II/III/IV), FDA submission test data, or specific therapeutic areas like oncology or biologics/CGT.
Install this agent skill to your Project
npx add-skill https://github.com/mark64oswald/healthsim-workspace/tree/main/skills/trialsim
SKILL.md
TrialSim
Status: Active Development
TrialSim generates realistic synthetic clinical trial data for testing, training, and development purposes.
For Claude
Use this skill when the user requests clinical trial data, CDISC-compliant datasets, or regulatory submission test data. It is the primary HealthSim skill for these requests.
When to apply this skill:
- User mentions clinical trials, studies, or protocols
- User requests CDISC, SDTM, or ADaM datasets
- User specifies trial phases (Phase I, II, III, IV)
- User mentions FDA/EMA submission data or regulatory requirements
- User asks for adverse events, safety data, or efficacy endpoints
- User mentions specific therapeutic areas (oncology, cardiovascular, CNS)
- User requests SDTM domains (DM, AE, VS, LB, CM, EX, DS, MH)
Key capabilities:
- Generate complete study definitions with protocol parameters
- Create multi-site, multi-country trial configurations
- Produce subject-level longitudinal data with realistic patterns
- Generate safety data (adverse events, labs, vitals) with MedDRA/LOINC coding
- Create efficacy endpoints for various therapeutic areas
- Output CDISC-compliant formats (SDTM, ADaM)
For specific trial phases, therapeutic areas, or SDTM domains, load the appropriate skill from the tables below.
Overview
TrialSim provides:
- Complete study lifecycle data (protocol to closeout)
- Multi-site, multi-country trial configurations
- Subject-level longitudinal data with realistic patterns
- Safety data (adverse events, labs, vitals)
- Efficacy endpoints (primary, secondary, exploratory)
- CDISC-compliant output (SDTM, ADaM)
Trigger Phrases
Activate TrialSim when user mentions:
- "clinical trial" or "clinical study"
- "Phase I/II/III/IV" or "pivotal trial"
- "CDISC", "SDTM", "ADaM"
- "FDA submission data" or "regulatory data"
- "adverse events" or "safety data"
- "efficacy endpoints"
- Trial therapeutic areas (oncology, cardiology, etc.)
- SDTM domains (DM, AE, VS, LB, CM, EX, DS, MH)
Quick Links
Core Skills
| Topic | Skill | Description |
|---|---|---|
| Domain Knowledge | clinical-trials-domain.md | Core trial concepts, phases, regulatory |
| Recruitment | recruitment-enrollment.md | Screening funnel, enrollment patterns |
Trial Phase Skills
| Phase | Skill | Description |
|---|---|---|
| Phase 1 | phase1-dose-escalation.md | FIH, dose escalation, MTD (3+3, BOIN, CRM) |
| Phase 2 | phase2-proof-of-concept.md | POC, dose-ranging, futility (Simon's two-stage, MCP-Mod) |
| Phase 3 | phase3-pivotal.md | Pivotal registration trials, NDA/BLA |
SDTM Domain Skills
| Domain | Skill | Description |
|---|---|---|
| DM | domains/demographics-dm.md | Subject demographics, treatment arms |
| AE | domains/adverse-events-ae.md | Adverse events with MedDRA coding |
| VS | domains/vital-signs-vs.md | Vital sign measurements |
| LB | domains/laboratory-lb.md | Laboratory results with LOINC |
| CM | domains/concomitant-meds-cm.md | Concomitant medications with ATC |
| EX | domains/exposure-ex.md | Study drug exposure, dose modifications |
| DS | domains/disposition-ds.md | Subject disposition, discontinuation |
| MH | domains/medical-history-mh.md | Medical history, comorbidities |
| Domain Index | domains/README.md | All SDTM domains overview |
Therapeutic Areas
| Area | Skill | Key Endpoints |
|---|---|---|
| Oncology | therapeutic-areas/oncology.md | RECIST, ORR, PFS, OS |
| Cardiovascular | therapeutic-areas/cardiovascular.md | MACE, CV outcomes |
| CNS | therapeutic-areas/cns.md | Cognitive scales, imaging |
| CGT | therapeutic-areas/cgt.md | CAR-T, gene therapy |
Real World Evidence
| Topic | Skill | Description |
|---|---|---|
| RWE Overview | rwe/overview.md | RWE concepts, data sources |
| Synthetic Controls | rwe/synthetic-control.md | External control arm generation |
Output Formats
| Format | Skill | Use Case |
|---|---|---|
| SDTM | ../../formats/cdisc-sdtm.md | Regulatory submission |
| ADaM | ../../formats/cdisc-adam.md | Statistical analysis |
| Dimensional | ../../formats/dimensional-analytics.md | BI dashboards, analytics |
| JSON | Default | API integration |
| CSV | ../../formats/csv.md | Spreadsheet analysis |
Data Models & References
| Resource | Location | Description |
|---|---|---|
| Canonical Models | ../../references/data-models.md#trialsim-models | 15 entity schemas (Subject, Study, Site, AE, etc.) |
| Dimensional Schema | ../../formats/dimensional-analytics.md#trialsim-clinical-trial-analytics | Star schema for BI (7 dims, 6 facts) |
| Code Systems | ../../references/code-systems.md | MedDRA, LOINC, ATC |
Core Entities
TrialSim uses 15 canonical entity schemas. See the Data Models reference (../../references/data-models.md#trialsim-models) for complete JSON schemas.
Entity Overview
| Entity | SDTM Domain | Description |
|---|---|---|
| Subject | DM | Trial participant (extends Person) |
| Study | TS | Protocol definition |
| Site | - | Investigational site |
| TreatmentArm | TA | Study arm definition |
| VisitSchedule | TV | Protocol visits |
| ActualVisit | SV | Subject visit occurrence |
| Randomization | DM/SE | Subject randomization |
| AdverseEvent | AE | Safety events with MedDRA |
| Exposure | EX | Study drug dosing |
| ConcomitantMed | CM | Prior/concomitant meds with ATC |
| TrialLab | LB | Lab results with LOINC |
| EfficacyAssessment | RS/TR | Response assessments |
| MedicalHistory | MH | Pre-existing conditions |
| DispositionEvent | DS | Subject disposition |
| ProtocolDeviation | DV | Protocol deviations |
Key Entity Examples
Study:
{
"study_id": "ABC-123-001",
"protocol_title": "A Phase 3, Randomized, Double-Blind Study...",
"phase": "Phase 3",
"therapeutic_area": "Oncology",
"indication": "Non-Small Cell Lung Cancer",
"sponsor": "Example Pharma Inc.",
"status": "Ongoing"
}
Subject (with cross-product linking):
{
"subject_id": "0001",
"usubjid": "ABC-123-001-001-0001",
"site_id": "001",
"patient_ref": "MRN-12345",
"screening_date": "2024-01-15",
"randomization_date": "2024-01-22",
"treatment_arm": "TRT",
"status": "Active"
}
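The `usubjid` in the Subject example appears to be the study, site, and subject identifiers joined with hyphens. A minimal sketch of that composition follows; the function name `build_usubjid` is hypothetical and the joining rule is inferred from the example values, not taken from the TrialSim schemas.

```python
def build_usubjid(study_id: str, site_id: str, subject_id: str) -> str:
    """Join study, site, and subject IDs into one unique subject identifier.

    Assumption: the hyphen-joined pattern is inferred from the example
    "ABC-123-001-001-0001"; the real rule lives in the data-model schemas.
    """
    parts = [study_id, site_id, subject_id]
    if any(not part for part in parts):
        raise ValueError("study_id, site_id, and subject_id are all required")
    return "-".join(parts)


usubjid = build_usubjid("ABC-123-001", "001", "0001")
print(usubjid)  # ABC-123-001-001-0001
```

Keeping the three components as separate fields and deriving `usubjid` from them avoids the identifier drifting out of sync with `site_id` or `subject_id`.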
Integration with Other Products
TrialSim integrates with other HealthSim products for complete clinical trial data:
| From | To | Integration Pattern |
|---|---|---|
| PatientSim | TrialSim | Patient → Subject (add consent, randomization, protocol visits) |
| NetworkSim | TrialSim | Provider → Investigator (add credentials, training, delegation log) |
| PopulationSim | TrialSim | Demographics → Recruitment pool (geographic, demographic eligibility) |
Cross-Product: PatientSim
Trial subjects are patients with additional trial-specific data:
- ../patientsim/oncology/ - Oncology trial subjects
- ../patientsim/heart-failure.md - CV outcomes trial subjects
- ../patientsim/behavioral-health.md - CNS trial subjects
- ../patientsim/diabetes-management.md - Metabolic trial subjects
Integration Pattern: Use PatientSim for baseline clinical characteristics. TrialSim adds protocol-specific assessments (RECIST, NYHA class changes), randomization, and SDTM-formatted data.
Cross-Product: PopulationSim Integration
PopulationSim v2.0 provides embedded real-world data for evidence-based trial planning, site selection, and diversity compliance. When geographies are specified, TrialSim uses actual CDC PLACES, SVI, and ADI data to ground feasibility estimates and enrollment projections.
Data-Driven Trial Planning Pattern
Step 1: Look up real population data for potential sites
# For site feasibility in Houston metro (Harris County, FIPS: 48201)
Read from: skills/populationsim/data/county/places_county_2024.csv
→ DIABETES_CrudePrev: 12.1% (for diabetes trial)
→ CHD_CrudePrev: 6.4% (for CV outcomes trial)
→ CANCER_CrudePrev: 6.2% (for oncology trial)
→ TotalPopulation: 4,731,145
Read from: skills/populationsim/data/county/svi_county_2022.csv
→ RPL_THEMES: 0.68 (moderate-high vulnerability)
→ EP_MINRTY: 72.1% (supports diversity requirements)
Step 2: Apply to site feasibility estimation
{
"site_feasibility": {
"county_fips": "48201",
"county_name": "Harris County, TX",
"indication": "Type 2 Diabetes",
"eligible_population": {
"total_population": 4731145,
"disease_prevalence": 0.121,
"prevalent_patients": 572467,
"age_eligible_18_75": 458974,
"funnel_to_screenable": 0.05,
"annual_screenable": 22949
},
"diversity_metrics": {
"minority_percentage": 0.721,
"meets_fda_diversity_guidance": true
},
"data_provenance": {
"source": "CDC_PLACES_2024",
"data_year": 2022
}
}
}
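The funnel arithmetic behind `eligible_population` can be sketched as two multiplications. This is an illustration only: the function names are hypothetical, and the step from prevalent patients to the age-eligible count is left as an input because the fraction used is not stated here. Recomputing the prevalent count from the rounded prevalence of 0.121 can land a patient or two away from the stored figure.

```python
def prevalent_patients(total_population: int, prevalence: float) -> int:
    """Estimated patients with the condition (population x crude prevalence)."""
    return round(total_population * prevalence)


def annual_screenable(age_eligible: int, funnel_to_screenable: float) -> int:
    """Patients expected to reach screening per year after applying the funnel rate."""
    return round(age_eligible * funnel_to_screenable)


# Inputs copied from the Harris County site_feasibility example.
prevalent = prevalent_patients(4_731_145, 0.121)
screenable = annual_screenable(458_974, 0.05)
print(prevalent)   # 572469
print(screenable)  # 22949
```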
Step 3: Generate realistic enrollment projections
- Site catchment based on real prevalence (not national averages)
- Diversity enrollment reflecting actual demographics
- Screening-to-randomization rates adjusted for SVI (access barriers)
Embedded Data Sources for Trial Planning
| Source | File | Use in TrialSim |
|---|---|---|
| CDC PLACES County | populationsim/data/county/places_county_2024.csv | Disease prevalence for feasibility |
| CDC PLACES Tract | populationsim/data/tract/places_tract_2024.csv | Catchment area analysis |
| SVI County | populationsim/data/county/svi_county_2022.csv | Diversity planning, access barriers |
| SVI Tract | populationsim/data/tract/svi_tract_2022.csv | Site-level vulnerability context |
| Geography Crosswalk | populationsim/data/crosswalks/cbsa_definitions.csv | Metro area site clustering |
Trial-Specific Applications
| Application | Data Used | TrialSim Integration |
|---|---|---|
| Site Feasibility | PLACES disease prevalence + population | Eligible patient pool sizing |
| Diversity Planning | SVI EP_MINRTY, demographics | FDA diversity guidance compliance |
| Enrollment Projection | PLACES + SVI access indicators | Screening/randomization rates |
| Site Selection | Multi-county PLACES comparison | Optimal site network design |
| Catchment Analysis | Tract-level PLACES | Drive-time eligible population |
Example: Data-Grounded Phase III Site Selection
Request: "Identify top 5 US counties for a Phase III NASH trial based on patient availability"
Data Lookup Process:
Query places_county_2024.csv for:
- High OBESITY_CrudePrev (NASH proxy)
- High DIABETES_CrudePrev (comorbidity)
- Large TotalPopulation (volume)
Query svi_county_2022.csv for:
- EP_MINRTY (diversity potential)
- EP_UNINSUR (access consideration)
Output with Provenance:
{
"recommended_sites": [
{
"rank": 1,
"county_fips": "48201",
"name": "Harris County, TX",
"obesity_prevalence": 0.328,
"diabetes_prevalence": 0.121,
"population": 4731145,
"minority_pct": 0.721,
"estimated_eligible": 45000,
"diversity_score": "excellent"
}
],
"data_provenance": {
"sources": ["CDC_PLACES_2024", "CDC_SVI_2022"],
"methodology": "prevalence_weighted_ranking"
}
}
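The `prevalence_weighted_ranking` methodology is named but not defined here. One plausible reading, sketched below, scores each county as population times obesity prevalence times diabetes prevalence and sorts by that score. The scoring formula and the function name `rank_counties` are assumptions, and the two placeholder rows carry made-up values purely to show the ordering; only the Harris County row comes from the example above.

```python
from typing import Dict, List


def rank_counties(rows: List[Dict], top_n: int = 5) -> List[Dict]:
    """Rank counties by a prevalence-weighted score (assumed formula).

    score = population * obesity_prevalence * diabetes_prevalence
    """
    scored = [
        {**row, "score": row["population"] * row["obesity_prevalence"] * row["diabetes_prevalence"]}
        for row in rows
    ]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return [{**row, "rank": rank} for rank, row in enumerate(scored[:top_n], start=1)]


counties = [
    # Placeholder rows: made-up values for illustration, not CDC data.
    {"county_fips": "00000", "name": "Placeholder County B",
     "population": 250_000, "obesity_prevalence": 0.40, "diabetes_prevalence": 0.15},
    {"county_fips": "00001", "name": "Placeholder County A",
     "population": 1_000_000, "obesity_prevalence": 0.35, "diabetes_prevalence": 0.13},
    # Row taken from the recommended_sites example.
    {"county_fips": "48201", "name": "Harris County, TX",
     "population": 4_731_145, "obesity_prevalence": 0.328, "diabetes_prevalence": 0.121},
]

ranked = rank_counties(counties)
for row in ranked:
    print(row["rank"], row["name"])
```

In practice the rows would be read from places_county_2024.csv and svi_county_2022.csv rather than typed inline, and the score would likely also weigh the SVI access indicators.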
Integration with Trial-Support Skills
| PopulationSim Skill | TrialSim Application | Data Source |
|---|---|---|
| data-lookup.md | Exact prevalence for feasibility | CDC PLACES 2024 |
| county-profile.md | Site catchment demographics | PLACES + SVI |
| svi-analysis.md | Diversity and access analysis | CDC SVI 2022 |
| feasibility-estimation.md | Protocol feasibility funnel | All sources |
| diversity-planning.md | FDA diversity compliance | SVI demographics |
Key Principle: When planning trials, always ground feasibility and diversity estimates in real PopulationSim data. This enables evidence-based site selection and realistic enrollment projections.
Development Status
| Component | Status |
|---|---|
| SKILL.md (this file) | ✅ Complete |
| clinical-trials-domain.md | ✅ Complete |
| recruitment-enrollment.md | ✅ Complete |
| phase3-pivotal.md | ✅ Complete |
| domains/ (DM, AE, VS, LB, CM, EX, DS, MH) | ✅ Complete |
| therapeutic-areas/ | ✅ Complete |
| rwe/ | ✅ Complete |
| phase1-dose-escalation.md | ✅ Complete |
| phase2-proof-of-concept.md | ✅ Complete |
Related Skills
- PatientSim - Clinical patient data
- MemberSim - Claims integration
- Code Systems - Standard terminologies
Output Formats
TrialSim supports multiple output formats:
| Format | Use Case | Skill Reference |
|---|---|---|
| Canonical JSON | Internal processing, API integration | data-models.md |
| CDISC SDTM | Regulatory submission, FDA/EMA | cdisc-sdtm.md |
| CDISC ADaM | Analysis datasets, statistical programming | cdisc-adam.md |
| Dimensional (Star Schema) | Analytics, BI dashboards, DuckDB/Databricks | dimensional-analytics.md |
Dimensional Analytics
For trial operations analytics and BI dashboards, request dimensional output:
Generate Phase III trial with 100 subjects as star schema for DuckDB
This produces:
- Dimensions: dim_study, dim_site, dim_subject, dim_treatment_arm, dim_visit_schedule, dim_meddra, dim_lab_test
- Facts: fact_enrollment, fact_visit, fact_adverse_event, fact_exposure, fact_efficacy, fact_lab_result
See dimensional-analytics.md for full DDL and example queries.
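To show how the fact and dimension tables relate, here is a small sketch using Python's built-in `sqlite3` as a stand-in for DuckDB. Only the table names `dim_site` and `fact_enrollment` come from the lists above; every column name is a hypothetical placeholder, and the actual DDL is the one in dimensional-analytics.md.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical columns; the real schema is defined in dimensional-analytics.md.
conn.executescript(
    """
    CREATE TABLE dim_site (
        site_key INTEGER PRIMARY KEY,
        site_id  TEXT,
        country  TEXT
    );
    CREATE TABLE fact_enrollment (
        subject_key INTEGER PRIMARY KEY,
        site_key    INTEGER REFERENCES dim_site(site_key),
        arm         TEXT
    );
    """
)
conn.executemany("INSERT INTO dim_site VALUES (?, ?, ?)",
                 [(1, "001", "USA"), (2, "002", "USA")])
conn.executemany("INSERT INTO fact_enrollment VALUES (?, ?, ?)",
                 [(1, 1, "A"), (2, 1, "B"), (3, 2, "A")])

# Typical star-schema query: aggregate a fact table, grouped by a dimension attribute.
rows = conn.execute(
    """
    SELECT s.site_id, COUNT(*) AS enrolled
    FROM fact_enrollment AS f
    JOIN dim_site AS s ON s.site_key = f.site_key
    GROUP BY s.site_id
    ORDER BY s.site_id
    """
).fetchall()
print(rows)  # [('001', 2), ('002', 1)]
```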
Usage Examples
Example 1: Basic Phase 3 Trial
Prompt: "Generate a Phase 3 oncology trial with 200 subjects"
Output:
{
"study": {
"study_id": "ONCO-2025-001",
"protocol_title": "A Phase 3, Randomized, Double-Blind, Placebo-Controlled Study of ABC-123 in Advanced NSCLC",
"phase": "Phase 3",
"therapeutic_area": "Oncology",
"indication": "Non-Small Cell Lung Cancer",
"design": "Randomized, Double-Blind, Placebo-Controlled",
"primary_endpoint": "Overall Survival",
"target_enrollment": 200,
"arms": [
{ "arm_id": "A", "name": "ABC-123 + SOC", "allocation_ratio": 1 },
{ "arm_id": "B", "name": "Placebo + SOC", "allocation_ratio": 1 }
]
},
"sites": [
{ "site_id": "001", "name": "Memorial Cancer Center", "country": "USA", "enrolled": 25 },
{ "site_id": "002", "name": "University Oncology", "country": "USA", "enrolled": 22 }
],
"subjects": [
{
"subject_id": "001-0001",
"site_id": "001",
"arm": "A",
"status": "Active",
"randomization_date": "2025-01-15"
}
]
}
Example 2: Adverse Events with MedDRA
Prompt: "Generate adverse events for a 50-subject immunotherapy trial"
Output:
{
"domain": "AE",
"adverse_events": [
{
"USUBJID": "IO-001-0023",
"AESEQ": 1,
"AETERM": "Fatigue",
"AEDECOD": "Fatigue",
"AEBODSYS": "General disorders and administration site conditions",
"AESEV": "MILD",
"AESER": "N",
"AEREL": "POSSIBLY RELATED",
"AESTDTC": "2025-02-10",
"AEENDTC": "2025-02-18",
"AEOUT": "RECOVERED/RESOLVED"
},
{
"USUBJID": "IO-001-0007",
"AESEQ": 1,
"AETERM": "Immune-mediated colitis",
"AEDECOD": "Colitis",
"AEBODSYS": "Gastrointestinal disorders",
"AESEV": "SEVERE",
"AESER": "Y",
"AESHOSP": "Y",
"AEREL": "RELATED",
"AEACN": "DRUG INTERRUPTED",
"AESTDTC": "2025-03-05",
"AEOUT": "NOT RECOVERED/NOT RESOLVED"
}
]
}
Example 3: Screening and Enrollment
Prompt: "Generate screening data with 30% screen failure rate for 100 subjects"
Output:
{
"screening_summary": {
"total_screened": 143,
"screen_passed": 100,
"screen_failed": 43,
"screen_failure_rate": 0.30
},
"screen_failures": [
{
"screening_id": "SCR-0015",
"failure_reason": "Inclusion criteria not met",
"failure_detail": "EGFR mutation negative",
"screening_date": "2025-01-20"
},
{
"screening_id": "SCR-0028",
"failure_reason": "Exclusion criteria met",
"failure_detail": "Prior immunotherapy within 6 months",
"screening_date": "2025-01-25"
}
]
}
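The counts in `screening_summary` follow from the target enrollment and the screen failure rate: screening must continue until 100 subjects pass, so the total screened is the target divided by the pass rate, rounded up. A minimal sketch, with the hypothetical function name `screening_plan`:

```python
import math


def screening_plan(target_enrolled: int, screen_failure_rate: float) -> dict:
    """Derive screening counts needed to reach a target number of passing subjects."""
    if not 0 <= screen_failure_rate < 1:
        raise ValueError("screen_failure_rate must be in [0, 1)")
    # Round before ceil so float noise (e.g. 100.00000000000001) does not add a subject.
    raw = round(target_enrolled / (1 - screen_failure_rate), 9)
    total_screened = math.ceil(raw)
    screen_failed = total_screened - target_enrolled
    return {
        "total_screened": total_screened,
        "screen_passed": target_enrolled,
        "screen_failed": screen_failed,
        "screen_failure_rate": round(screen_failed / total_screened, 2),
    }


plan = screening_plan(100, 0.30)
print(plan)
# {'total_screened': 143, 'screen_passed': 100, 'screen_failed': 43, 'screen_failure_rate': 0.3}
```

Because the total is rounded up to a whole subject, the realized failure rate (43/143) is only approximately 0.30.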
Example 4: DM Domain SDTM Output
Prompt: "Generate DM domain for 10 subjects as SDTM"
Output:
{
"domain": "DM",
"records": [
{
"STUDYID": "ONCO-2025-001",
"DOMAIN": "DM",
"USUBJID": "ONCO-2025-001-001-0001",
"SUBJID": "0001",
"SITEID": "001",
"RFSTDTC": "2025-01-22",
"RFENDTC": null,
"BRTHDTC": "1958-05-15",
"AGE": 66,
"AGEU": "YEARS",
"SEX": "M",
"RACE": "WHITE",
"ETHNIC": "NOT HISPANIC OR LATINO",
"ARMCD": "TRT",
"ARM": "ABC-123 + SOC",
"COUNTRY": "USA"
}
]
}
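In the record above, `AGE` is consistent with whole years elapsed between `BRTHDTC` and `RFSTDTC`. The sketch below shows that derivation under the assumption that age is anchored to the reference start date; studies may instead anchor it to informed consent, and the function name `age_in_years` is hypothetical.

```python
from datetime import date


def age_in_years(brthdtc: str, reference_dtc: str) -> int:
    """Completed years between an ISO 8601 birth date and a reference date."""
    birth = date.fromisoformat(brthdtc)
    ref = date.fromisoformat(reference_dtc)
    # Subtract one year if the birthday has not yet occurred in the reference year.
    had_birthday = (ref.month, ref.day) >= (birth.month, birth.day)
    return ref.year - birth.year - (0 if had_birthday else 1)


print(age_in_years("1958-05-15", "2025-01-22"))  # 66
```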
Example 5: Laboratory Results with LOINC
Prompt: "Generate LB domain with liver function tests for safety monitoring"
Output:
{
"domain": "LB",
"records": [
{
"STUDYID": "SAFE-001",
"DOMAIN": "LB",
"USUBJID": "SAFE-001-001-0042",
"LBSEQ": 1,
"LBTESTCD": "ALT",
"LBTEST": "Alanine Aminotransferase",
"LBCAT": "CHEMISTRY",
"LBORRES": "32",
"LBORRESU": "U/L",
"LBSTRESN": 32,
"LBSTRESU": "U/L",
"LBSTNRLO": 7,
"LBSTNRHI": 56,
"LBNRIND": "NORMAL",
"LBLOINC": "1742-6",
"LBBLFL": "Y",
"VISITNUM": 2,
"VISIT": "BASELINE"
}
]
}
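`LBNRIND` can be derived by comparing the standardized result with the reference range limits. A minimal sketch, assuming a simple three-way classification and a hypothetical function name; real data may also need handling for missing limits or non-numeric results.

```python
def reference_range_indicator(result: float, low: float, high: float) -> str:
    """Classify a numeric lab result against its normal range (LBSTNRLO..LBSTNRHI)."""
    if result < low:
        return "LOW"
    if result > high:
        return "HIGH"
    return "NORMAL"


# Values from the ALT record above: LBSTRESN=32, LBSTNRLO=7, LBSTNRHI=56.
print(reference_range_indicator(32, 7, 56))  # NORMAL
```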
Generative Framework Integration
TrialSim integrates with the Generative Framework for specification-driven generation at scale.
Profile-Driven Generation
Use profile specifications to generate trial subject populations:
"Generate 150 subjects for an oncology Phase 3 trial"
The Profile Executor will:
- Sample demographics meeting I/E criteria
- Generate baseline disease characteristics
- Apply randomization to treatment arms
- Create screening and baseline assessments
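The randomization step can be illustrated with permuted-block randomization, a common way to keep arms balanced as subjects enroll. This is a sketch of that general technique, not the Profile Executor's actual algorithm, which is not described here; the function name and the `block_multiplier` and `seed` parameters are assumptions.

```python
import random
from typing import Dict, List, Optional


def permuted_block_randomization(
    n_subjects: int,
    allocation: Dict[str, int],
    block_multiplier: int = 2,
    seed: Optional[int] = None,
) -> List[str]:
    """Assign subjects to arms using shuffled blocks that respect the allocation ratio."""
    rng = random.Random(seed)
    # One block holds each arm (ratio * block_multiplier) times, e.g. A, A, B, B for 1:1.
    template = [arm for arm, ratio in allocation.items()
                for _ in range(ratio * block_multiplier)]
    assignments: List[str] = []
    while len(assignments) < n_subjects:
        block = template[:]
        rng.shuffle(block)
        assignments.extend(block)
    # The final block may be cut short, so arms can differ by a few subjects.
    return assignments[:n_subjects]


arms = permuted_block_randomization(150, {"A": 1, "B": 1}, seed=42)
print(arms.count("A"), arms.count("B"))
```

With a 1:1 ratio and blocks of four, the two arm counts differ by at most two at any point during enrollment.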
Journey-Driven Generation
Attach protocol journey specifications to create visit sequences:
"Add a 6-cycle treatment protocol journey"
The Journey Executor will:
- Generate protocol visits at specified windows
- Create assessments per visit schedule
- Apply visit variance within windows
- Handle protocol deviations and early termination
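The visit-window behavior listed above can be sketched as planned dates at fixed cycle intervals with a random offset inside the allowed window. The 21-day cycle length, the 3-day window, and all names in this block are assumptions chosen for illustration; the real values come from the protocol journey specification.

```python
import random
from datetime import date, timedelta
from typing import Dict, List, Optional


def schedule_cycle_visits(
    start: date,
    n_cycles: int = 6,
    cycle_days: int = 21,   # assumed cycle length
    window_days: int = 3,   # assumed visit window (+/- days)
    seed: Optional[int] = None,
) -> List[Dict]:
    """Generate planned and actual visit dates, one per treatment cycle."""
    rng = random.Random(seed)
    visits = []
    for cycle in range(1, n_cycles + 1):
        planned = start + timedelta(days=(cycle - 1) * cycle_days)
        # Cycle 1 starts on the anchor date; later visits vary within the window.
        offset = 0 if cycle == 1 else rng.randint(-window_days, window_days)
        visits.append({
            "visit": f"CYCLE {cycle} DAY 1",
            "planned_date": planned.isoformat(),
            "actual_date": (planned + timedelta(days=offset)).isoformat(),
            "deviation_days": offset,
        })
    return visits


visits = schedule_cycle_visits(date(2025, 1, 15), seed=7)
for v in visits:
    print(v["visit"], v["planned_date"], v["actual_date"], v["deviation_days"])
```

Visits that fall outside the window, or that never occur because of early termination, would be modeled separately as protocol deviations and disposition events.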
Cross-Domain Sync
When generating across products, TrialSim entities are automatically linked:
| TrialSim Entity | Links To |
|---|---|
| Subject | PatientSim Patient (via SSN) |
| Site | NetworkSim Facility |
| Investigator | NetworkSim Provider |
| ConcomitantMed | RxMemberSim Fill (if applicable) |
See: ../generation/executors/cross-domain-sync.md