context-ingestion
Scan project folder structure, validate organization, clone GitHub repository, and generate an inventory of available materials. First step of biomedical-science-writer workflow. Use when starting a new manuscript project.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/context-ingestion-sxg-biomedical-science-w
Context Ingestion
Scans the project folder, validates structure, fetches the GitHub repository, and generates an inventory of all available materials.
Input
The user provides the path to the project folder. If no path is given and the session is already inside the project, use the current directory.
Workflow
[Receive project path]
│
▼
[Validate Folder Structure] ─── Check required folders exist
│
▼
[Parse config.md] ─── Extract GitHub URL, constraints
│
▼
[Clone GitHub Repository] ─── Fetch code for analysis
│
▼
[Inventory Materials] ─── List all available files
│
▼
[Extract IRB Content] ─── If irb/ exists, generate notes/irb-summary.md
│
▼
[Generate inventory.md] ─── Structured summary
Step 1: Validate Folder Structure
Check that required folders exist:
# Required structure
project/
├── papers/ # Must exist (can be empty)
├── data/ # Must exist (can be empty)
├── figures/ # Must exist (can be empty)
├── irb/ # Optional - IRB protocol and regulatory documents
└── config.md # Must exist
Validation:
cd /path/to/project
# Check required folders
[ -d "papers" ] || echo "ERROR: papers/ folder missing"
[ -d "data" ] || echo "ERROR: data/ folder missing"
[ -d "figures" ] || echo "ERROR: figures/ folder missing"
[ -f "config.md" ] || echo "ERROR: config.md missing"
If validation fails, inform user what's missing and provide the expected structure template.
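The checks above print errors but do not report an overall result. A minimal sketch that wraps them in a function with an exit status follows; the function name `validate_project` and the choice to send messages to stderr are illustrative assumptions, not part of the skill.

```shell
# Hypothetical helper: returns 0 when all required items exist, 1 otherwise.
validate_project() (
  # $1 = project folder (defaults to the current directory)
  cd "${1:-.}" || exit 2
  missing=0
  for dir in papers data figures; do
    [ -d "$dir" ] || { echo "ERROR: $dir/ folder missing" >&2; missing=1; }
  done
  [ -f config.md ] || { echo "ERROR: config.md missing" >&2; missing=1; }
  # irb/ is optional, so its absence is only noted, never an error.
  [ -d irb ] || echo "NOTE: irb/ not present (optional)" >&2
  exit "$missing"
)
```

Because the body runs in a subshell, the `cd` does not change the caller's working directory. A caller might use it as `validate_project /path/to/project || echo "Fix the structure before continuing"`.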
Step 2: Parse config.md
Extract configuration values:
# Expected config.md format
## GitHub Repository
url: https://github.com/username/repo-name
branch: main
access: private
## Constraints
word_limit: 3500
target_journal: Radiology: Artificial Intelligence
citation_style: AMA
## Additional Notes
[Free text notes]
Parse and store:
- `github_url`: Repository URL
- `github_branch`: Branch to clone (default: main)
- `github_access`: public or private
- `word_limit`: Target word count
- `target_journal`: Journal name for formatting
- `citation_style`: AMA, Vancouver, APA, etc.
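One way to read these values, assuming each key sits at the start of its own line in the `key: value` form shown above, is sketched below. The helper name `config_value` is hypothetical. It splits on the first colon only, so a value such as `Radiology: Artificial Intelligence` survives intact.

```shell
# Hypothetical helper: print the value for a key in config.md.
config_value() (
  # $1 = key, $2 = config file, $3 = optional default when the key is absent
  value=$(sed -n "s/^$1:[[:space:]]*//p" "$2" | head -n 1)
  printf '%s\n' "${value:-${3:-}}"
)
```

For example, `github_branch=$(config_value branch config.md main)` would fall back to `main` when no branch line is present. Trailing whitespace is not trimmed in this sketch.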
Step 3: Clone GitHub Repository
For public repositories:
git clone --depth 1 --branch main https://github.com/username/repo-name.git code/
For private repositories, the user must have the GitHub CLI (gh) installed and authenticated:
gh repo clone username/repo-name code/ -- --depth 1 --branch main
If clone fails:
- Check whether `gh` is authenticated: `gh auth status`
- Provide instructions: "Run `gh auth login` to authenticate"
- Allow the user to proceed without code (the Methods section will be limited)
Store cloned repo at: project/code/
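The two clone commands above hard-code the repository and branch. A small sketch that builds the matching command from the parsed config values follows; `clone_command` is a hypothetical name, and it only prints the command (a dry run) so the caller can decide whether to execute it.

```shell
# Hypothetical helper: print the clone command for the given config values.
clone_command() (
  # $1 = repository URL, $2 = branch (default: main), $3 = access (public or private)
  slug=${1#https://github.com/}   # reduce the URL to username/repo-name
  slug=${slug%/}
  slug=${slug%.git}
  branch=${2:-main}
  case "$3" in
    private) echo "gh repo clone $slug code/ -- --depth 1 --branch $branch" ;;
    *)       echo "git clone --depth 1 --branch $branch https://github.com/$slug.git code/" ;;
  esac
)
```

Keeping construction separate from execution makes it easy to show the user the exact command before running it, and to fall back to the "proceed without code" path if it fails.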
Step 4: Inventory Materials
Scan each folder and catalog contents:
Papers Inventory
find papers -maxdepth 1 -type f -iname '*.pdf' | wc -l # Count PDFs (also matches .PDF)
For each PDF, extract basic info:
- Filename
- File size
- (Attempt to extract title from first page if possible)
Data Inventory
ls -la data/*.csv data/*.xlsx 2>/dev/null
For each data file:
- Filename
- File size
- Row/column count (for CSVs)
- Sheet names (for Excel)
Preview CSV structure:
head -5 data/results.csv
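The row and column counts listed above can be obtained with standard tools. The sketch below assumes a comma-delimited file with a single header row and no quoted fields that contain commas or line breaks; `csv_shape` is a hypothetical helper name.

```shell
# Hypothetical helper: print "<data rows> <columns>" for a simple CSV file.
csv_shape() (
  # Columns come from the header line; rows exclude the header and blank lines.
  columns=$(head -n 1 "$1" | awk -F',' '{ print NF }')
  rows=$(awk 'NR > 1 && NF { n++ } END { print n + 0 }' "$1")
  echo "$rows $columns"
)
```

For files with quoted commas or embedded newlines, a real CSV parser would be needed and these counts should be treated as approximate.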
Figures Inventory
ls -la figures/*.png figures/*.jpg figures/*.svg 2>/dev/null
For each figure:
- Filename
- Dimensions (if determinable)
- File size
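For PNG files, dimensions can be read without extra tools, because the PNG format stores width and height as big-endian 32-bit integers at byte offsets 16 and 20 (inside the IHDR chunk). The sketch below covers PNG only; `png_dimensions` is a hypothetical helper, and JPEG or SVG files would need a different approach.

```shell
# Hypothetical helper: print "<width>x<height>" for a PNG file.
png_dimensions() {
  # Bytes 2-4 of a PNG file spell "PNG"; bail out if they do not.
  [ "$(head -c 4 "$1" | tail -c 3)" = "PNG" ] || return 1
  # Read the 8 bytes holding width and height as unsigned decimal values.
  set -- $(od -A n -t u1 -j 16 -N 8 "$1")
  [ "$#" -eq 8 ] || return 1
  echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))x$(( ($5 << 24) | ($6 << 16) | ($7 << 8) | $8 ))"
}
```

When the function returns a non-zero status, the Dimensions column can simply be left blank, which matches the "if determinable" wording above.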
Code Inventory
If GitHub clone succeeded:
find code/ -name "*.py" -o -name "*.ipynb" -o -name "*.R" | head -20
Identify:
- Primary language (Python, R, etc.)
- Notebook files (.ipynb)
- Key script files
- Requirements/dependencies file
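A rough way to identify the primary language is to count source files per extension. The sketch below reuses the three extensions from the `find` command above; `count_by_extension` is a hypothetical name, and treating the most frequent extension as the primary language is a heuristic, not a rule stated by the skill.

```shell
# Hypothetical helper: print "<count> <extension>" lines, most common first.
count_by_extension() (
  # $1 = directory to scan (for example, code/)
  find "$1" -type f \( -name '*.py' -o -name '*.ipynb' -o -name '*.R' \) |
    sed 's/.*\.//' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
)
```

The first output line names the dominant extension, for example `count_by_extension code/ | head -n 1`.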
IRB Inventory (Optional)
If irb/ folder exists, scan for regulatory documents:
ls -la irb/*.pdf irb/*.docx irb/*.md 2>/dev/null
Supported formats:
- `.md`: Read directly with the Read tool
- `.pdf`: Read with Claude's native PDF capability
- `.docx`: Extract text using the `document-skills:docx` skill
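The mapping from file format to reading method can be sketched as a simple dispatch on the file extension. `irb_plan` is a hypothetical helper that only prints which method applies to each file; it does not read anything itself. Empty output means there is nothing in `irb/` to extract.

```shell
# Hypothetical helper: list IRB documents with the reading method for each.
irb_plan() (
  dir=${1:-irb}
  [ -d "$dir" ] || exit 0   # irb/ is optional; no folder means no output
  find "$dir" -type f | sort | while IFS= read -r file; do
    case "$file" in
      *.md)   echo "$file: Read tool" ;;
      *.pdf)  echo "$file: native PDF reading" ;;
      *.docx) echo "$file: document-skills:docx" ;;
      *)      echo "$file: unsupported format" ;;
    esac
  done
)
```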
Step 5: Extract IRB Content
Skip this step if irb/ folder does not exist or is empty.
For each document in irb/:
- Read the document content using appropriate method for format
- Extract comprehensive study information
- Generate `notes/irb-summary.md`
IRB Summary Template
Create notes/irb-summary.md:
# IRB Document Summary
**Source**: [filename]
**Extracted**: [timestamp]
## Study Identification
- **Protocol Title**: [extracted or "[not found]"]
- **IRB Approval Number**: [extracted or "[not found]"]
- **Principal Investigator**: [extracted or "[not found]"]
- **Approval Date**: [extracted or "[not found]"]
## Study Design
- **Study Type**: [interventional/observational/retrospective/etc.]
- **Design**: [RCT, cohort, case-control, cross-sectional, etc.]
- **Duration**: [study period]
## Population
- **Target Population**: [description]
- **Inclusion Criteria**:
- [criterion 1]
- [criterion 2]
- ...
- **Exclusion Criteria**:
- [criterion 1]
- [criterion 2]
- ...
- **Sample Size**: [N with justification if provided]
## Procedures & Interventions
- [Procedure 1]
- [Procedure 2]
- ...
## Endpoints
- **Primary**: [endpoint]
- **Secondary**: [endpoints]
## Statistical Considerations
- **Power Analysis**: [if provided or "[not found]"]
- **Planned Analyses**: [if provided or "[not found]"]
## Notes
[Any additional relevant context, caveats, or sections that were unclear]
Mark fields as [not found] if not present in the document.
Step 6: Generate inventory.md
Create structured inventory document:
# Project Inventory
Generated: [timestamp]
Project: [folder name]
## Configuration
- **GitHub**: [url] (branch: [branch])
- **Target Journal**: [journal]
- **Word Limit**: [limit]
- **Citation Style**: [style]
## Papers ([count] files)
| Filename | Size | Notes |
|----------|------|-------|
| smith-2023.pdf | 1.2 MB | |
| jones-2022.pdf | 0.8 MB | |
## Data ([count] files)
| Filename | Size | Rows | Columns | Preview |
|----------|------|------|---------|---------|
| results.csv | 45 KB | 156 | 12 | patient_id, age, sex, ... |
| demographics.csv | 12 KB | 156 | 8 | patient_id, age, sex, ... |
## Figures ([count] files)
| Filename | Dimensions | Size |
|----------|------------|------|
| figure1.png | 1200x800 | 340 KB |
| figure2.png | 1000x600 | 210 KB |
## Code Repository
- **URL**: [github url]
- **Language**: Python
- **Key Files**:
- `analysis.ipynb` - Main analysis notebook
- `preprocessing.py` - Data preprocessing
- `models.py` - ML models
- **Dependencies**: pandas, scikit-learn, matplotlib, ...
## IRB Documents
| Filename | Format | Status |
|----------|--------|--------|
| protocol.pdf | PDF | ✓ Extracted to notes/irb-summary.md |
*Or: "No IRB documents provided"*
## Summary
| Category | Count | Status |
|----------|-------|--------|
| Papers | [n] | ✓ Ready |
| Data files | [n] | ✓ Ready |
| Figures | [n] | ✓ Ready |
| Code repo | 1 | ✓ Cloned |
| IRB documents | [n] | ✓ Extracted / Not provided |
## Missing/Warnings
- [List any issues found]
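The counts in the Summary table can be filled in mechanically. The sketch below covers the three file-based categories using the extensions from Step 4; the helper names and the "None found" status for an empty category are assumptions for illustration, since the template only shows the "✓ Ready" case.

```shell
# Hypothetical helper: count files in a folder by extension.
count_files() {
  # $1 = folder, remaining arguments = extensions without the dot
  cf_dir=$1; shift
  cf_total=0
  for cf_ext in "$@"; do
    for cf_file in "$cf_dir"/*."$cf_ext"; do
      # An unmatched glob stays literal, so confirm the file really exists.
      if [ -f "$cf_file" ]; then cf_total=$((cf_total + 1)); fi
    done
  done
  echo "$cf_total"
}

# Hypothetical helper: one table row; the wording for a zero count is assumed.
summary_row() {
  if [ "$2" -gt 0 ]; then sr_status="✓ Ready"; else sr_status="None found"; fi
  echo "| $1 | $2 | $sr_status |"
}

# Hypothetical helper: print the Summary section for a project folder.
write_summary() (
  cd "${1:-.}" || exit 1
  echo "## Summary"
  echo "| Category | Count | Status |"
  echo "|----------|-------|--------|"
  summary_row "Papers" "$(count_files papers pdf)"
  summary_row "Data files" "$(count_files data csv xlsx)"
  summary_row "Figures" "$(count_files figures png jpg svg)"
)
```

A caller could append the result with `write_summary /path/to/project >> inventory.md`, then add the code repository and IRB rows by hand or from the earlier steps.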
Output
Save to: project/inventory.md
Create notes directory structure:
mkdir -p notes/papers notes/search notes/references notes/papers-library drafts
Return to parent skill with inventory summary.