Agent skill
scfoundation-model-agent
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/scfoundation-model-agent
SKILL.md
name: 'scfoundation-model-agent' description: 'Unified agent for leveraging single-cell foundation models (scGPT, scBERT, Geneformer, scFoundation) for cross-species annotation, perturbation prediction, and gene network inference.' measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
scFoundation Model Agent
The scFoundation Model Agent provides a unified interface to leverage state-of-the-art single-cell foundation models for diverse downstream tasks. It integrates scGPT, scBERT, Geneformer, scFoundation, and emerging models to enable cross-species cell annotation, in silico perturbation prediction, gene regulatory network inference, and batch integration.
When to Use This Skill
- When annotating cell types across species (human, mouse, cross-species).
- For predicting perturbation effects (knockouts, drug treatments) in silico.
- To infer gene regulatory networks from single-cell data.
- When integrating batches without losing biological signal.
- For generating cell embeddings for downstream analysis.
Core Capabilities
-
Cross-Species Cell Annotation: Transfer cell type labels across species using unified embeddings.
-
In Silico Perturbation: Predict gene expression changes from knockouts/treatments.
-
Gene Regulatory Network Inference: Discover TF-target relationships from attention patterns.
-
Batch Integration: Remove technical variation while preserving biology.
-
Cell Embedding Generation: Generate universal cell representations for any downstream task.
-
Multi-Model Ensemble: Combine predictions from multiple foundation models.
Supported Foundation Models
| Model | Parameters | Training Data | Strengths |
|---|---|---|---|
| scGPT | 50M | 33M human cells | General purpose, perturbations |
| Geneformer | 10M | 30M cells | Chromatin, gene networks |
| scBERT | 20M | 1.2M cells | Cell type annotation |
| scFoundation | 100M | 50M cells | Large-scale, multi-species |
| scTab | 15M | 22M cells | Tabular prediction |
| UCE (Universal Cell Embeddings) | 100M | 36M cells | Cross-species transfer |
Workflow
-
Input: Single-cell RNA-seq data (AnnData format).
-
Model Selection: Choose appropriate model(s) for task.
-
Preprocessing: Tokenize genes, normalize expression.
-
Inference: Generate embeddings or predictions.
-
Task Execution: Annotation, perturbation, or network inference.
-
Ensemble (Optional): Combine multi-model predictions.
-
Output: Annotated data, predictions, networks.
Example Usage
User: "Use scGPT to predict the effect of CRISPR knockout of TP53 on these cancer cells."
Agent Action:
python3 Skills/Genomics/scFoundation_Model_Agent/foundation_predict.py \
--input cancer_cells.h5ad \
--model scgpt \
--task perturbation \
--perturbation "TP53 knockout" \
--model_checkpoint scgpt_human_gene_v1.pt \
--output tp53_ko_predictions.h5ad
Task-Specific Usage
Cell Type Annotation
python3 foundation_predict.py \
--input query_cells.h5ad \
--model geneformer \
--task annotation \
--reference tabula_sapiens.h5ad \
--output annotated_cells.h5ad
Gene Network Inference
python3 foundation_predict.py \
--input cells.h5ad \
--model scgpt \
--task grn_inference \
--transcription_factors tf_list.txt \
--output gene_network.csv
Batch Integration
python3 foundation_predict.py \
--input multi_batch.h5ad \
--model scfoundation \
--task integration \
--batch_key batch \
--output integrated.h5ad
Output Formats
| Task | Output | Format |
|---|---|---|
| Annotation | Cell type labels | .h5ad obs column |
| Perturbation | Predicted expression | .h5ad layer |
| GRN | TF-target edges | .csv, .graphml |
| Integration | Corrected embeddings | .h5ad obsm |
| Embeddings | Cell representations | .h5ad obsm |
Performance Benchmarks
| Task | Model | Dataset | Performance |
|---|---|---|---|
| Annotation | scGPT | Tabula Sapiens | 93% accuracy |
| Annotation | Geneformer | HLCA | 91% accuracy |
| Perturbation (R²) | scGPT | Norman 2019 | 0.87 |
| Integration (kBET) | scFoundation | Multi-atlas | 0.92 |
| Cross-species | UCE | Human→Mouse | 85% F1 |
AI/ML Architecture
Transformer Backbone:
- Gene-level tokenization
- Attention-based gene interactions
- Masked expression prediction pretraining
Perturbation Module:
- Conditional generation
- Counterfactual prediction
- Dose-response modeling
Transfer Learning:
- Zero-shot annotation
- Few-shot fine-tuning
- Domain adaptation
Prerequisites
- Python 3.10+
- PyTorch 2.0+
- transformers, flash-attn
- Scanpy, AnnData
- Model-specific weights
- GPU with 16GB+ VRAM
Related Skills
- Nicheformer_Spatial_Agent - For spatial foundation models
- scGPT_Agent - Dedicated scGPT workflows
- Cell_Type_Annotation - Traditional annotation methods
- Pathway_Analysis - Gene set enrichment
Model Selection Guide
| Use Case | Recommended Model | Reason |
|---|---|---|
| General annotation | scGPT | Broad training, robust |
| Cross-species | UCE | Species-agnostic embeddings |
| Perturbation | scGPT | Best perturbation performance |
| GRN inference | Geneformer | Attention → regulatory links |
| Large-scale | scFoundation | Efficient, scalable |
| Tabular prediction | scTab | Optimized for classification |
Special Considerations
- Gene Coverage: Models trained on variable gene sets; check overlap
- Species: Some models human-only; use UCE for cross-species
- Compute: Large models need significant GPU memory
- Fine-Tuning: Task-specific fine-tuning improves performance
- Versioning: Model weights update frequently; track versions
Ensemble Strategies
| Strategy | Method | Benefit |
|---|---|---|
| Majority Vote | Mode of predictions | Robust to outliers |
| Weighted Average | Confidence-weighted | Leverages uncertainty |
| Stacking | Meta-model | Learns model strengths |
| Attention Fusion | Cross-model attention | Deep integration |
Author
AI Group - Biomedical AI Platform
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
sleep-analyzer
分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-hi-c-analysis-matrix-operations
Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.
Didn't find tool you were looking for?