Agent skill
single-cell-multi-omics-integration
Quick-reference sheet for OmicVerse tutorials spanning MOFA, GLUE pairing, SIMBA integration, TOSICA transfer, and StaVIA cartography.
Single-Cell Multi-Omics Tutorials Cheat Sheet
This skill walk-through summarizes the OmicVerse notebooks that cover paired and unpaired multi-omic integration, multi-batch embedding, reference transfer, and trajectory cartography.
MOFA on paired scRNA + scATAC (t_mofa.ipynb)
- Data preparation: Load preprocessed AnnData objects for RNA (`rna_p_n_raw.h5ad`) and ATAC (`atac_p_n_raw.h5ad`) with `ov.utils.read`, and initialise `pyMOFA` with matching `omics` and `omics_name` lists.
- Model training: Call `mofa_preprocess()` to select highly variable features and run the factor model with `mofa_run(outfile=...)`, which exports the learned MOFA+ factors to an HDF5 model file.
- Result inspection: Reload downstream AnnData, append factor scores via `ov.single.factor_exact`, and explore factor–cluster associations using `factor_correlation`, `get_weights`, and the plotting helpers in `pyMOFAART` (`plot_r2`, `plot_cor`, `plot_factor`, `plot_weights`, etc.).
- Export workflow: Persist factors and weights through the MOFA HDF5 artifact and reuse them by instantiating `pyMOFAART(model_path=...)` for later annotation or visualisation sessions.
- Dependencies & hardware: Requires `mofapy2`; plots optionally rely on `pymde`/`scvi-tools` but run on CPU.
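The training and reload steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's code: the `ov.single` namespace for `pyMOFA` and `pyMOFAART`, the `hdf5_path` keyword of `factor_exact`, the modality names `"RNA"`/`"ATAC"`, and the helper `cells_are_paired` are assumptions or hypothetical names that should be checked against the installed OmicVerse version.

```python
def cells_are_paired(rna_cells, atac_cells):
    """Return True when both modalities list the same cells in the same order."""
    return list(rna_cells) == list(atac_cells)


def train_paired_mofa(rna_path, atac_path, outfile):
    """Fit MOFA on paired RNA + ATAC AnnData files and write the HDF5 model."""
    import omicverse as ov  # lazy import: only needed when training is requested

    rna = ov.utils.read(rna_path)
    atac = ov.utils.read(atac_path)
    # Assumption: "paired" means identical cell barcodes in identical order.
    if not cells_are_paired(rna.obs_names, atac.obs_names):
        raise ValueError("RNA and ATAC objects do not describe the same cells")

    model = ov.single.pyMOFA(omics=[rna, atac], omics_name=["RNA", "ATAC"])
    model.mofa_preprocess()          # highly variable feature selection
    model.mofa_run(outfile=outfile)  # writes the MOFA+ factors to HDF5
    return outfile


def load_factors(adata_path, model_path):
    """Attach factor scores from a saved model to an AnnData object."""
    import omicverse as ov

    adata = ov.utils.read(adata_path)
    # Assumption: the keyword is named hdf5_path; verify against your version.
    adata = ov.single.factor_exact(adata, hdf5_path=model_path)
    art = ov.single.pyMOFAART(model_path=model_path)
    return adata, art
```

After `load_factors(...)`, the returned `pyMOFAART` object is where helpers such as `plot_r2` would be called. The pairing check is an extra safeguard added for illustration and can be dropped if the inputs are known to be aligned.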
MOFA after GLUE pairing (t_mofa_glue.ipynb)
- Data preparation: Start from GLUE-derived embeddings (`rna-emb.h5ad`, `atac.emb.h5ad`), build a `GLUE_pair` object, and run `correlation()` to align unpaired cells before subsetting to highly variable features.
- Model training: Instantiate `pyMOFA` with the aligned AnnData objects, run `mofa_preprocess()`, and save the joint factors through `mofa_run(outfile='models/chen_rna_atac.hdf5')`.
- Result inspection: Use `pyMOFAART` plus AnnData that now contains the GLUE embeddings to compute factors (`get_factors`) and visualise variance explained, factor–cluster correlations, and ranked feature weights.
- Export workflow: Reuse the saved MOFA HDF5 model for downstream inspection; GLUE embeddings can be projected to 2D with `scvi.model.utils.mde` (GPU-accelerated MDE is optional, `sc.tl.umap` works on CPU).
- Dependencies & hardware: Requires both `mofapy2` and the GLUE tooling (`scglue`, `scvi-tools`, `pymde`); GPU acceleration only affects optional MDE visualisation.
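To make the pairing step more concrete, here is a toy illustration of matching unpaired cells by the correlation of their embeddings. It is a sketch of the general idea only and does not reproduce what `GLUE_pair.correlation()` does internally. The function names and the greedy one-to-one strategy are assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / (sd_x * sd_y)


def pair_cells(rna_emb, atac_emb):
    """Greedily assign each RNA cell its most correlated unused ATAC cell.

    Both arguments map a cell ID to its embedding vector in a shared space.
    """
    pairs = {}
    used = set()
    for rna_id, rna_vec in rna_emb.items():
        best_id, best_r = None, -2.0  # below the minimum possible correlation
        for atac_id, atac_vec in atac_emb.items():
            if atac_id in used:
                continue
            r = pearson(rna_vec, atac_vec)
            if r > best_r:
                best_id, best_r = atac_id, r
        if best_id is not None:
            used.add(best_id)
            pairs[rna_id] = best_id
    return pairs
```

For example, `pair_cells({"r1": [1, 2, 3]}, {"a1": [3, 2, 1], "a2": [1, 2, 4]})` pairs `r1` with `a2`. Greedy matching depends on iteration order, so a real pipeline would likely prefer the library routine or a global assignment method.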
SIMBA batch integration (t_simba.ipynb)
- Data preparation: Fetch the concatenated AnnData (`simba_adata_raw.h5ad`) derived from multiple pancreas studies and pass it, alongside a results directory, to `pySIMBA`.
- Model training: Execute `preprocess(...)` to bin features and build a SIMBA-compatible graph, then call `gen_graph()` followed by `train(num_workers=...)` to launch PyTorch-BigGraph optimisation (can scale with CPU workers) and `load(...)` to resume trained checkpoints.
- Result inspection: Apply `batch_correction()` to obtain the harmonised AnnData with SIMBA embeddings (`X_simba`) and visualise using `mde`/`sc.tl.umap` coloured by cell type or batch.
- Export workflow: Training outputs reside in the workdir (e.g., `result_human_pancreas/pbg/graph0`); reuse them with `simba_object.load(...)` for later analyses.
- Dependencies & hardware: Requires installing `simba` and `simba_pbg` (PyTorch BigGraph backend). GPU is optional; make sure adequate CPU threads and memory are available for graph training.
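The sequence of calls described above might look like the sketch below. The `ov.single.pySIMBA` namespace, the positional `(adata, workdir)` constructor, and every `preprocess` keyword and value shown are assumptions recalled from the tutorial rather than confirmed API. `pbg_checkpoint_dir` is a hypothetical helper that only mirrors the example path given in the export bullet.

```python
from pathlib import PurePosixPath


def pbg_checkpoint_dir(workdir, graph_name="graph0"):
    """Build the checkpoint directory, e.g. result_human_pancreas/pbg/graph0."""
    return str(PurePosixPath(workdir) / "pbg" / graph_name)


def run_simba(adata_path, workdir, num_workers=4):
    """Preprocess, build the graph, train, and return the corrected AnnData."""
    import omicverse as ov  # lazy import keeps the path helper usable alone

    adata = ov.utils.read(adata_path)
    simba = ov.single.pySIMBA(adata, workdir)

    # Assumption: keyword names and values are illustrative, not prescriptive.
    simba.preprocess(batch_key="batch", min_n_cells=3, method="lib_size",
                     n_top_genes=3000, n_bins=5)
    simba.gen_graph()
    simba.train(num_workers=num_workers)  # PyTorch-BigGraph optimisation

    # Reload the trained checkpoint before extracting embeddings.
    simba.load(pbg_checkpoint_dir(workdir))
    return simba.batch_correction()  # embeddings expected in obsm["X_simba"]
```

In a later session, the training call can presumably be skipped and only `load(...)` used with the same checkpoint directory. That shortcut is not verified here.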
TOSICA reference transfer (t_tosica.ipynb)
- Data preparation: Download demo AnnData references (`demo_train.h5ad`, `demo_test.h5ad`) and required gene-set GMT files via `ov.utils.download_tosica_gmt()`; confirm datasets are log-normalised before training.
- Model training: Create `pyTOSICA` with the reference AnnData, chosen pathway mask, label key, project directory, and batch size; train with `train(epochs=...)`, then persist weights with `save()` and optionally reload via `load()`.
- Result inspection: Generate predictions on query AnnData through `predicted(pre_adata=...)`, embed with OmicVerse preprocessing and GPU-enabled `mde` (UMAP fallback available), and explore pathway attention to interpret transformer heads.
- Export workflow: Saved project folder keeps model checkpoints and attention summaries; reuse the exported assets to annotate future datasets without retraining from scratch.
- Dependencies & hardware: Needs TOSICA (PyTorch transformer) plus downloaded gene-set masks; avoid setting `depth=2` if memory is constrained. GPU acceleration improves embedding (`mde`) but training runs on standard PyTorch (CPU/GPU depending on environment).
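A rough sketch of the train-then-predict flow, with a cheap sanity check for the log-normalisation requirement, could look like this. The check `looks_log_normalised` is a hypothetical heuristic added for illustration and is not part of OmicVerse or TOSICA. The `ov.single.pyTOSICA` namespace and the keyword names `gmt_path`, `label_name`, and `project_path` are assumptions to verify against the installed version.

```python
def looks_log_normalised(values, max_expected=20.0):
    """Heuristic: non-negative, small in magnitude, and not all integers.

    Raw counts are typically integers and often exceed max_expected, whereas
    log1p-transformed values are small floats. This can be fooled by unusual
    data, so treat a True result as a hint rather than proof.
    """
    values = [float(v) for v in values]
    if not values:
        return False
    if min(values) < 0 or max(values) > max_expected:
        return False
    return any(v != int(v) for v in values)


def train_and_predict(ref_path, query_path, gmt_path, label_name, project_path,
                      epochs=5, batch_size=8):
    """Train TOSICA on a reference and annotate a query AnnData."""
    import omicverse as ov  # lazy import: the heuristic above needs no deps

    ref = ov.utils.read(ref_path)
    query = ov.utils.read(query_path)
    # Assumption: the query contains every gene present in the reference.
    query = query[:, ref.var_names]

    model = ov.single.pyTOSICA(adata=ref, gmt_path=gmt_path, depth=1,
                               label_name=label_name,
                               project_path=project_path,
                               batch_size=batch_size)
    model.train(epochs=epochs)
    model.save()  # weights are kept in the project folder
    return model.predicted(pre_adata=query)
```

`depth=1` is used here in line with the memory note above. A small sample of matrix values, for instance the first few hundred non-zero entries, is usually enough input for the heuristic.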
StaVIA trajectory cartography (t_stavia.ipynb)
- Data preparation: Load example dentate gyrus velocity data via `scvelo.datasets.dentategyrus()`, preprocess with OmicVerse (`preprocess`, `scale`, `pca`, neighbours, UMAP) to populate the AnnData matrices used by VIA.
- Model training: Configure VIA hyperparameters (components, neighbours, seeds, root selection) and instantiate/run `VIA.core.VIA` on the chosen representation (`adata.obsm['scaled|original|X_pca']`).
- Result inspection: Retrieve outputs such as pseudotime (`single_cell_pt_markov`) and render cluster graph abstractions, trajectory curves, atlas views, and stream plots through VIA plotting helpers.
- Export workflow: Persist derived visualisations and animations (e.g., `animate_streamplot_ov`, `animate_atlas`) to files (`.gif`) for reporting; recompute edge bundles via `make_edgebundle_milestone` when needed.
- Dependencies & hardware: Relies on `scvelo`, `pyVIA`, and OmicVerse plotting; computations are CPU-bound though producing large stream/animation outputs benefits from ample memory.
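The model-fitting step might be sketched as below, assuming an AnnData object that already holds `obsm['scaled|original|X_pca']`. The sketch imports the standalone `pyVIA.core` module rather than the copy bundled with OmicVerse. The constructor arguments (`data`, `true_label`, `knn`, `root_user`, `dataset`, `random_seed`), the default values, and the `pt_via` column name are assumptions. `validated_pseudotime` is a hypothetical helper.

```python
def validated_pseudotime(pseudotime, n_cells):
    """Return pseudotime as a list of floats, one value per cell."""
    values = [float(v) for v in pseudotime]
    if len(values) != n_cells:
        raise ValueError(
            f"expected {n_cells} pseudotime values, got {len(values)}"
        )
    return values


def run_via(adata, cluster_key, root_label, ncomps=30, knn=15, random_seed=4):
    """Fit VIA on the leading PCs and store pseudotime in adata.obs."""
    import pyVIA.core as via  # lazy import: requires pyVIA to be installed

    rep = adata.obsm["scaled|original|X_pca"][:, :ncomps]
    model = via.VIA(
        data=rep,
        true_label=adata.obs[cluster_key].tolist(),
        knn=knn,
        root_user=[root_label],  # Assumption: root given as a cluster label
        dataset="group",         # Assumption: marks the root as group-level
        random_seed=random_seed,
    )
    model.run_VIA()

    # single_cell_pt_markov is the pseudotime attribute named in the notes.
    adata.obs["pt_via"] = validated_pseudotime(
        model.single_cell_pt_markov, adata.n_obs
    )
    return model
```

The returned model object is what the plotting and animation helpers would consume. The length check guards against silently misaligned pseudotime if cells were filtered between PCA and VIA.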