Agent skill

bulk-rna-seq-differential-expression-with-omicverse

Guide Claude through omicverse's bulk RNA-seq DEG pipeline, from gene ID mapping and DESeq2 normalization to statistical testing, visualization, and pathway enrichment. Use when a user has bulk count matrices and needs differential expression analysis in omicverse.


Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bulk-deg-analysis

SKILL.md

Bulk RNA-seq differential expression with omicverse

Overview

Follow this skill to run the end-to-end differential expression (DEG) workflow showcased in t_deg.ipynb. It assumes the user provides a raw gene-level count matrix (e.g., from featureCounts) and wants to analyse bulk RNA-seq cohorts inside omicverse.

Instructions

  1. Set up the session
    • Import omicverse as ov, scanpy as sc, and matplotlib.pyplot as plt.
    • Call ov.plot_set() so downstream plots adopt omicverse styling.
  2. Prepare ID mapping assets
    • When gene IDs must be converted to gene symbols, instruct the user to download mapping pairs via ov.utils.download_geneid_annotation_pair() and store them under genesets/.
    • Mention the available prebuilt genomes (T2T-CHM13, GRCh38, GRCh37, GRCm39, danRer7, danRer11) and that users can generate their own mapping from GTF files if needed.
  3. Load the raw counts
    • Read tab-delimited featureCounts output with ov.pd.read_csv(..., sep='\t', header=1, index_col=0).
    • Strip the trailing .bam suffix (and any directory prefix) from column names with a list comprehension so that sample IDs are clean.
  4. Map gene identifiers
    • Run ov.bulk.Matrix_ID_mapping(counts_df, 'genesets/pair_<GENOME>.tsv') to replace gene_id entries with gene symbols.
  5. Initialise the DEG object
    • Create dds = ov.bulk.pyDEG(mapped_counts).
    • Handle duplicate gene symbols with dds.drop_duplicates_index(), which keeps the most highly expressed entry for each symbol.
  6. Normalise and estimate size factors
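    • Execute dds.normalize() to calculate DESeq2-style size factors, which correct for differences in library size (sequencing depth) between samples. Size factors do not remove batch effects; those need to be handled separately.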
  7. Run differential testing
    • Collect treatment and control replicate labels into lists.
    • Call dds.deg_analysis(treatment_groups, control_groups, method='ttest') for the default Welch t-test.
    • Offer optional alternatives: method='edgepy' for edgeR-like tests and method='limma' for limma-style modelling.
  8. Filter and threshold results
    • Note that lowly expressed genes are retained by default; filter using dds.result.loc[dds.result['log2(BaseMean)'] > 1] when needed.
    • Set dynamic fold-change and significance cutoffs via dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6) (fc_threshold=-1 auto-selects based on log2FC distribution).
  9. Visualise differential expression
    • Produce volcano plots with dds.plot_volcano(title=..., figsize=..., plot_genes=... or plot_genes_num=...) to highlight key genes.
    • Generate per-gene boxplots using dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=..., legend_bbox=...); adjust y-axis tick labels if required.
  10. Perform pathway enrichment (optional)
    • Download curated pathway libraries through ov.utils.download_pathway_database().
    • Load genesets with ov.utils.geneset_prepare(<path>, organism='Mouse'|'Human'|...).
    • Build the DEG gene list from dds.result.loc[dds.result['sig'] != 'normal'].index.
    • Run enrichment with ov.bulk.geneset_enrichment(gene_list=deg_genes, pathways_dict=..., pvalue_type='auto', organism=...). Encourage users without internet access to provide a background gene list.
    • Visualise single-library results via ov.bulk.geneset_plot(...) and combine multiple ontologies using ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=...).
  11. Document outputs
    • Suggest exporting dds.result and enrichment tables to CSV for downstream reporting.
    • Encourage users to save figures generated by matplotlib (plt.savefig(...)) when running outside notebooks.
  12. Troubleshooting tips
    • Ensure sample labels in treatment_groups/control_groups exactly match column names post-cleanup.
    • Verify required packages (omicverse, pyComplexHeatmap, gseapy) are installed for enrichment visualisations.
    • Remind users that internet access is required the first time they download gene mappings or pathway databases.
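
Steps 3 and 12 both depend on sample IDs matching the column names exactly. The sketch below shows one way to clean featureCounts-style column names and check the group labels before calling dds.deg_analysis. The helper names (clean_sample_name, missing_labels) and the example paths are hypothetical and are not part of omicverse.

```python
def clean_sample_name(column):
    """Turn a column such as 'aligned/ctrl_1.bam' into 'ctrl_1'."""
    base = column.split("/")[-1]  # drop any directory prefix
    return base[: -len(".bam")] if base.endswith(".bam") else base


def missing_labels(columns, *groups):
    """Return every group label that is absent from the cleaned column names."""
    known = set(columns)
    return [label for group in groups for label in group if label not in known]


# Hypothetical column names, as featureCounts might write them.
raw_columns = [
    "aligned/ctrl_1.bam",
    "aligned/ctrl_2.bam",
    "aligned/treat_1.bam",
    "aligned/treat_2.bam",
]
sample_ids = [clean_sample_name(c) for c in raw_columns]

treatment_groups = ["treat_1", "treat_2"]
control_groups = ["ctrl_1", "ctrl_2"]

unmatched = missing_labels(sample_ids, treatment_groups, control_groups)
if unmatched:
    raise ValueError(f"labels not found among the count columns: {unmatched}")
print(sample_ids)
```

With a pandas count table, the same list comprehension can be assigned back to the table's columns attribute before the ID-mapping step.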
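
To make step 6 concrete, here is a minimal sketch of the median-of-ratios idea behind DESeq2 size factors, written in plain Python. It illustrates the general technique only; it is not omicverse's internal code, and the function name and toy counts are made up for the example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts maps each sample ID to its raw counts, with genes in the same order.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Only genes detected in every sample have a finite log geometric mean.
    usable = [g for g in range(n_genes) if all(counts[s][g] > 0 for s in samples)]
    if not usable:
        raise ValueError("no gene has a nonzero count in every sample")
    # Log of the per-gene geometric mean across samples (the pseudo-reference).
    log_ref = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
        for g in usable
    }
    # A sample's size factor is the median ratio of its counts to the reference.
    return {
        s: math.exp(median(math.log(counts[s][g]) - log_ref[g] for g in usable))
        for s in samples
    }


# Toy data: treat_1 was sequenced at twice the depth of ctrl_1.
toy_counts = {
    "ctrl_1": [10, 20, 30, 0],
    "treat_1": [20, 40, 60, 5],
}
factors = size_factors(toy_counts)
normalised = {s: [c / factors[s] for c in row] for s, row in toy_counts.items()}
print(factors)
```

In the toy data the deeper library receives a size factor twice as large, so dividing by the factors puts both samples on a common scale. Nothing in this calculation uses batch labels, which is why it cannot correct batch effects.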
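
Steps 8 to 10 hinge on the 'sig' column: genes labelled 'normal' are dropped and the rest feed the volcano plot and the enrichment step. The sketch below mimics that logic on a toy table. The 'up' and 'down' labels, the fixed log2 fold-change cutoff of 1, the gene names and the numbers are all assumptions for illustration; omicverse's own labels and its automatic threshold (fc_threshold=-1) may differ.

```python
def label_gene(log2fc, pvalue, fc_threshold=1.0, pval_threshold=0.05):
    """Label a gene 'up', 'down' or 'normal' from its log2FC and p-value."""
    if pvalue >= pval_threshold or abs(log2fc) < fc_threshold:
        return "normal"
    return "up" if log2fc > 0 else "down"


# Toy result table keyed by gene; the values are invented.
toy_result = {
    "GeneA": {"log2FC": 2.4, "pvalue": 0.001},
    "GeneB": {"log2FC": -3.1, "pvalue": 0.0004},
    "GeneC": {"log2FC": 0.1, "pvalue": 0.80},
    "GeneD": {"log2FC": 1.8, "pvalue": 0.20},
}
for row in toy_result.values():
    row["sig"] = label_gene(row["log2FC"], row["pvalue"])

# Gene list for enrichment: everything that is not 'normal'.
deg_genes = [gene for gene, row in toy_result.items() if row["sig"] != "normal"]

# Candidates to label on a volcano plot: the most significant DEGs first.
top_genes = sorted(deg_genes, key=lambda gene: toy_result[gene]["pvalue"])[:8]
print(deg_genes, top_genes)
```

A list like top_genes is the kind of value one might pass as plot_genes, while plot_genes_num leaves the selection to the plotting function.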
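
Step 10 mentions a background gene list. Over-representation tools of this kind typically score a pathway with a hypergeometric (one-sided Fisher) test, and the background defines the universe that test is computed against. The following sketch shows that calculation under this assumption; it is not taken from omicverse or gseapy, and the function name and gene IDs are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(deg_genes, pathway_genes, background_genes):
    """P(overlap >= observed) when drawing the DEGs at random from the background."""
    background = set(background_genes)
    degs = set(deg_genes) & background        # DEGs outside the background are ignored
    pathway = set(pathway_genes) & background
    overlap = len(degs & pathway)
    n_background, n_pathway, n_degs = len(background), len(pathway), len(degs)
    # Sum the hypergeometric tail from the observed overlap upwards.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_degs - k)
        for k in range(overlap, min(n_pathway, n_degs) + 1)
    )
    return tail / comb(n_background, n_degs)


# Toy universe of 20 genes, a 5-gene pathway, and 4 DEGs of which 3 hit the pathway.
background = [f"gene{i}" for i in range(20)]
pathway = ["gene0", "gene1", "gene2", "gene3", "gene4"]
degs = ["gene0", "gene1", "gene2", "gene10"]

p_value = overrepresentation_pvalue(degs, pathway, background)
print(p_value)
```

Shrinking or enlarging the background changes the p-value for the same overlap, which is why the choice of background matters.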

Examples

  • "I have a featureCounts matrix for mouse tumour samples—normalize it with DESeq2, run t-test DEG, and highlight the top 8 genes in a volcano plot."
  • "Use omicverse to compute edgeR-style differential expression between treated and control replicates, then run GO enrichment on significant genes."
  • "Guide me through converting Ensembl IDs to symbols, performing limma DEG, and plotting boxplots for Krtap9-5 and Lef1."

References

  • Detailed walkthrough notebook: t_deg.ipynb
  • Sample count matrix for testing: sample/counts.txt
  • Quick copy/paste commands: reference.md
