bio-genome-assembly-short-read-assembly
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-genome-assembly-short-read-assembly
name: bio-genome-assembly-short-read-assembly
description: De novo genome assembly from Illumina short reads using SPAdes. Covers bacterial, fungal, and small eukaryotic genome assembly, as well as metagenome and transcriptome assembly modes. Use when assembling genomes from Illumina reads.
tool_type: cli
primary_tool: SPAdes
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
Short-Read Assembly
Assemble genomes from Illumina paired-end or single-end reads using SPAdes.
SPAdes Overview
SPAdes (St. Petersburg genome assembler) uses a de Bruijn graph approach, iterating over multiple k-mer sizes for robust assembly.
Installation
conda install -c bioconda spades
Basic Usage
Paired-End Assembly
spades.py -1 R1.fastq.gz -2 R2.fastq.gz -o output_dir
Single-End Assembly
spades.py -s reads.fastq.gz -o output_dir
With Unpaired Reads
spades.py -1 R1.fastq.gz -2 R2.fastq.gz -s unpaired.fastq.gz -o output_dir
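Before launching a paired-end run, it can help to confirm that the two files hold the same number of reads, since a mismatch usually points to truncated or independently filtered files. The sketch below assumes gzip-compressed FASTQ with four lines per record; `count_reads` and `check_pairs` are hypothetical helper names, not part of SPAdes.

```shell
# Count reads in a gzip-compressed FASTQ file (assumes 4 lines per record).
count_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Compare read counts of the two files of a paired-end library.
# Returns non-zero and prints a message to stderr when the counts differ.
check_pairs() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" != "$n2" ]; then
        echo "read count mismatch: $1 has $n1, $2 has $n2" >&2
        return 1
    fi
    echo "$n1 read pairs"
}
```

Usage: `check_pairs R1.fastq.gz R2.fastq.gz && spades.py -1 R1.fastq.gz -2 R2.fastq.gz -o output_dir`. Equal counts do not prove the mates are in the same order, so this is only a first sanity check.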
Assembly Modes
Isolate Mode (Recommended for Bacterial Isolates)
spades.py --isolate -1 R1.fq.gz -2 R2.fq.gz -o isolate_assembly
Best for single-organism isolates with uniform coverage.
Careful Mode
spades.py --careful -1 R1.fq.gz -2 R2.fq.gz -o careful_assembly
Reduces mismatches and short indels at the cost of speed. Recommended for small genomes. Cannot be combined with --isolate.
Meta Mode (Metagenomes)
spades.py --meta -1 R1.fq.gz -2 R2.fq.gz -o meta_assembly
For mixed microbial communities with varying coverage.
RNA Mode (Transcriptomes)
spades.py --rna -1 R1.fq.gz -2 R2.fq.gz -o rna_assembly
Assembles transcripts from RNA-seq data.
Plasmid Mode
spades.py --plasmid -1 R1.fq.gz -2 R2.fq.gz -o plasmid_assembly
Extracts plasmid sequences from bacterial isolates.
Key Options
| Option | Description |
|---|---|
| `-o <dir>` | Output directory |
| `-t <#>` | Number of threads (default: 16) |
| `-m <#>` | Memory limit in GB (default: 250) |
| `-k <#,#,...>` | K-mer sizes (auto by default) |
| `--careful` | Reduce mismatches and short indels |
| `--isolate` | Isolate mode for uniform coverage |
| `--meta` | Metagenome mode |
| `--rna` | RNA-seq assembly |
| `--cov-cutoff <#>` | Coverage cutoff (default: off) |
| `--only-assembler` | Skip error correction |
| `--continue` | Resume interrupted run |
Multiple Libraries
Paired Libraries with Different Insert Sizes
spades.py \
--pe1-1 short_R1.fq.gz --pe1-2 short_R2.fq.gz \
--pe2-1 long_R1.fq.gz --pe2-2 long_R2.fq.gz \
-o output_dir
With Mate Pairs
spades.py \
--pe1-1 paired_R1.fq.gz --pe1-2 paired_R2.fq.gz \
--mp1-1 mate_R1.fq.gz --mp1-2 mate_R2.fq.gz \
-o output_dir
With PacBio/Nanopore (Hybrid)
spades.py \
-1 illumina_R1.fq.gz -2 illumina_R2.fq.gz \
--pacbio pacbio.fq.gz \
-o hybrid_assembly
# Or with Nanopore
spades.py \
-1 illumina_R1.fq.gz -2 illumina_R2.fq.gz \
--nanopore nanopore.fq.gz \
-o hybrid_assembly
K-mer Selection
Auto Selection (Recommended)
SPAdes automatically selects appropriate k-mers based on read length.
Manual K-mer Specification
# For 150bp reads
spades.py -k 21,33,55,77 -1 R1.fq.gz -2 R2.fq.gz -o output
# For 250bp reads
spades.py -k 21,33,55,77,99,127 -1 R1.fq.gz -2 R2.fq.gz -o output
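When specifying k-mers manually, SPAdes requires every value to be odd, below 128, and listed in ascending order. A minimal sketch of a pre-flight check follows; `validate_kmers` is a hypothetical helper name, and it does not check k against the read length.

```shell
# Validate a comma-separated k-mer list before passing it to `spades.py -k`.
# Checks: each value is a positive integer, odd, below 128, strictly ascending.
validate_kmers() {
    local list=$1
    local prev=0
    local k
    local IFS=,
    for k in $list; do
        case $k in
            ''|0*|*[!0-9]*) echo "not a valid integer: '$k'" >&2; return 1 ;;
        esac
        if [ $((k % 2)) -eq 0 ]; then
            echo "k-mer must be odd: $k" >&2; return 1
        fi
        if [ "$k" -ge 128 ]; then
            echo "k-mer must be below 128: $k" >&2; return 1
        fi
        if [ "$k" -le "$prev" ]; then
            echo "k-mers must be ascending: $k after $prev" >&2; return 1
        fi
        prev=$k
    done
    # An empty list leaves prev at 0 and is rejected.
    [ "$prev" -gt 0 ]
}
```

Usage: `validate_kmers "21,33,55,77" && spades.py -k 21,33,55,77 -1 R1.fq.gz -2 R2.fq.gz -o output`.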
Output Files
output_dir/
├── scaffolds.fasta # Final scaffolds (use this)
├── contigs.fasta # Contigs before scaffolding
├── assembly_graph_with_scaffolds.gfa # Assembly graph with scaffold paths
├── spades.log # Log file
├── params.txt # Parameters used
└── K*/ # Intermediate k-mer assemblies
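A finished run can be checked by confirming that the main files listed above exist and are non-empty. The following is a sketch only: `check_spades_outputs` is a hypothetical helper, and the file list is taken from the tree above, so modes that name their outputs differently (for example `--rna`) would need an adjusted list.

```shell
# Report whether the key SPAdes outputs exist and are non-empty.
# Returns 0 when all are present, 1 otherwise (missing files go to stderr).
check_spades_outputs() {
    local outdir=$1
    local missing=0
    local f
    for f in scaffolds.fasta contigs.fasta spades.log params.txt; do
        if [ ! -s "$outdir/$f" ]; then
            echo "missing or empty: $outdir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_spades_outputs output_dir || echo "assembly incomplete, inspect output_dir/spades.log"`.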
Scaffold FASTA Headers
>NODE_1_length_500000_cov_50.5
- NODE_1 - Contig/scaffold ID
- length_500000 - Sequence length in bp
- cov_50.5 - Average k-mer coverage for the largest k used (lower than per-base read coverage)
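Because the length and coverage are embedded in each header, short or low-coverage scaffolds can be filtered without any extra tooling. The sketch below assumes headers of the form `>NODE_<id>_length_<bp>_cov_<coverage>`; `filter_scaffolds` is a hypothetical helper, and the default thresholds (500 bp, coverage 2) are illustrative values rather than recommendations from SPAdes.

```shell
# Keep scaffolds whose header reports length >= min_len and coverage >= min_cov.
# Assumes SPAdes-style headers: >NODE_<id>_length_<bp>_cov_<coverage>
filter_scaffolds() {
    local fasta=$1
    local min_len=${2:-500}
    local min_cov=${3:-2}
    awk -v min_len="$min_len" -v min_cov="$min_cov" '
        /^>/ {
            # Fields after splitting on "_":
            # f[1]=NODE f[2]=id f[3]="length" f[4]=bp f[5]="cov" f[6]=coverage
            n = split(substr($0, 2), f, "_")
            keep = (n >= 6 && f[4] + 0 >= min_len && f[6] + 0 >= min_cov)
        }
        keep { print }
    ' "$fasta"
}
```

Usage: `filter_scaffolds output_dir/scaffolds.fasta 1000 5 > scaffolds.filtered.fasta`. Records whose headers do not match the assumed pattern are dropped.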
Memory and Performance
Reduce Memory Usage
# Cap memory at 32 GB (SPAdes terminates if it reaches this limit)
spades.py -m 32 -1 R1.fq.gz -2 R2.fq.gz -o output
# Use fewer threads
spades.py -t 8 -1 R1.fq.gz -2 R2.fq.gz -o output
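One way to choose a value for `-m` is to derive it from the memory currently available on the machine. The sketch below is Linux-specific and assumes a `/proc/meminfo`-style file with a `MemAvailable` line in kB; `suggest_mem_gb` is a hypothetical helper, and the 90% default is an arbitrary safety margin.

```shell
# Suggest a value for `spades.py -m` as a percentage of available RAM.
# Arguments: [meminfo file, default /proc/meminfo] [percent, default 90]
# Prints a whole number of GB, never less than 1 (also when MemAvailable is absent).
suggest_mem_gb() {
    local meminfo=${1:-/proc/meminfo}
    local percent=${2:-90}
    awk -v pct="$percent" '
        $1 == "MemAvailable:" { kb = $2 }
        END {
            gb = int(kb * pct / 100 / 1024 / 1024)
            print (gb < 1 ? 1 : gb)
        }
    ' "$meminfo"
}
```

Usage: `spades.py -m "$(suggest_mem_gb)" -1 R1.fq.gz -2 R2.fq.gz -o output`.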
Resume Interrupted Assembly
spades.py --continue -o output_dir
Skip Error Correction
# If reads already corrected
spades.py --only-assembler -1 R1.fq.gz -2 R2.fq.gz -o output
Complete Workflows
Bacterial Genome Assembly
#!/bin/bash
set -euo pipefail
R1=$1
R2=$2
OUTDIR=$3
THREADS=${4:-16}
echo "=== Bacterial Genome Assembly ==="
# Run SPAdes in isolate mode (--isolate cannot be combined with --careful)
spades.py \
--isolate \
-t "$THREADS" \
-1 "$R1" -2 "$R2" \
-o "$OUTDIR"
# Basic stats
echo "Assembly statistics:"
# Number of scaffolds
grep -c "^>" "${OUTDIR}/scaffolds.fasta"
# Length summary (requires seqkit)
seqkit stats "${OUTDIR}/scaffolds.fasta"
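If seqkit is not installed, basic assembly statistics can be computed with standard tools. The following sketch reports sequence count, total length, and N50 (the length of the shortest sequence in the smallest set of longest sequences covering at least half the assembly); `assembly_stats` is a hypothetical helper name.

```shell
# Print sequence count, total length, and N50 for a FASTA file.
assembly_stats() {
    local fasta=$1
    # Stage 1: emit one length per record (multi-line sequences are summed).
    awk '
        /^>/ { if (len) print len; len = 0; next }
        { len += length($0) }
        END { if (len) print len }
    ' "$fasta" | sort -rn | awk '
        # Stage 2: walk lengths from longest to shortest until half the total is covered.
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= half) { n50 = lens[i]; break }
            }
            printf "sequences=%d total_bp=%d N50=%d\n", NR, total, n50
        }
    '
}
```

Usage: `assembly_stats "${OUTDIR}/scaffolds.fasta"`.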
Metagenome Assembly
#!/bin/bash
set -euo pipefail
R1=$1
R2=$2
OUTDIR=$3
spades.py \
--meta \
-t 32 \
-m 200 \
-1 "$R1" -2 "$R2" \
-o "$OUTDIR"
echo "Metagenome assembly complete: ${OUTDIR}/scaffolds.fasta"
Transcriptome Assembly
spades.py \
--rna \
-t 16 \
-1 rnaseq_R1.fq.gz -2 rnaseq_R2.fq.gz \
-o transcriptome_assembly
Alternative Assemblers
| Assembler | Best For |
|---|---|
| SPAdes | Small genomes, bacteria, fungi |
| MEGAHIT | Metagenomes (memory efficient) |
| ABySS | Large genomes |
| Velvet | Legacy, small genomes |
| Trinity | Transcriptomes |
MEGAHIT (Alternative for Metagenomes)
megahit -1 R1.fq.gz -2 R2.fq.gz -o megahit_output -t 16
Troubleshooting
Out of Memory
- Set the `-m` limit to match available RAM (SPAdes terminates when it reaches this limit)
- Use `--meta` mode (more memory efficient)
- Try MEGAHIT instead
Poor Assembly
- Check read quality with FastQC
- Trim adapters and low-quality bases
- Increase coverage if possible
- Try `--careful` mode (not together with `--isolate`)
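To judge whether coverage is the limiting factor, expected depth can be estimated as (number of reads × read length) / genome size. A minimal sketch follows; `estimate_coverage` is a hypothetical helper, and the genome size is an estimate the user has to supply.

```shell
# Estimate sequencing depth: (number of reads x read length) / genome size.
# Arguments: total read count, read length in bp, estimated genome size in bp.
estimate_coverage() {
    local n_reads=$1
    local read_len=$2
    local genome_size=$3
    awk -v n="$n_reads" -v l="$read_len" -v g="$genome_size" '
        BEGIN {
            if (g + 0 <= 0) exit 1
            printf "%.1f\n", n * l / g
        }
    ' || { echo "genome size must be a positive number" >&2; return 1; }
}
```

For example, `estimate_coverage 2000000 150 5000000` covers 2 million 150 bp reads against a 5 Mb genome. For paired-end data, count reads from both files.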
Long Runtime
- Use fewer k-mer sizes (each value in `-k` adds an assembly iteration)
- Use `--only-assembler` if reads are pre-corrected
- Increase threads
Related Skills
- read-qc - Preprocess reads before assembly
- assembly-polishing - Polish assembly with Pilon
- assembly-qc - Assess with QUAST/BUSCO
- long-read-assembly - Long-read alternatives