Agent skills
bio-genome-assembly-short-read...

Agent skill

bio-genome-assembly-short-read-assembly

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-genome-assembly-short-read-assembly

SKILL.md

name: bio-genome-assembly-short-read-assembly description: De novo genome assembly from Illumina short reads using SPAdes. Covers bacterial, fungal, and small eukaryotic genome assembly, as well as metagenome and transcriptome assembly modes. Use when assembling genomes from Illumina reads. tool_type: cli primary_tool: SPAdes measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

read_file
run_shell_command

Short-Read Assembly

Assemble genomes from Illumina paired-end or single-end reads using SPAdes.

SPAdes Overview

SPAdes (St. Petersburg genome Assembler) uses de Bruijn graph approach with multiple k-mer sizes for robust assembly.

Installation

bash

conda install -c bioconda spades

Basic Usage

Paired-End Assembly

bash

spades.py -1 R1.fastq.gz -2 R2.fastq.gz -o output_dir

Single-End Assembly

bash

spades.py -s reads.fastq.gz -o output_dir

With Unpaired Reads

bash

spades.py -1 R1.fastq.gz -2 R2.fastq.gz -s unpaired.fastq.gz -o output_dir

Assembly Modes

Isolate Mode (Default for Bacteria)

bash

spades.py --isolate -1 R1.fq.gz -2 R2.fq.gz -o isolate_assembly

Best for single-organism isolates with uniform coverage.

Careful Mode

bash

spades.py --careful -1 R1.fq.gz -2 R2.fq.gz -o careful_assembly

Reduces misassemblies at cost of speed. Recommended for small genomes.

Meta Mode (Metagenomes)

bash

spades.py --meta -1 R1.fq.gz -2 R2.fq.gz -o meta_assembly

For mixed microbial communities with varying coverage.

RNA Mode (Transcriptomes)

bash

spades.py --rna -1 R1.fq.gz -2 R2.fq.gz -o rna_assembly

Assembles transcripts from RNA-seq data.

Plasmid Mode

bash

spades.py --plasmid -1 R1.fq.gz -2 R2.fq.gz -o plasmid_assembly

Extracts plasmid sequences from bacterial isolates.

Key Options

Option	Description
`-o <dir>`	Output directory
`-t <#>`	Number of threads (default: 16)
`-m <#>`	Memory limit in GB (default: 250)
`-k <#,#,...>`	K-mer sizes (auto by default)
`--careful`	Reduce misassemblies
`--isolate`	Isolate mode for uniform coverage
`--meta`	Metagenome mode
`--rna`	RNA-seq assembly
`--cov-cutoff <#>`	Coverage cutoff (default: off)
`--only-assembler`	Skip error correction
`--continue`	Resume interrupted run

Multiple Libraries

Paired Libraries with Different Insert Sizes

bash

spades.py \
    --pe1-1 short_R1.fq.gz --pe1-2 short_R2.fq.gz \
    --pe2-1 long_R1.fq.gz --pe2-2 long_R2.fq.gz \
    -o output_dir

With Mate Pairs

bash

spades.py \
    --pe1-1 paired_R1.fq.gz --pe1-2 paired_R2.fq.gz \
    --mp1-1 mate_R1.fq.gz --mp1-2 mate_R2.fq.gz \
    -o output_dir

With PacBio/Nanopore (Hybrid)

bash

spades.py \
    -1 illumina_R1.fq.gz -2 illumina_R2.fq.gz \
    --pacbio pacbio.fq.gz \
    -o hybrid_assembly

# Or with Nanopore
spades.py \
    -1 illumina_R1.fq.gz -2 illumina_R2.fq.gz \
    --nanopore nanopore.fq.gz \
    -o hybrid_assembly

K-mer Selection

Auto Selection (Recommended)

SPAdes automatically selects appropriate k-mers based on read length.

Manual K-mer Specification

bash

# For 150bp reads
spades.py -k 21,33,55,77 -1 R1.fq.gz -2 R2.fq.gz -o output

# For 250bp reads
spades.py -k 21,33,55,77,99,127 -1 R1.fq.gz -2 R2.fq.gz -o output

Output Files

output_dir/
├── scaffolds.fasta     # Final scaffolds (use this)
├── contigs.fasta       # Contigs before scaffolding
├── assembly_graph.gfa  # Assembly graph
├── spades.log          # Log file
├── params.txt          # Parameters used
└── K*/                 # Intermediate k-mer assemblies

Scaffold FASTA Headers

>NODE_1_length_500000_cov_50.5

NODE_1 - Contig/scaffold ID
length_500000 - Sequence length
cov_50.5 - Average k-mer coverage

Memory and Performance

Reduce Memory Usage

bash

# Limit memory to 32GB
spades.py -m 32 -1 R1.fq.gz -2 R2.fq.gz -o output

# Use fewer threads
spades.py -t 8 -1 R1.fq.gz -2 R2.fq.gz -o output

Resume Interrupted Assembly

bash

spades.py --continue -o output_dir

Skip Error Correction

bash

# If reads already corrected
spades.py --only-assembler -1 R1.fq.gz -2 R2.fq.gz -o output

Complete Workflows

Bacterial Genome Assembly

bash

#!/bin/bash
set -euo pipefail

R1=$1
R2=$2
OUTDIR=$3
THREADS=${4:-16}

echo "=== Bacterial Genome Assembly ==="

# Run SPAdes in isolate mode
spades.py \
    --isolate \
    --careful \
    -t $THREADS \
    -1 $R1 -2 $R2 \
    -o $OUTDIR

# Basic stats
echo "Assembly statistics:"
grep -c "^>" ${OUTDIR}/scaffolds.fasta
seqkit stats ${OUTDIR}/scaffolds.fasta

Metagenome Assembly

bash

#!/bin/bash
set -euo pipefail

R1=$1
R2=$2
OUTDIR=$3

spades.py \
    --meta \
    -t 32 \
    -m 200 \
    -1 $R1 -2 $R2 \
    -o $OUTDIR

echo "Metagenome assembly complete: ${OUTDIR}/scaffolds.fasta"

Transcriptome Assembly

bash

spades.py \
    --rna \
    -t 16 \
    -1 rnaseq_R1.fq.gz -2 rnaseq_R2.fq.gz \
    -o transcriptome_assembly

Alternative Assemblers

Assembler	Best For
SPAdes	Small genomes, bacteria, fungi
MEGAHIT	Metagenomes (memory efficient)
ABySS	Large genomes
Velvet	Legacy, small genomes
Trinity	Transcriptomes

MEGAHIT (Alternative for Metagenomes)

bash

megahit -1 R1.fq.gz -2 R2.fq.gz -o megahit_output -t 16

Troubleshooting

Out of Memory

Reduce -m limit
Use --meta mode (more memory efficient)
Try MEGAHIT instead

Poor Assembly

Check read quality with FastQC
Trim adapters and low-quality bases
Increase coverage if possible
Try --careful mode

Long Runtime

Reduce k-mer values
Use --only-assembler if reads pre-corrected
Increase threads

Related Skills

read-qc - Preprocess reads before assembly
assembly-polishing - Polish assembly with Pilon
assembly-qc - Assess with QUAST/BUSCO
long-read-assembly - Long-read alternatives

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/bio-genome-assembly-short-read-assembly
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量，并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Short-Read Assembly

SPAdes Overview

Installation

Basic Usage

Paired-End Assembly

Single-End Assembly

With Unpaired Reads

Assembly Modes

Isolate Mode (Default for Bacteria)

Careful Mode

Meta Mode (Metagenomes)

RNA Mode (Transcriptomes)

Plasmid Mode

Key Options

Multiple Libraries

Paired Libraries with Different Insert Sizes

With Mate Pairs

With PacBio/Nanopore (Hybrid)

K-mer Selection

Auto Selection (Recommended)

Manual K-mer Specification

Output Files

Scaffold FASTA Headers

Memory and Performance

Reduce Memory Usage

Resume Interrupted Assembly

Skip Error Correction

Complete Workflows

Bacterial Genome Assembly

Metagenome Assembly

Transcriptome Assembly

Alternative Assemblers

MEGAHIT (Alternative for Metagenomes)

Troubleshooting

Out of Memory

Poor Assembly

Long Runtime

Related Skills

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations