Agent skill
bio-genome-assembly-long-read-assembly
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-genome-assembly-long-read-assembly
SKILL.md
name: bio-genome-assembly-long-read-assembly description: De novo genome assembly from Oxford Nanopore or PacBio long reads using Flye and Canu. Produces highly contiguous assemblies suitable for complete bacterial genomes and resolving complex regions. Use when assembling genomes from ONT or PacBio reads. tool_type: cli primary_tool: Flye measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Long-Read Assembly
Assemble genomes from Oxford Nanopore (ONT) or PacBio long reads for highly contiguous assemblies.
Tool Comparison
| Tool | Speed | Memory | Best For |
|---|---|---|---|
| Flye | Fast | Moderate | General purpose, bacteria, ONT |
| Canu | Slow | High | High accuracy, complex genomes |
| Wtdbg2 | Very fast | Low | Draft assemblies |
Note: For PacBio HiFi data, see the dedicated hifi-assembly skill which covers hifiasm.
Flye
Installation
conda install -c bioconda flye
Basic Usage
# Oxford Nanopore
flye --nano-raw reads.fastq.gz --out-dir flye_output --threads 16
# PacBio CLR
flye --pacbio-raw reads.fastq.gz --out-dir flye_output --threads 16
# PacBio HiFi
flye --pacbio-hifi reads.fastq.gz --out-dir flye_output --threads 16
Read Type Options
| Option | Read Type |
|---|---|
--nano-raw |
ONT regular reads |
--nano-corr |
ONT corrected reads |
--nano-hq |
ONT Q20+ reads (Guppy 5+) |
--pacbio-raw |
PacBio CLR |
--pacbio-corr |
PacBio corrected |
--pacbio-hifi |
PacBio HiFi/CCS |
Key Options
| Option | Description |
|---|---|
--out-dir |
Output directory |
--threads |
Number of threads |
--genome-size |
Estimated genome size (e.g., 5m, 100m) |
--iterations |
Polishing iterations (default: 1) |
--meta |
Metagenome mode |
--plasmids |
Recover plasmids |
--keep-haplotypes |
Don't collapse haplotypes |
--scaffold |
Enable scaffolding |
Genome Size Estimation
# Estimate if unknown
flye --nano-raw reads.fq.gz --out-dir output --genome-size 5m
# Size formats: 1000, 1k, 1m, 1g
Output Files
flye_output/
├── assembly.fasta # Final assembly
├── assembly_graph.gfa # Assembly graph
├── assembly_info.txt # Contig statistics
└── flye.log # Log file
Bacterial Assembly
flye \
--nano-raw bacteria.fastq.gz \
--out-dir bacteria_assembly \
--genome-size 5m \
--threads 16
Metagenome Assembly
flye \
--nano-raw metagenome.fastq.gz \
--out-dir meta_assembly \
--meta \
--threads 32
With Plasmid Recovery
flye \
--nano-raw isolate.fastq.gz \
--out-dir assembly \
--plasmids \
--threads 16
Canu
Installation
conda install -c bioconda canu
Basic Usage
# ONT reads
canu -p assembly -d canu_output genomeSize=5m -nanopore reads.fastq.gz
# PacBio HiFi
canu -p assembly -d canu_output genomeSize=5m -pacbio-hifi reads.fastq.gz
Key Options
| Option | Description |
|---|---|
-p |
Assembly prefix |
-d |
Output directory |
genomeSize= |
Estimated size (required) |
maxThreads= |
Max threads |
maxMemory= |
Max memory (e.g., 64g) |
useGrid=false |
Disable grid execution |
correctedErrorRate= |
Expected error rate |
Read Type Options
| Option | Read Type |
|---|---|
-nanopore |
ONT reads |
-nanopore-raw |
ONT raw (deprecated) |
-pacbio |
PacBio CLR |
-pacbio-hifi |
PacBio HiFi/CCS |
Fast Mode
canu -p asm -d output genomeSize=5m \
-nanopore reads.fq.gz \
useGrid=false \
maxThreads=16 \
maxMemory=32g
High-Quality Mode (PacBio HiFi)
canu -p asm -d output genomeSize=5m \
-pacbio-hifi reads.fq.gz \
correctedErrorRate=0.01
Output Files
canu_output/
├── assembly.contigs.fasta # Contigs
├── assembly.unassembled.fasta
├── assembly.report
└── assembly.seqStore/
Wtdbg2 (Fast Draft)
Installation
conda install -c bioconda wtdbg
Basic Usage
# Assemble
wtdbg2 -x ont -g 5m -t 16 -i reads.fq.gz -o draft
# Consensus
wtpoa-cns -t 16 -i draft.ctg.lay.gz -o draft.ctg.fa
Platform Presets
| Preset | Platform |
|---|---|
-x ont |
ONT R9 |
-x ccs |
PacBio HiFi |
-x rs |
PacBio CLR |
-x sq |
ONT R10 |
Complete Workflows
ONT Bacterial Assembly
#!/bin/bash
set -euo pipefail
READS=$1
OUTDIR=$2
SIZE=${3:-5m}
echo "=== ONT Bacterial Assembly ==="
# Flye assembly
flye \
--nano-raw $READS \
--out-dir ${OUTDIR}/flye \
--genome-size $SIZE \
--threads 16
# Stats
echo "Assembly statistics:"
cat ${OUTDIR}/flye/assembly_info.txt
echo "Assembly: ${OUTDIR}/flye/assembly.fasta"
Hybrid Assembly (Long + Short)
#!/bin/bash
set -euo pipefail
LONG=$1
SHORT_R1=$2
SHORT_R2=$3
OUTDIR=$4
# 1. Long-read assembly with Flye
flye --nano-raw $LONG --out-dir ${OUTDIR}/flye --genome-size 5m --threads 16
# 2. Polish with short reads (Pilon)
# See assembly-polishing skill
Quality Expectations
| Metric | Bacterial | Eukaryotic |
|---|---|---|
| Contigs | 1-10 | 100-1000+ |
| N50 | >1 Mb | Variable |
| Complete chromosomes | Often | Rare |
Troubleshooting
Low Contiguity
- Check coverage (need >30x)
- Try increasing iterations in Flye
- Consider supplementing with short reads
Memory Issues
- Use Flye (more memory efficient)
- Reduce threads
- Filter reads by length/quality
Misassemblies
- Polish with Pilon/medaka
- Validate with short reads
- Check for contamination
Related Skills
- hifi-assembly - PacBio HiFi assembly with hifiasm
- assembly-polishing - Polish long-read assemblies
- assembly-qc - QUAST and BUSCO assessment
- short-read-assembly - Hybrid with Illumina
- long-read-sequencing - Read QC and alignment
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
sleep-analyzer
分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-hi-c-analysis-matrix-operations
Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.
Didn't find tool you were looking for?