bio-basecalling
Convert raw Nanopore signal data (FAST5/POD5) to nucleotide sequences using Dorado basecaller. Covers model selection, GPU acceleration, modified base detection, and quality filtering. Use when processing raw Nanopore data before alignment. Guppy is deprecated; use Dorado for all new analyses.
Version Compatibility
Reference examples tested with: samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- CLI: <tool> --version, then <tool> --help to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
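The version check described above can be scripted. The following is a minimal sketch, assuming `sort -V` (GNU coreutils) is available; the helper names `version_ge` and `extract_version` are hypothetical and not part of any tool mentioned here, and the sample version text is made up for illustration.

```bash
# Return 0 when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Pull the first dotted version number out of `--version` style text on stdin.
extract_version() {
    grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Illustrative input only; in practice pipe in `samtools --version`.
installed=$(printf 'samtools 1.19.2\nUsing htslib 1.19.1\n' | extract_version)
if version_ge "$installed" 1.19; then
    echo "samtools $installed meets the tested minimum"
else
    echo "samtools $installed is older than 1.19; check flags with --help"
fi
```

In real use, replace the `printf` with the tool's own output, for example `samtools --version | extract_version`.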
Nanopore Basecalling
"Basecall my Nanopore data" → Convert raw electrical signal (FAST5/POD5) into nucleotide sequences with quality scores, optionally detecting modified bases.
- CLI: dorado basecaller sup pod5/ > calls.bam (recommended), dorado basecaller sup,5mCG_5hmCG pod5/ > calls_mods.bam (with modifications)
Convert raw electrical signal from Nanopore sequencing into nucleotide sequences.
Dorado (Recommended)
Dorado is the current production basecaller from Oxford Nanopore Technologies (ONT) and the replacement for Guppy. It is both more accurate and faster than Guppy.
Basic Basecalling
dorado basecaller sup pod5_dir/ > calls.bam
Choose Model
dorado basecaller fast pod5_dir/ > calls.bam
dorado basecaller hac pod5_dir/ > calls.bam
dorado basecaller sup pod5_dir/ > calls.bam
Model Speed vs Accuracy
| Model | Speed | Accuracy | Use Case |
|---|---|---|---|
| fast | Fastest | Lower | Quick preview |
| hac | Medium | High | General use |
| sup | Slowest | Highest | Publication quality |
Specific Model Version
dorado download --model dna_r10.4.1_e8.2_400bps_sup@v5.1.0
dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v5.1.0 pod5_dir/ > calls.bam
List Available Models
dorado download --list
Output FASTQ Instead of BAM
dorado basecaller sup pod5_dir/ --emit-fastq > calls.fastq
Modified Base Detection
dorado basecaller sup,5mCG_5hmCG pod5_dir/ > calls_mods.bam
dorado basecaller sup,5mCG pod5_dir/ > calls_5mc.bam
dorado basecaller sup,6mA pod5_dir/ > calls_6ma.bam
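A quick sanity check after a modified-base run is to confirm that reads actually carry the `MM` tag that the SAM specification defines for base modifications. This is a rough sketch, assuming SAM text on stdin; `count_modbase_reads` is a hypothetical helper name, not a Dorado or samtools command.

```bash
# Count how many SAM records on stdin carry an MM:Z: (base modification) tag.
# Header lines (starting with @) are skipped; optional tags begin at field 12.
count_modbase_reads() {
    awk -F '\t' '
        /^@/ { next }
        {
            total++
            for (i = 12; i <= NF; i++)
                if ($i ~ /^MM:Z:/) { tagged++; break }
        }
        END { printf "%d/%d reads carry MM tags\n", tagged, total }
    '
}

# Two illustrative records: the first has MM/ML tags, the second does not.
printf 'r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tMM:Z:C+m,0;\tML:B:C,200\nr2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' |
    count_modbase_reads
```

On real data the input would come from `samtools view calls_mods.bam | count_modbase_reads`. A count of zero suggests the modification model was not applied.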
GPU Selection
dorado basecaller sup pod5_dir/ --device cuda:0 > calls.bam
dorado basecaller sup pod5_dir/ --device cuda:0,1 > calls.bam
dorado basecaller sup pod5_dir/ --device cpu > calls.bam
Batch Size for Memory
dorado basecaller sup pod5_dir/ --batchsize 64 > calls.bam
Duplex Calling
dorado duplex sup pod5_dir/ > duplex.bam
Demultiplexing During Basecalling
dorado basecaller sup pod5_dir/ --kit-name SQK-NBD114-24 > calls.bam
dorado demux calls.bam --output-dir demuxed/ --no-classify
Trim Adapters
dorado basecaller sup pod5_dir/ --trim adapters > calls.bam
dorado basecaller sup pod5_dir/ --no-trim > calls_untrimmed.bam
Resume Interrupted Run
dorado basecaller sup pod5_dir/ --resume-from calls.bam > calls_complete.bam
Guppy (Deprecated - Legacy Only)
Guppy is deprecated and no longer receiving updates. Use Dorado for all new analyses. Guppy examples below are only for maintaining legacy pipelines.
Basic Basecalling
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_sup.cfg \
--device cuda:0
CPU Mode
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_fast.cfg \
--num_callers 8 \
--cpu_threads_per_caller 4
High Accuracy Model
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_hac.cfg \
--device cuda:0
Super Accuracy Model
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_sup.cfg \
--device cuda:0
List Available Configs
guppy_basecaller --print_workflows
ls /opt/ont/guppy/data/*.cfg
Modified Base Calling
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_modbases_5mc_cg_sup.cfg \
--device cuda:0
Barcoding During Basecalling
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_sup.cfg \
--device cuda:0 \
--barcode_kits SQK-NBD114-24
Output BAM
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_sup.cfg \
--device cuda:0 \
--bam_out \
--index
POD5 File Handling
POD5 is ONT's current raw-signal file format and replaces FAST5.
Convert FAST5 to POD5
pod5 convert fast5 fast5_dir/*.fast5 --output pod5_dir/
Merge POD5 Files
pod5 merge pod5_dir/*.pod5 --output merged.pod5
Inspect POD5
pod5 inspect reads input.pod5
pod5 inspect summary input.pod5
Subset POD5
pod5 filter input.pod5 --output subset.pod5 --ids read_ids.txt
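The `read_ids.txt` file used above is a plain list with one read ID per line. One way to build it from an existing basecalled BAM is sketched below, assuming SAM text on stdin; `extract_read_ids` and its optional minimum-length argument are hypothetical, introduced only for illustration.

```bash
# Print each read ID once from SAM text on stdin, skipping header lines.
# Optional $1: minimum sequence length (SAM field 10); defaults to 0.
extract_read_ids() {
    awk -F '\t' -v min_len="${1:-0}" \
        '!/^@/ && length($10) >= min_len && !seen[$1]++ { print $1 }'
}

# Illustrative input: r1 appears twice and r2 is shorter than 3 bases.
printf 'r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\nr2\t4\t*\t0\t0\t*\t*\t0\t0\tAC\tII\nr1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' |
    extract_read_ids 3
```

With real data this would typically be `samtools view calls.bam | extract_read_ids 1000 > read_ids.txt`.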
Quality Filtering
Filter with Chopper (After Basecalling)
gunzip -c calls.fastq.gz | chopper -q 10 -l 500 | gzip > filtered.fastq.gz
Filter by Quality Score
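# Assumes the fourth space-separated header field is a key=value pair holding
# the read's mean quality; adjust the field index to match your FASTQ headers.
gunzip -c calls.fastq.gz | \
awk 'BEGIN{OFS="\n"} {h=$0; getline seq; getline plus; getline qual;
split(h, a, " "); split(a[4], q, "=");
if(q[2]+0 >= 10) print h, seq, plus, qual}' | \
gzip > q10_filtered.fastq.gz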
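When the FASTQ header does not carry a quality field, the mean quality can be computed from the quality string itself. The sketch below assumes Phred+33 encoding and four-line FASTQ records, and averages error probabilities rather than raw Phred values; `filter_mean_q` is a hypothetical helper name.

```bash
# Keep reads whose mean quality, computed from the quality string, is >= $1.
# The mean is taken over per-base error probabilities and then converted
# back to a Phred score, so a few very poor bases pull the mean down sharply.
filter_mean_q() {
    awk -v min_q="$1" '
        BEGIN {
            # Lookup table from printable ASCII character to its code point.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
        }
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            n = length($0)
            if (n == 0) next
            err = 0
            for (i = 1; i <= n; i++)
                err += 10 ^ (-(ord[substr($0, i, 1)] - 33) / 10)
            mean_q = -10 * log(err / n) / log(10)
            if (mean_q >= min_q) print header "\n" seq "\n" plus "\n" $0
        }
    '
}

# Illustrative input: "good" is Q40 throughout, "bad" is Q3 throughout.
printf '@good\nACGT\n+\nIIII\n@bad\nACGT\n+\n$$$$\n' | filter_mean_q 10
```

Usage on real data follows the same pattern as the other filters: `gunzip -c calls.fastq.gz | filter_mean_q 10 | gzip > q10_filtered.fastq.gz`.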
NanoFilt (Alternative)
gunzip -c calls.fastq.gz | NanoFilt -q 10 -l 500 | gzip > filtered.fastq.gz
Basecalling QC
NanoPlot
NanoPlot --fastq calls.fastq.gz -o qc_report/ --plots hex dot
NanoPlot --ubam calls.bam -o qc_report/
pycoQC (From Sequencing Summary)
pycoQC -f sequencing_summary.txt -o pycoqc_report.html
Basic Stats
seqkit stats calls.fastq.gz
gunzip -c calls.fastq.gz | awk 'NR%4==2 {sum+=length($0); count++} END {if (count) print "Reads:", count, "Mean length:", sum/count}'
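Mean length alone can hide a skewed length distribution, so N50 is often reported alongside it for long reads. A minimal sketch, assuming four-line FASTQ records on stdin; `fastq_n50` is a hypothetical helper name.

```bash
# Print the N50 read length of a four-line-per-record FASTQ on stdin:
# the length L such that reads of length >= L hold at least half the bases.
fastq_n50() {
    awk 'NR % 4 == 2 { print length($0) }' | sort -rn |
    awk '{ len[NR] = $1; total += $1 }
         END {
             if (NR == 0) { print 0; exit }
             for (i = 1; i <= NR; i++) {
                 running += len[i]
                 if (running * 2 >= total) { print len[i]; exit }
             }
         }'
}

# Illustrative input with read lengths 2, 3, 4, and 5 (14 bases in total).
printf '@r1\nAC\n+\nII\n@r2\nACG\n+\nIII\n@r3\nACGT\n+\nIIII\n@r4\nACGTA\n+\nIIIII\n' |
    fastq_n50   # prints 4
```

On real data: `gunzip -c calls.fastq.gz | fastq_n50`.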
Model Selection Guide
R10.4.1 Chemistry (Current)
| Model | Use |
|---|---|
| dna_r10.4.1_e8.2_400bps_fast | Quick analysis |
| dna_r10.4.1_e8.2_400bps_hac | Routine work |
| dna_r10.4.1_e8.2_400bps_sup | High accuracy |
R9.4.1 Chemistry (Legacy)
| Model | Use |
|---|---|
| dna_r9.4.1_450bps_fast | Quick analysis |
| dna_r9.4.1_450bps_hac | Routine work |
| dna_r9.4.1_450bps_sup | High accuracy |
Complete Pipeline
Goal: Run the full Nanopore basecalling pipeline from raw signal data through quality-filtered reads with a QC report.
Approach: Convert FAST5 to POD5 if needed, basecall with Dorado, convert to FASTQ, filter with chopper, and generate NanoPlot QC.
#!/bin/bash
# Stop on the first failed command, unset variable, or broken pipe stage
set -euo pipefail
if [ "$#" -lt 2 ]; then
    echo "Usage: $0 <input_dir> <output_dir> [fast|hac|sup]" >&2
    exit 1
fi
INPUT=$1
OUTPUT=$2
MODEL=${3:-sup}
mkdir -p "$OUTPUT"
# Convert legacy FAST5 input to POD5 when a fast5/ subdirectory is present
if [ -d "$INPUT/fast5" ]; then
    echo "Converting FAST5 to POD5..."
    mkdir -p "$OUTPUT/pod5"
    pod5 convert fast5 "$INPUT"/fast5/*.fast5 --output "$OUTPUT/pod5/"
    INPUT_DIR="$OUTPUT/pod5"
else
    INPUT_DIR="$INPUT"
fi
echo "Basecalling with $MODEL model..."
dorado basecaller "$MODEL" "$INPUT_DIR" > "$OUTPUT/calls.bam"
echo "Converting to FASTQ..."
samtools fastq "$OUTPUT/calls.bam" | gzip > "$OUTPUT/calls.fastq.gz"
echo "Filtering..."
gunzip -c "$OUTPUT/calls.fastq.gz" | chopper -q 10 -l 500 | gzip > "$OUTPUT/filtered.fastq.gz"
echo "QC report..."
NanoPlot --fastq "$OUTPUT/filtered.fastq.gz" -o "$OUTPUT/qc/"
echo "Done!"
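Because the pipeline chains several external tools, it can help to confirm they are all on `PATH` before starting a long basecalling run. A minimal sketch; `require_tools` is a hypothetical helper, and the demonstration call uses `sh` and `awk` only so that it runs anywhere.

```bash
# Return non-zero (and name each missing tool on stderr) when any of the
# given commands cannot be found on PATH.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demonstration with tools present on most systems.
if require_tools sh awk; then
    echo "all required tools found"
fi
```

For the pipeline above, the call would be `require_tools dorado pod5 samtools gzip chopper NanoPlot || exit 1`, placed before the first processing step.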
GPU Requirements
| Model | VRAM Required | Speed (R10.4.1) |
|---|---|---|
| fast | 4 GB | ~450 bases/s |
| hac | 8 GB | ~200 bases/s |
| sup | 12 GB | ~50 bases/s |
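The VRAM thresholds in the table can be turned into a simple model chooser. This is a sketch under the assumption that the table's figures (4, 8, and 12 GB) are treated as hard minimums; `pick_model` and its `cpu` fallback value are hypothetical, not Dorado options.

```bash
# Map total GPU memory in MiB to the most accurate model tier that fits,
# using the 4 / 8 / 12 GB thresholds from the table above.
pick_model() {
    vram_mib=$1
    if [ "$vram_mib" -ge 12288 ]; then
        echo sup
    elif [ "$vram_mib" -ge 8192 ]; then
        echo hac
    elif [ "$vram_mib" -ge 4096 ]; then
        echo fast
    else
        echo cpu   # hypothetical marker: no tier fits, consider --device cpu
    fi
}

# On a machine with an NVIDIA driver, the input could come from:
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1
pick_model 16384   # prints sup
pick_model 6144    # prints fast
```

The result can then be passed as the model argument, for example `dorado basecaller "$(pick_model 16384)" pod5_dir/ > calls.bam`, once the `cpu` case has been handled separately.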
Troubleshooting
Out of Memory
dorado basecaller sup pod5_dir/ --batchsize 32 > calls.bam
Slow CPU Basecalling
dorado basecaller fast pod5_dir/ --device cpu > calls.bam
Check GPU Usage
nvidia-smi -l 1
watch -n 1 nvidia-smi
Related Skills
- long-read-alignment - Align basecalled reads
- long-read-qc - QC after basecalling
- medaka-polishing - Polish using basecalled reads
- structural-variants - SV detection from long reads