Agent skill

bio-basecalling

Convert raw Nanopore signal data (FAST5/POD5) to nucleotide sequences using Dorado basecaller. Covers model selection, GPU acceleration, modified base detection, and quality filtering. Use when processing raw Nanopore data before alignment. Guppy is deprecated; use Dorado for all new analyses.

View SKILL.md on GitHub Repository

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-basecalling

SKILL.md

Version Compatibility

Reference examples tested with: samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:

CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Nanopore Basecalling

"Basecall my Nanopore data" → Convert raw electrical signal (FAST5/POD5) into nucleotide sequences with quality scores, optionally detecting modified bases.

CLI: dorado basecaller sup pod5/ > calls.bam (recommended), dorado basecaller sup,5mCG_5hmCG pod5/ (with modifications)

Convert raw electrical signal from Nanopore sequencing into nucleotide sequences.

Dorado (Recommended)

Dorado is ONT's current production basecaller, replacing Guppy. It offers better accuracy and speed.

Basic Basecalling

bash

dorado basecaller sup pod5_dir/ > calls.bam

Choose Model

bash

dorado basecaller fast pod5_dir/ > calls.bam
dorado basecaller hac pod5_dir/ > calls.bam
dorado basecaller sup pod5_dir/ > calls.bam

Model Speed vs Accuracy

Model	Speed	Accuracy	Use Case
fast	Fastest	Lower	Quick preview
hac	Medium	High	General use
sup	Slowest	Highest	Publication quality

Specific Model Version

bash

dorado download --model dna_r10.4.1_e8.2_400bps_sup@v5.1.0
dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v5.1.0 pod5_dir/ > calls.bam

List Available Models

bash

dorado download --list

Output FASTQ Instead of BAM

bash

dorado basecaller sup pod5_dir/ --emit-fastq > calls.fastq

Modified Base Detection

bash

dorado basecaller sup,5mCG_5hmCG pod5_dir/ > calls_mods.bam
dorado basecaller sup,5mCG pod5_dir/ > calls_5mc.bam
dorado basecaller sup,6mA pod5_dir/ > calls_6ma.bam

GPU Selection

bash

dorado basecaller sup pod5_dir/ --device cuda:0 > calls.bam
dorado basecaller sup pod5_dir/ --device cuda:0,1 > calls.bam
dorado basecaller sup pod5_dir/ --device cpu > calls.bam

Batch Size for Memory

bash

dorado basecaller sup pod5_dir/ --batchsize 64 > calls.bam

Duplex Calling

bash

dorado duplex sup pod5_dir/ > duplex.bam

Demultiplexing During Basecalling

bash

dorado basecaller sup pod5_dir/ --kit-name SQK-NBD114-24 > calls.bam
dorado demux calls.bam --output-dir demuxed/ --kit-name SQK-NBD114-24

Trim Adapters

bash

dorado basecaller sup pod5_dir/ --trim adapters > calls.bam
dorado basecaller sup pod5_dir/ --no-trim > calls_untrimmed.bam

Resume Interrupted Run

bash

dorado basecaller sup pod5_dir/ --resume-from calls.bam > calls_complete.bam

Guppy (Deprecated - Legacy Only)

Guppy is deprecated and no longer receiving updates. Use Dorado for all new analyses. Guppy examples below are only for maintaining legacy pipelines.

Basic Basecalling

bash

guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_sup.cfg \
    --device cuda:0

CPU Mode

bash

guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_fast.cfg \
    --num_callers 8 \
    --cpu_threads_per_caller 4

High Accuracy Model

bash

guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_hac.cfg \
    --device cuda:0

Super Accuracy Model

bash

guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_sup.cfg \
    --device cuda:0

List Available Configs

bash

guppy_basecaller --print_workflows
ls /opt/ont/guppy/data/*.cfg

Modified Base Calling

bash

guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_modbases_5mc_cg_sup.cfg \
    --device cuda:0

Barcoding During Basecalling

bash

guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_sup.cfg \
    --device cuda:0 \
    --barcode_kits SQK-NBD114-24

Output BAM

bash

guppy_basecaller \
    -i fast5_dir/ \
    -s output_dir/ \
    -c dna_r10.4.1_e8.2_400bps_sup.cfg \
    --device cuda:0 \
    --bam_out \
    --index

POD5 File Handling

POD5 is the new format replacing FAST5.

Convert FAST5 to POD5

bash

pod5 convert fast5 fast5_dir/*.fast5 --output pod5_dir/

Merge POD5 Files

bash

pod5 merge pod5_dir/*.pod5 --output merged.pod5

Inspect POD5

bash

pod5 inspect reads input.pod5
pod5 inspect summary input.pod5

Subset POD5

bash

pod5 subset input.pod5 --output subset.pod5 --read-id-file read_ids.txt

Quality Filtering

Filter with Chopper (After Basecalling)

bash

gunzip -c calls.fastq.gz | chopper -q 10 -l 500 | gzip > filtered.fastq.gz

Filter by Quality Score

bash

gunzip -c calls.fastq.gz | \
    awk 'BEGIN{OFS="\n"} {h=$0; getline seq; getline plus; getline qual;
         split(h, a, " "); split(a[4], q, "=");
         if(q[2] >= 10) print h, seq, plus, qual}' | \
    gzip > q10_filtered.fastq.gz

NanoFilt (Alternative)

bash

gunzip -c calls.fastq.gz | NanoFilt -q 10 -l 500 | gzip > filtered.fastq.gz

Basecalling QC

NanoPlot

bash

NanoPlot --fastq calls.fastq.gz -o qc_report/ --plots hex dot
NanoPlot --bam calls.bam -o qc_report/

pycoQC (From Sequencing Summary)

bash

pycoQC -f sequencing_summary.txt -o pycoqc_report.html

Basic Stats

bash

seqkit stats calls.fastq.gz

awk 'NR%4==2 {sum+=length($0); count++} END {print "Reads:", count, "Mean length:", sum/count}' calls.fastq

Model Selection Guide

R10.4.1 Chemistry (Current)

Model	Use
dna_r10.4.1_e8.2_400bps_fast	Quick analysis
dna_r10.4.1_e8.2_400bps_hac	Routine work
dna_r10.4.1_e8.2_400bps_sup	High accuracy

R9.4.1 Chemistry (Legacy)

Model	Use
dna_r9.4.1_450bps_fast	Quick analysis
dna_r9.4.1_450bps_hac	Routine work
dna_r9.4.1_450bps_sup	High accuracy

Complete Pipeline

Goal: Run the full Nanopore basecalling pipeline from raw signal data through quality-filtered reads with a QC report.

Approach: Convert FAST5 to POD5 if needed, basecall with Dorado, convert to FASTQ, filter with chopper, and generate NanoPlot QC.

bash

#!/bin/bash
INPUT=$1
OUTPUT=$2
MODEL=${3:-sup}

mkdir -p $OUTPUT

if [ -d "$INPUT/fast5" ]; then
    echo "Converting FAST5 to POD5..."
    pod5 convert fast5 $INPUT/fast5/*.fast5 --output $OUTPUT/pod5/
    INPUT_DIR="$OUTPUT/pod5"
else
    INPUT_DIR="$INPUT"
fi

echo "Basecalling with $MODEL model..."
dorado basecaller $MODEL $INPUT_DIR > $OUTPUT/calls.bam

echo "Converting to FASTQ..."
samtools fastq $OUTPUT/calls.bam | gzip > $OUTPUT/calls.fastq.gz

echo "Filtering..."
gunzip -c $OUTPUT/calls.fastq.gz | chopper -q 10 -l 500 | gzip > $OUTPUT/filtered.fastq.gz

echo "QC report..."
NanoPlot --fastq $OUTPUT/filtered.fastq.gz -o $OUTPUT/qc/

echo "Done!"

GPU Requirements

Model	VRAM Required	Speed (R10.4.1)
fast	4 GB	~450 bases/s
hac	8 GB	~200 bases/s
sup	12 GB	~50 bases/s

Troubleshooting

Out of Memory

bash

dorado basecaller sup pod5_dir/ --batchsize 32 > calls.bam

Slow CPU Basecalling

bash

dorado basecaller fast pod5_dir/ --device cpu > calls.bam

Check GPU Usage

bash

nvidia-smi -l 1
watch -n 1 nvidia-smi

Related Skills

long-read-alignment - Align basecalled reads
long-read-qc - QC after basecalling
medaka-polishing - Polish using basecalled reads
structural-variants - SV detection from long reads

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/bio-basecalling
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量，并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Version Compatibility

Nanopore Basecalling

Dorado (Recommended)

Basic Basecalling

Choose Model

Model Speed vs Accuracy

Specific Model Version

List Available Models

Output FASTQ Instead of BAM

Modified Base Detection

GPU Selection

Batch Size for Memory

Duplex Calling

Demultiplexing During Basecalling

Trim Adapters

Resume Interrupted Run

Guppy (Deprecated - Legacy Only)

Basic Basecalling

CPU Mode

High Accuracy Model

Super Accuracy Model

List Available Configs

Modified Base Calling

Barcoding During Basecalling

Output BAM

POD5 File Handling

Convert FAST5 to POD5

Merge POD5 Files

Inspect POD5

Subset POD5

Quality Filtering

Filter with Chopper (After Basecalling)

Filter by Quality Score

NanoFilt (Alternative)

Basecalling QC

NanoPlot

pycoQC (From Sequencing Summary)

Basic Stats

Model Selection Guide

R10.4.1 Chemistry (Current)

R9.4.1 Chemistry (Legacy)

Complete Pipeline

GPU Requirements

Troubleshooting

Out of Memory

Slow CPU Basecalling

Check GPU Usage

Related Skills

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations