Agent skills
bio-long-read-sequencing-isose...

Agent skill

bio-long-read-sequencing-isoseq-analysis

Analyze PacBio Iso-Seq data for full-length isoform discovery and quantification. Use when characterizing transcript diversity or identifying novel splice variants.

View SKILL.md on GitHub Repository

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-long-read-sequencing-isoseq-analysis

SKILL.md

Version Compatibility

Reference examples tested with: minimap2 2.26+, pandas 2.2+, pysam 0.22+, samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
R: packageVersion('<pkg>') then ?function_name to verify parameters
CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Iso-Seq Analysis

"Analyze full-length isoforms from my Iso-Seq data" → Process PacBio HiFi reads through CCS generation, primer removal, clustering, and isoform classification to discover novel transcript variants.

CLI: isoseq3 refine → isoseq3 cluster → pbmm2 align → sqanti3_qc.py

IsoSeq3 Pipeline Overview

bash

# Full pipeline: subreads -> HQ transcripts
# 1. CCS: Generate circular consensus sequences
# 2. Lima: Remove primers and demultiplex
# 3. Refine: Remove polyA and concatemers
# 4. Cluster: Group into isoforms
# 5. Polish: Generate high-quality consensus (optional with HiFi)

CCS Generation

bash

# Generate CCS from subreads (skip if using HiFi reads)
ccs input.subreads.bam ccs.bam \
    --min-rq 0.9 \
    --min-passes 3 \
    --num-threads 32

# For HiFi reads, CCS is already done
# Start directly from HiFi reads

Primer Removal with Lima

bash

# Iso-Seq specific primer removal
lima ccs.bam primers.fasta demux.bam \
    --isoseq \
    --peek-guess \
    --num-threads 16

# Output: demux.primer_5p--primer_3p.bam
# Lima reports also contain demux statistics

# Check lima report
cat demux.lima.summary

Primer File Format

fasta

>primer_5p
AAGCAGTGGTATCAACGCAGAGTACATGGG
>primer_3p
AAGCAGTGGTATCAACGCAGAGTAC

Refine Full-Length Reads

bash

# Remove polyA tails and concatemers
isoseq3 refine demux.primer_5p--primer_3p.bam primers.fasta refined.bam \
    --require-polya \
    --min-polya-length 20

# Output: refined.bam (full-length non-chimeric reads)
# Also: refined.filter_summary.json

# Check refinement stats
cat refined.filter_summary.json | jq

Cluster Into Isoforms

bash

# Cluster similar transcripts
isoseq3 cluster refined.bam clustered.bam \
    --verbose \
    --use-qvs \
    --num-threads 32

# Output files:
# - clustered.bam: Clustered transcripts
# - clustered.hq_transcripts.fasta: High-quality consensus
# - clustered.lq_transcripts.fasta: Low-quality consensus
# - clustered.cluster_report.csv: Cluster membership

Align to Reference

bash

# Map HQ transcripts to reference genome
minimap2 -ax splice:hq \
    -uf \
    --secondary=no \
    reference.fa \
    clustered.hq_transcripts.fasta \
    | samtools sort -o aligned.bam

samtools index aligned.bam

# For downstream analysis
pbmm2 align reference.fa clustered.bam aligned.bam \
    --preset ISOSEQ \
    --sort

Collapse Redundant Isoforms

bash

# Collapse mapped transcripts
isoseq3 collapse aligned.bam collapsed.gff

# Output:
# - collapsed.gff: Collapsed transcript models
# - collapsed.abundance.txt: Read counts per isoform
# - collapsed.group.txt: Isoform groupings

# Convert to GTF
gffread collapsed.gff -T -o collapsed.gtf

SQANTI3 Quality Control

bash

# Classify isoforms against reference annotation
sqanti3_qc.py \
    clustered.hq_transcripts.fasta \
    reference.gtf \
    reference.fa \
    -o sqanti_output \
    --aligner_choice minimap2 \
    --cage_peak cage_peaks.bed \
    --polyA_motif_list polyA_motifs.txt \
    --cpus 16

# Key output files:
# - sqanti_output_classification.txt: Per-isoform metrics
# - sqanti_output_junctions.txt: Splice junction details
# - sqanti_output.params.txt: Run parameters

SQANTI3 Categories

Category	Code	Description
Full Splice Match	FSM	All junctions match reference
Incomplete Splice Match	ISM	Subset of reference junctions
Novel In Catalog	NIC	Novel combination of known junctions
Novel Not in Catalog	NNC	Contains novel junction
Antisense	AS	Overlaps gene on opposite strand
Genic	G	Within gene but no junction match
Intergenic	IR	Between genes
Fusion	FU	Spans multiple genes

SQANTI3 Filtering

bash

# Filter artifacts using SQANTI3 rules
sqanti3_filter.py \
    sqanti_output_classification.txt \
    --isoforms clustered.hq_transcripts.fasta \
    --gtf collapsed.gtf \
    --faa predicted_proteins.faa \
    -o sqanti_filtered

# Custom filtering
python << 'EOF'
import pandas as pd

classification = pd.read_csv('sqanti_output_classification.txt', sep='\t')

# Keep FSM, ISM, NIC with evidence
keep = classification[
    (classification['structural_category'].isin(['full-splice_match', 'incomplete-splice_match', 'novel_in_catalog'])) &
    (classification['FL'] >= 2) &
    (classification['bite'] == 'FALSE')
]
keep['isoform'].to_csv('filtered_isoforms.txt', index=False, header=False)
EOF

Quantification with Pigeon

bash

# PacBio's isoform quantification tool
pigeon classify \
    aligned.bam \
    reference.gtf \
    reference.fa \
    -o pigeon_output

# Produces count matrix and classification
pigeon report pigeon_output_classification.txt

TAMA for Annotation Merge

bash

# Merge Iso-Seq with reference annotation
# First, convert to TAMA format
tama_format_convert.py \
    -i collapsed.gtf \
    -f gtf \
    -o isoseq.bed

# Create file list
echo -e "isoseq.bed\tcapped\t1\t1" > file_list.txt
echo -e "reference.bed\tcapped\t1\t2" >> file_list.txt

# Merge annotations
tama_merge.py \
    -f file_list.txt \
    -p merged \
    -a 50 \
    -z 50 \
    -m 10

Python Processing

Goal: Summarize Iso-Seq clustering results including isoform counts, read support, and transcript lengths.

Approach: Parse the cluster report CSV for per-isoform read counts and extract sequence lengths from the HQ FASTA.

python

import pysam
import pandas as pd

def parse_cluster_report(report_path):
    df = pd.read_csv(report_path)
    isoform_counts = df.groupby('cluster_id').size()
    return isoform_counts

def get_transcript_lengths(fasta_path):
    lengths = {}
    with pysam.FastxFile(fasta_path) as fh:
        for entry in fh:
            lengths[entry.name] = len(entry.sequence)
    return lengths

def summarize_isoseq(cluster_report, hq_fasta):
    counts = parse_cluster_report(cluster_report)
    lengths = get_transcript_lengths(hq_fasta)

    print(f'Total isoforms: {len(counts)}')
    print(f'Median support: {counts.median():.0f} reads')
    print(f'Mean length: {sum(lengths.values())/len(lengths):.0f} bp')

    return counts, lengths

Differential Isoform Usage

library(IsoformSwitchAnalyzeR)

# Import SQANTI3 results
switchList <- importIsoformExpression(
    isoformCountMatrix = 'counts.txt',
    isoformRepExpression = 'tpm.txt',
    designMatrix = design
)

# Add SQANTI3 annotations
switchList <- addORFfromFASTA(
    switchList,
    orfs = 'sqanti_corrected.fasta'
)

# Analyze switching
switchList <- isoformSwitchTestDEXSeq(switchList)

# Extract significant switches
sig_switches <- switchList$isoformSwitchAnalysis[
    switchList$isoformSwitchAnalysis$padj < 0.05,
]

Quality Metrics

Metric	Good	Acceptable	Poor
CCS passes	>3	2-3	<2
Full-length %	>80%	60-80%	<60%
Clustering rate	>90%	80-90%	<80%
FSM %	>50%	30-50%	<30%
Novel isoforms	10-30%	30-50%	>50% (suspect)

Troubleshooting

Issue	Possible Cause	Solution
Low full-length %	Primer issues	Check primer sequences
High concatemer %	Library prep	Increase SMRTbell cleanup
Few FSM	Poor reference	Use comprehensive GTF
High NNC %	Novel biology or artifacts	Validate with orthogonal data
Low clustering	High diversity	Reduce clustering stringency

Docker/Singularity

bash

# Using PacBio Docker images
docker run -v /data:/data \
    quay.io/biocontainers/isoseq3:4.0.0--h9ee0642_0 \
    isoseq3 cluster /data/refined.bam /data/clustered.bam

# SQANTI3 in Singularity
singularity exec sqanti3.sif \
    sqanti3_qc.py input.fa ref.gtf ref.fa -o output

Related Skills

long-read-sequencing/basecalling - ONT/PacBio basics
rna-quantification/alignment-free-quant - Expression analysis
genome-intervals/gtf-gff-handling - GTF/GFF handling
differential-expression/timeseries-de - Differential isoform usage

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/bio-long-read-sequencing-isoseq-analysis
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量，并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Version Compatibility

Iso-Seq Analysis

IsoSeq3 Pipeline Overview

CCS Generation

Primer Removal with Lima

Primer File Format

Refine Full-Length Reads

Cluster Into Isoforms

Align to Reference

Collapse Redundant Isoforms

SQANTI3 Quality Control

SQANTI3 Categories

SQANTI3 Filtering

Quantification with Pigeon

TAMA for Annotation Merge

Python Processing

Differential Isoform Usage

Quality Metrics

Troubleshooting

Docker/Singularity

Related Skills

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations