Agent skills
bio-pileup-generation

Agent skill

bio-pileup-generation

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-pileup-generation

SKILL.md

name: bio-pileup-generation description: Generate pileup data for variant calling using samtools mpileup and pysam. Use when preparing data for variant calling, analyzing per-position read data, or calculating allele frequencies. tool_type: cli primary_tool: samtools measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

read_file
run_shell_command

Pileup Generation

Generate pileup data for variant calling and position-level analysis.

What is Pileup?

Pileup shows all reads covering each position in the reference, used for:

Variant calling (with bcftools)
Coverage analysis
Allele frequency calculation
SNP/indel detection

samtools mpileup

Basic Pileup

bash

samtools mpileup -f reference.fa input.bam > pileup.txt

Pileup for Variant Calling (Output BCF)

bash

samtools mpileup -f reference.fa -g input.bam -o output.bcf

Pileup Specific Region

bash

samtools mpileup -f reference.fa -r chr1:1000000-2000000 input.bam

Regions from BED

bash

samtools mpileup -f reference.fa -l targets.bed input.bam

Multiple BAM Files

bash

samtools mpileup -f reference.fa sample1.bam sample2.bam sample3.bam > pileup.txt

Output Format

Text pileup format (6 columns per sample):

chr1    1000    A    15    ...............    FFFFFFFFFFF
chr1    1001    T    12    ............      FFFFFFFFFFFF

Column	Description
1	Chromosome
2	Position (1-based)
3	Reference base
4	Read depth
5	Read bases
6	Base qualities

Read Bases Encoding

Symbol	Meaning
`.`	Match on forward strand
`,`	Match on reverse strand
`ACGT`	Mismatch (uppercase = forward)
`acgt`	Mismatch (lowercase = reverse)
`^Q`	Start of read (Q = MAPQ as ASCII)
`$`	End of read
`+NNN`	Insertion of N bases
`-NNN`	Deletion of N bases
`*`	Deleted base
`>` / `<`	Reference skip (intron)

Quality Filtering Options

Minimum Mapping Quality

bash

samtools mpileup -f reference.fa -q 20 input.bam

Minimum Base Quality

bash

samtools mpileup -f reference.fa -Q 20 input.bam

Combined Quality Filters

bash

samtools mpileup -f reference.fa -q 20 -Q 20 input.bam

Maximum Depth

bash

# Prevent memory issues with high coverage
samtools mpileup -f reference.fa -d 1000 input.bam

Variant Calling Pipeline

mpileup to bcftools call

bash

samtools mpileup -f reference.fa input.bam | bcftools call -mv -o variants.vcf

Direct BCF Output

bash

samtools mpileup -f reference.fa -g -o output.bcf input.bam
bcftools call -mv output.bcf -o variants.vcf

Full Pipeline

bash

samtools mpileup -f reference.fa -q 20 -Q 20 input.bam | \
    bcftools call -mv -Oz -o variants.vcf.gz
bcftools index variants.vcf.gz

pysam Python Alternative

Basic Pileup

python

import pysam

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for pileup_column in bam.pileup('chr1', 1000000, 1001000):
        print(f'{pileup_column.reference_name}:{pileup_column.pos} depth={pileup_column.n}')

Access Reads at Position

python

import pysam

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for pileup_column in bam.pileup('chr1', 1000000, 1000001, truncate=True):
        print(f'Position: {pileup_column.pos}')
        print(f'Depth: {pileup_column.n}')

        for pileup_read in pileup_column.pileups:
            if pileup_read.is_del:
                print('  Deletion')
            elif pileup_read.is_refskip:
                print('  Reference skip')
            else:
                qpos = pileup_read.query_position
                base = pileup_read.alignment.query_sequence[qpos]
                qual = pileup_read.alignment.query_qualities[qpos]
                print(f'  {base} (Q{qual})')

Count Alleles at Position

python

import pysam
from collections import Counter

def allele_counts(bam_path, chrom, pos):
    counts = Counter()

    with pysam.AlignmentFile(bam_path, 'rb') as bam:
        for pileup_column in bam.pileup(chrom, pos, pos + 1, truncate=True):
            if pileup_column.pos != pos:
                continue

            for pileup_read in pileup_column.pileups:
                if pileup_read.is_del:
                    counts['DEL'] += 1
                elif pileup_read.is_refskip:
                    continue
                else:
                    qpos = pileup_read.query_position
                    base = pileup_read.alignment.query_sequence[qpos]
                    counts[base.upper()] += 1

    return dict(counts)

counts = allele_counts('input.bam', 'chr1', 1000000)
print(counts)  # {'A': 45, 'G': 5}

Calculate Allele Frequency

python

import pysam
from collections import Counter

def allele_frequency(bam_path, chrom, pos, min_qual=20):
    counts = Counter()

    with pysam.AlignmentFile(bam_path, 'rb') as bam:
        for pileup_column in bam.pileup(chrom, pos, pos + 1, truncate=True,
                                         min_base_quality=min_qual):
            if pileup_column.pos != pos:
                continue

            for pileup_read in pileup_column.pileups:
                if pileup_read.is_del or pileup_read.is_refskip:
                    continue
                qpos = pileup_read.query_position
                base = pileup_read.alignment.query_sequence[qpos]
                counts[base.upper()] += 1

    total = sum(counts.values())
    if total == 0:
        return {}

    return {base: count / total for base, count in counts.items()}

freq = allele_frequency('input.bam', 'chr1', 1000000)
for base, f in sorted(freq.items(), key=lambda x: -x[1]):
    print(f'{base}: {f:.1%}')

Pileup with Quality Filtering

python

import pysam

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for pileup_column in bam.pileup('chr1', 1000000, 1001000,
                                     truncate=True,
                                     min_mapping_quality=20,
                                     min_base_quality=20):
        print(f'{pileup_column.pos}: {pileup_column.n}')

Generate Pileup Text

python

import pysam

def pileup_text(bam_path, ref_path, chrom, start, end):
    with pysam.AlignmentFile(bam_path, 'rb') as bam:
        with pysam.FastaFile(ref_path) as ref:
            for pileup_column in bam.pileup(chrom, start, end, truncate=True):
                pos = pileup_column.pos
                ref_base = ref.fetch(chrom, pos, pos + 1)
                depth = pileup_column.n

                bases = []
                for pileup_read in pileup_column.pileups:
                    if pileup_read.is_del:
                        bases.append('*')
                    elif pileup_read.is_refskip:
                        bases.append('>')
                    else:
                        qpos = pileup_read.query_position
                        base = pileup_read.alignment.query_sequence[qpos]
                        if base.upper() == ref_base.upper():
                            bases.append('.' if not pileup_read.alignment.is_reverse else ',')
                        else:
                            bases.append(base.upper() if not pileup_read.alignment.is_reverse else base.lower())

                print(f'{chrom}\t{pos+1}\t{ref_base}\t{depth}\t{"".join(bases)}')

pileup_text('input.bam', 'reference.fa', 'chr1', 1000000, 1000100)

Pileup Options Summary

Option	Description
`-f FILE`	Reference FASTA (required)
`-r REGION`	Restrict to region
`-l FILE`	BED file of regions
`-q INT`	Min mapping quality
`-Q INT`	Min base quality
`-d INT`	Max depth (default 8000)
`-g`	Output BCF format
`-u`	Uncompressed BCF output

Quick Reference

Task	Command
Basic pileup	`samtools mpileup -f ref.fa in.bam`
Quality filter	`samtools mpileup -f ref.fa -q 20 -Q 20 in.bam`
Region	`samtools mpileup -f ref.fa -r chr1:1-1000 in.bam`
BCF output	`samtools mpileup -f ref.fa -g in.bam -o out.bcf`
To bcftools	`samtools mpileup -f ref.fa in.bam \| bcftools call -mv`

Common Errors

Error	Cause	Solution
`No FASTA reference`	Missing -f option	Add `-f reference.fa`
`Reference mismatch`	Wrong reference	Use same reference as alignment
Out of memory	High coverage region	Use `-d` to cap depth

Related Skills

alignment-filtering - Filter BAM before pileup
reference-operations - Index reference for pileup
bam-statistics - depth command for coverage
variant-calling - bcftools call from pileup

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/bio-pileup-generation
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量，并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Pileup Generation

What is Pileup?

samtools mpileup

Basic Pileup

Pileup for Variant Calling (Output BCF)

Pileup Specific Region

Regions from BED

Multiple BAM Files

Output Format

Read Bases Encoding

Quality Filtering Options

Minimum Mapping Quality

Minimum Base Quality

Combined Quality Filters

Maximum Depth

Variant Calling Pipeline

mpileup to bcftools call

Direct BCF Output

Full Pipeline

pysam Python Alternative

Basic Pileup

Access Reads at Position

Count Alleles at Position

Calculate Allele Frequency

Pileup with Quality Filtering

Generate Pileup Text

Pileup Options Summary

Quick Reference

Common Errors

Related Skills

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations