Agent skill
bio-pileup-generation
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-pileup-generation
SKILL.md
name: bio-pileup-generation description: Generate pileup data for variant calling using samtools mpileup and pysam. Use when preparing data for variant calling, analyzing per-position read data, or calculating allele frequencies. tool_type: cli primary_tool: samtools measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Pileup Generation
Generate pileup data for variant calling and position-level analysis.
What is Pileup?
Pileup shows all reads covering each position in the reference, used for:
- Variant calling (with bcftools)
- Coverage analysis
- Allele frequency calculation
- SNP/indel detection
samtools mpileup
Basic Pileup
samtools mpileup -f reference.fa input.bam > pileup.txt
Pileup for Variant Calling (Output BCF)
samtools mpileup -f reference.fa -g input.bam -o output.bcf
Pileup Specific Region
samtools mpileup -f reference.fa -r chr1:1000000-2000000 input.bam
Regions from BED
samtools mpileup -f reference.fa -l targets.bed input.bam
Multiple BAM Files
samtools mpileup -f reference.fa sample1.bam sample2.bam sample3.bam > pileup.txt
Output Format
Text pileup format (6 columns per sample):
chr1 1000 A 15 ............... FFFFFFFFFFF
chr1 1001 T 12 ............ FFFFFFFFFFFF
| Column | Description |
|---|---|
| 1 | Chromosome |
| 2 | Position (1-based) |
| 3 | Reference base |
| 4 | Read depth |
| 5 | Read bases |
| 6 | Base qualities |
Read Bases Encoding
| Symbol | Meaning |
|---|---|
. |
Match on forward strand |
, |
Match on reverse strand |
ACGT |
Mismatch (uppercase = forward) |
acgt |
Mismatch (lowercase = reverse) |
^Q |
Start of read (Q = MAPQ as ASCII) |
$ |
End of read |
+NNN |
Insertion of N bases |
-NNN |
Deletion of N bases |
* |
Deleted base |
> / < |
Reference skip (intron) |
Quality Filtering Options
Minimum Mapping Quality
samtools mpileup -f reference.fa -q 20 input.bam
Minimum Base Quality
samtools mpileup -f reference.fa -Q 20 input.bam
Combined Quality Filters
samtools mpileup -f reference.fa -q 20 -Q 20 input.bam
Maximum Depth
# Prevent memory issues with high coverage
samtools mpileup -f reference.fa -d 1000 input.bam
Variant Calling Pipeline
mpileup to bcftools call
samtools mpileup -f reference.fa input.bam | bcftools call -mv -o variants.vcf
Direct BCF Output
samtools mpileup -f reference.fa -g -o output.bcf input.bam
bcftools call -mv output.bcf -o variants.vcf
Full Pipeline
samtools mpileup -f reference.fa -q 20 -Q 20 input.bam | \
bcftools call -mv -Oz -o variants.vcf.gz
bcftools index variants.vcf.gz
pysam Python Alternative
Basic Pileup
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for pileup_column in bam.pileup('chr1', 1000000, 1001000):
print(f'{pileup_column.reference_name}:{pileup_column.pos} depth={pileup_column.n}')
Access Reads at Position
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for pileup_column in bam.pileup('chr1', 1000000, 1000001, truncate=True):
print(f'Position: {pileup_column.pos}')
print(f'Depth: {pileup_column.n}')
for pileup_read in pileup_column.pileups:
if pileup_read.is_del:
print(' Deletion')
elif pileup_read.is_refskip:
print(' Reference skip')
else:
qpos = pileup_read.query_position
base = pileup_read.alignment.query_sequence[qpos]
qual = pileup_read.alignment.query_qualities[qpos]
print(f' {base} (Q{qual})')
Count Alleles at Position
import pysam
from collections import Counter
def allele_counts(bam_path, chrom, pos):
counts = Counter()
with pysam.AlignmentFile(bam_path, 'rb') as bam:
for pileup_column in bam.pileup(chrom, pos, pos + 1, truncate=True):
if pileup_column.pos != pos:
continue
for pileup_read in pileup_column.pileups:
if pileup_read.is_del:
counts['DEL'] += 1
elif pileup_read.is_refskip:
continue
else:
qpos = pileup_read.query_position
base = pileup_read.alignment.query_sequence[qpos]
counts[base.upper()] += 1
return dict(counts)
counts = allele_counts('input.bam', 'chr1', 1000000)
print(counts) # {'A': 45, 'G': 5}
Calculate Allele Frequency
import pysam
from collections import Counter
def allele_frequency(bam_path, chrom, pos, min_qual=20):
counts = Counter()
with pysam.AlignmentFile(bam_path, 'rb') as bam:
for pileup_column in bam.pileup(chrom, pos, pos + 1, truncate=True,
min_base_quality=min_qual):
if pileup_column.pos != pos:
continue
for pileup_read in pileup_column.pileups:
if pileup_read.is_del or pileup_read.is_refskip:
continue
qpos = pileup_read.query_position
base = pileup_read.alignment.query_sequence[qpos]
counts[base.upper()] += 1
total = sum(counts.values())
if total == 0:
return {}
return {base: count / total for base, count in counts.items()}
freq = allele_frequency('input.bam', 'chr1', 1000000)
for base, f in sorted(freq.items(), key=lambda x: -x[1]):
print(f'{base}: {f:.1%}')
Pileup with Quality Filtering
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for pileup_column in bam.pileup('chr1', 1000000, 1001000,
truncate=True,
min_mapping_quality=20,
min_base_quality=20):
print(f'{pileup_column.pos}: {pileup_column.n}')
Generate Pileup Text
import pysam
def pileup_text(bam_path, ref_path, chrom, start, end):
with pysam.AlignmentFile(bam_path, 'rb') as bam:
with pysam.FastaFile(ref_path) as ref:
for pileup_column in bam.pileup(chrom, start, end, truncate=True):
pos = pileup_column.pos
ref_base = ref.fetch(chrom, pos, pos + 1)
depth = pileup_column.n
bases = []
for pileup_read in pileup_column.pileups:
if pileup_read.is_del:
bases.append('*')
elif pileup_read.is_refskip:
bases.append('>')
else:
qpos = pileup_read.query_position
base = pileup_read.alignment.query_sequence[qpos]
if base.upper() == ref_base.upper():
bases.append('.' if not pileup_read.alignment.is_reverse else ',')
else:
bases.append(base.upper() if not pileup_read.alignment.is_reverse else base.lower())
print(f'{chrom}\t{pos+1}\t{ref_base}\t{depth}\t{"".join(bases)}')
pileup_text('input.bam', 'reference.fa', 'chr1', 1000000, 1000100)
Pileup Options Summary
| Option | Description |
|---|---|
-f FILE |
Reference FASTA (required) |
-r REGION |
Restrict to region |
-l FILE |
BED file of regions |
-q INT |
Min mapping quality |
-Q INT |
Min base quality |
-d INT |
Max depth (default 8000) |
-g |
Output BCF format |
-u |
Uncompressed BCF output |
Quick Reference
| Task | Command |
|---|---|
| Basic pileup | samtools mpileup -f ref.fa in.bam |
| Quality filter | samtools mpileup -f ref.fa -q 20 -Q 20 in.bam |
| Region | samtools mpileup -f ref.fa -r chr1:1-1000 in.bam |
| BCF output | samtools mpileup -f ref.fa -g in.bam -o out.bcf |
| To bcftools | samtools mpileup -f ref.fa in.bam | bcftools call -mv |
Common Errors
| Error | Cause | Solution |
|---|---|---|
No FASTA reference |
Missing -f option | Add -f reference.fa |
Reference mismatch |
Wrong reference | Use same reference as alignment |
| Out of memory | High coverage region | Use -d to cap depth |
Related Skills
- alignment-filtering - Filter BAM before pileup
- reference-operations - Index reference for pileup
- bam-statistics - depth command for coverage
- variant-calling - bcftools call from pileup
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
sleep-analyzer
分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-hi-c-analysis-matrix-operations
Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.
Didn't find tool you were looking for?