Agent skill
bio-alignment-indexing
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-alignment-indexing
SKILL.md
name: bio-alignment-indexing
description: Create and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
tool_type: cli
primary_tool: samtools
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
Alignment Indexing
Create indices for random access to alignment files using samtools and pysam.
Index Types
| Index | Extension | Use Case |
|---|---|---|
| BAI | .bai | Standard BAM index, chromosomes < 512 Mbp |
| CSI | .csi | Large chromosomes, custom bin sizes |
| CRAI | .crai | CRAM index |
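The choice between BAI and CSI follows from the longest reference sequence. A minimal sketch, assuming the 512 Mbp (2^29 bases) BAI limit from the table above; `choose_index_format` is a hypothetical helper, not part of samtools or pysam:

```python
BAI_MAX_CONTIG_LENGTH = 2 ** 29  # 536,870,912 bp, i.e. 512 Mbp


def choose_index_format(contig_lengths):
    """Return 'csi' if any contig is too long for BAI, else 'bai'."""
    if any(length > BAI_MAX_CONTIG_LENGTH for length in contig_lengths):
        return "csi"
    return "bai"


print(choose_index_format([248956422, 242193529]))  # bai
print(choose_index_format([248956422, 600000000]))  # csi
```

With pysam, the lengths could be taken from the `lengths` attribute of an open `AlignmentFile`.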
samtools index
Create BAI Index
samtools index input.bam
# Creates input.bam.bai
Create CSI Index
samtools index -c input.bam
# Creates input.bam.csi
Specify Output Name
samtools index input.bam output.bai
Multi-threaded Indexing
samtools index -@ 4 input.bam
Index CRAM
samtools index input.cram
# Creates input.cram.crai
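When indexing is driven from a script, the flags shown above can be assembled programmatically. The following is a sketch only: `build_index_command` and `run_index` are hypothetical helper names, and the sketch assumes `samtools` is on the PATH when `run_index` is called.

```python
import subprocess


def build_index_command(path, csi=False, threads=1):
    """Build the argv list for `samtools index` (nothing is executed here)."""
    cmd = ["samtools", "index"]
    if csi:
        cmd.append("-c")             # CSI instead of the default BAI
    if threads > 1:
        cmd += ["-@", str(threads)]  # thread count, as in the example above
    cmd.append(str(path))
    return cmd


def run_index(path, **options):
    """Run samtools; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_index_command(path, **options), check=True)


print(build_index_command("input.bam", csi=True, threads=4))
# ['samtools', 'index', '-c', '-@', '4', 'input.bam']
```

Passing an argument list rather than a shell string avoids quoting problems with file names that contain spaces.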
Index Requirements
Indexing requires coordinate-sorted files:
# Check sort order
samtools view -H input.bam | grep "^@HD"
# Should show SO:coordinate
# Sort if needed, then index
samtools sort -o sorted.bam input.bam
samtools index sorted.bam
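The same sort-order check can be scripted. This is a minimal sketch that works on header text as printed by `samtools view -H`; `sort_order` is a hypothetical helper and the sample header is made up for illustration.

```python
def sort_order(header_text):
    """Return the SO value from the @HD line, or None if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


# Illustrative header, not taken from a real file
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n"
print(sort_order(header) == "coordinate")  # True
```

A return value other than `coordinate` (including `None`) means the file should be sorted before indexing.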
Using Indices for Region Access
samtools view with Region
# Requires index file present
samtools view input.bam chr1:1000000-2000000
Multiple Regions
samtools view input.bam chr1:1000-2000 chr2:3000-4000
Regions from BED File
samtools view -L regions.bed input.bam
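BED files use 0-based, half-open coordinates, while region strings on the samtools command line are 1-based and inclusive. `-L` reads the BED file as-is, but if BED intervals are turned into region strings by hand, the start needs shifting. A small sketch, with `bed_line_to_region` as a hypothetical helper:

```python
def bed_line_to_region(line):
    """Convert one BED line (0-based, half-open) to a 1-based inclusive region."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return f"{chrom}:{int(start) + 1}-{int(end)}"


print(bed_line_to_region("chr1\t999\t2000"))  # chr1:1000-2000
```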
pysam Python Alternative
Create Index
import pysam
pysam.index('input.bam')
# Creates input.bam.bai
Create CSI Index
# pysam.index forwards its arguments to samtools index, so pass the -c flag
pysam.index('-c', 'input.bam')
# Creates input.bam.csi
Fetch with Index
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    # fetch() requires index
    for read in bam.fetch('chr1', 1000000, 2000000):
        print(read.query_name)
Check if Indexed
import pysam
from pathlib import Path
def is_indexed(bam_path):
    bam_path = Path(bam_path)
    candidates = [
        Path(str(bam_path) + '.bai'),  # input.bam.bai
        bam_path.with_suffix('.bai'),  # input.bai
        Path(str(bam_path) + '.csi'),  # input.bam.csi
    ]
    return any(path.exists() for path in candidates)

if not is_indexed('input.bam'):
    pysam.index('input.bam')
Fetch Multiple Regions
regions = [('chr1', 1000, 2000), ('chr1', 5000, 6000), ('chr2', 1000, 2000)]
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for chrom, start, end in regions:
        count = sum(1 for _ in bam.fetch(chrom, start, end))
        print(f'{chrom}:{start}-{end}: {count} reads')
Count Reads in Region
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    count = bam.count('chr1', 1000000, 2000000)
    print(f'Reads in region: {count}')
Get Reads Covering Position
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for read in bam.fetch('chr1', 1000000, 1000001):
        if read.reference_start <= 1000000 < read.reference_end:
            print(f'{read.query_name} covers position 1000000')
Index File Locations
samtools looks for indices in two locations:
input.bam.bai # Standard location
input.bai # Alternative location
For CRAM:
input.cram.crai
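The locations listed above can be probed in one pass for both BAM and CRAM inputs. A sketch under the assumption that only these file names are of interest; `candidate_index_paths` and `find_index` are hypothetical helpers, and the `.csi` name is included because `samtools index -c` writes it next to the BAM.

```python
from pathlib import Path


def candidate_index_paths(alignment_path):
    """List the index file names to look for, in order of preference."""
    path = Path(alignment_path)
    if path.suffix == ".cram":
        return [Path(str(path) + ".crai")]
    return [
        Path(str(path) + ".bai"),  # input.bam.bai
        path.with_suffix(".bai"),  # input.bai
        Path(str(path) + ".csi"),  # input.bam.csi
    ]


def find_index(alignment_path):
    """Return the first existing index path, or None if there is none."""
    for candidate in candidate_index_paths(alignment_path):
        if candidate.exists():
            return candidate
    return None


print([str(p) for p in candidate_index_paths("input.bam")])
# ['input.bam.bai', 'input.bai', 'input.bam.csi']
```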
idxstats - Index Statistics
Get Per-Chromosome Counts
samtools idxstats input.bam
Output format:
chr1 248956422 5000000 0
chr2 242193529 4500000 0
* 0 0 10000
Columns: reference name, sequence length, mapped reads, unmapped reads. The final * row counts unmapped reads that are not assigned to any reference.
Sum Total Mapped Reads
samtools idxstats input.bam | awk '{sum += $3} END {print sum}'
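The same totals can be computed in Python from captured idxstats text. A minimal sketch that assumes tab-separated output with the four columns described above; `parse_idxstats` is a hypothetical helper, and the sample reuses the example rows from this section.

```python
def parse_idxstats(text):
    """Parse idxstats output into (name, length, mapped, unmapped) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split("\t")
        rows.append((name, int(length), int(mapped), int(unmapped)))
    return rows


sample = (
    "chr1\t248956422\t5000000\t0\n"
    "chr2\t242193529\t4500000\t0\n"
    "*\t0\t0\t10000\n"
)
rows = parse_idxstats(sample)
total_mapped = sum(mapped for _, _, mapped, _ in rows)
total_unmapped = sum(unmapped for _, _, _, unmapped in rows)
print(total_mapped, total_unmapped)  # 9500000 10000
```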
pysam idxstats
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for stat in bam.get_index_statistics():
        print(f'{stat.contig}: {stat.mapped} mapped, {stat.unmapped} unmapped')
FASTA Index (faidx)
Related but different - index reference FASTA for random access:
samtools faidx reference.fa
# Creates reference.fa.fai
# Fetch region from indexed FASTA
samtools faidx reference.fa chr1:1000-2000
pysam FastaFile
with pysam.FastaFile('reference.fa') as ref:
    seq = ref.fetch('chr1', 1000, 2000)
    print(seq)
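Note that the two examples above do not return the same bases: samtools region strings are 1-based and inclusive, whereas pysam `fetch()` takes 0-based, half-open coordinates. A sketch of the conversion, handling only the plain `name:start-end` form; `region_to_fetch_args` is a hypothetical helper.

```python
def region_to_fetch_args(region):
    """Convert 'chr1:1000-2000' (1-based, inclusive) to ('chr1', 999, 2000)."""
    contig, _, span = region.rpartition(":")
    start, _, end = span.replace(",", "").partition("-")
    return contig, int(start) - 1, int(end)


print(region_to_fetch_args("chr1:1000-2000"))  # ('chr1', 999, 2000)
```

Under this convention, `ref.fetch('chr1', 999, 2000)` would correspond to `samtools faidx reference.fa chr1:1000-2000`.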
Quick Reference
| Task | samtools | pysam |
|---|---|---|
| Create BAI | samtools index file.bam | pysam.index('file.bam') |
| Create CSI | samtools index -c file.bam | pysam.index('-c', 'file.bam') |
| Fetch region | samtools view file.bam chr1:1-1000 | bam.fetch('chr1', 0, 1000) |
| Count in region | samtools view -c file.bam chr1:1-1000 | bam.count('chr1', 0, 1000) |
| Index stats | samtools idxstats file.bam | bam.get_index_statistics() |
| Index FASTA | samtools faidx ref.fa | Automatic with FastaFile |
Common Errors
| Error | Cause | Solution |
|---|---|---|
| random alignment retrieval only works for indexed BAM | Missing index | Run samtools index file.bam |
| file is not sorted | Unsorted BAM | Sort first with samtools sort |
| chromosome not found | Wrong chromosome name | Check names with samtools view -H |
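Chromosome-name mismatches often come down to a missing or extra "chr" prefix. A sketch of a lookup that tries both spellings; `resolve_contig` is a hypothetical helper, and it deliberately does not handle other aliases such as chrM versus MT.

```python
def resolve_contig(name, references):
    """Return the matching reference name, trying with and without 'chr'."""
    candidates = [name]
    if name.startswith("chr"):
        candidates.append(name[3:])
    else:
        candidates.append("chr" + name)
    for candidate in candidates:
        if candidate in references:
            return candidate
    return None


print(resolve_contig("1", ["chr1", "chr2"]))  # chr1
```

With pysam, the list of names could be taken from the `references` attribute of an open `AlignmentFile`.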
Related Skills
- sam-bam-basics - View and convert alignment files
- alignment-sorting - Sort BAM files (required before indexing)
- alignment-filtering - Filter by regions using index
- bam-statistics - Use idxstats for quick counts
- sequence-io/read-sequences - Index FASTA with SeqIO.index_db()