Agent skill
bio-atac-seq-atac-qc
Quality control metrics for ATAC-seq data including fragment size distribution, TSS enrichment, FRiP, and library complexity. Use when assessing ATAC-seq library quality before or after peak calling to identify problematic samples.
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-atac-seq-atac-qc
SKILL.md
Version Compatibility
Reference examples tested with: bedtools 2.31+, deepTools 3.5+, numpy 1.26+, pandas 2.2+, picard 3.1+, pyBigWig 0.3+, pysam 0.22+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- Python:
pip show <package>thenhelp(module.function)to check signatures - R:
packageVersion('<pkg>')then?function_nameto verify parameters - CLI:
<tool> --versionthen<tool> --helpto confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
ATAC-seq Quality Control
"Check the quality of my ATAC-seq library" → Evaluate fragment size distribution (nucleosome periodicity), TSS enrichment, FRiP, and library complexity to assess chromatin accessibility experiment quality.
- CLI:
deeptools bamPEFragmentSize,picard CollectInsertSizeMetrics - Python:
pysamfor custom fragment analysis
Fragment Size Distribution
Goal: Assess ATAC-seq library quality by visualizing the characteristic nucleosome periodicity in fragment sizes.
Approach: Extract insert sizes from the BAM file using Picard or samtools, producing a distribution that should show NFR (<100 bp) and mono-nucleosome (~200 bp) peaks.
# Using Picard
java -jar picard.jar CollectInsertSizeMetrics \
I=sample.bam \
O=insert_sizes.txt \
H=insert_sizes.pdf \
M=0.5
# Using samtools
samtools view -f 66 sample.bam | \
awk '{print sqrt($9^2)}' | \
sort | uniq -c | \
awk '{print $2"\t"$1}' > fragment_sizes.txt
TSS Enrichment Score
Goal: Quantify signal enrichment at transcription start sites as a key ATAC-seq quality metric.
Approach: Create a TSS BED file, compute a signal matrix around TSS positions using deepTools, then plot the enrichment profile.
# Using deepTools
# 1. Create TSS BED file (from GTF)
awk '$3=="transcript" {print $1"\t"$4-1"\t"$4"\t"$14"\t"0"\t"$7}' genes.gtf | \
tr -d '";' | sort -k1,1 -k2,2n > tss.bed
# 2. Compute matrix around TSS
computeMatrix reference-point \
-S sample.bw \
-R tss.bed \
-a 2000 -b 2000 \
-o tss_matrix.gz
# 3. Plot TSS enrichment
plotProfile -m tss_matrix.gz \
-o tss_enrichment.png \
--perGroup
Calculate TSS Enrichment Score
Goal: Compute a numeric TSS enrichment score from a bigWig signal track.
Approach: Sample signal values in windows around TSS positions, average across all TSSs, then divide center signal by flanking background.
import numpy as np
import pyBigWig
def calculate_tss_enrichment(bigwig_file, tss_bed, flank=2000):
'''Calculate TSS enrichment score.'''
bw = pyBigWig.open(bigwig_file)
signals = []
for line in open(tss_bed):
fields = line.strip().split('\t')
chrom, tss = fields[0], int(fields[1])
strand = fields[5] if len(fields) > 5 else '+'
try:
vals = bw.values(chrom, max(0, tss - flank), tss + flank)
if strand == '-':
vals = vals[::-1]
signals.append(vals)
except:
continue
avg_signal = np.nanmean(signals, axis=0)
# TSS enrichment = signal at TSS / background
background = np.nanmean([avg_signal[:100], avg_signal[-100:]])
tss_signal = np.nanmean(avg_signal[flank-50:flank+50])
enrichment = tss_signal / background if background > 0 else 0
return enrichment, avg_signal
enrichment, signal = calculate_tss_enrichment('sample.bw', 'tss.bed')
print(f'TSS Enrichment Score: {enrichment:.2f}')
FRiP (Fraction of Reads in Peaks)
# Total reads
total=$(samtools view -c -F 4 sample.bam)
# Reads in peaks
in_peaks=$(bedtools intersect -a sample.bam -b peaks.narrowPeak -u | \
samtools view -c)
# FRiP
frip=$(echo "scale=4; $in_peaks / $total" | bc)
echo "FRiP: $frip"
# Good FRiP for ATAC-seq: >0.2 (20%)
Mitochondrial Read Fraction
# Mitochondrial reads
mt_reads=$(samtools view -c sample.bam chrM)
total_reads=$(samtools view -c sample.bam)
mt_frac=$(echo "scale=4; $mt_reads / $total_reads" | bc)
echo "Mitochondrial fraction: $mt_frac"
# Ideal: <20%, concerning: >50%
Library Complexity (NRF, PBC1, PBC2)
Goal: Measure library complexity to detect over-amplification or low-diversity libraries.
Approach: Calculate NRF (unique/total reads), PBC1 (1-read locations / all locations), and PBC2 (1-read / 2-read locations) using Picard or custom counting.
# Using Picard EstimateLibraryComplexity
java -jar picard.jar EstimateLibraryComplexity \
I=sample.bam \
O=complexity.txt
# Or calculate from BAM
# NRF = unique reads / total reads
# PBC1 = locations with exactly 1 read / locations with >= 1 read
# PBC2 = locations with exactly 1 read / locations with exactly 2 reads
import pysam
def calculate_complexity(bam_file):
'''Calculate library complexity metrics.'''
bam = pysam.AlignmentFile(bam_file, 'rb')
positions = {}
total = 0
for read in bam.fetch():
if read.is_unmapped or read.is_secondary:
continue
total += 1
pos = (read.reference_name, read.reference_start)
positions[pos] = positions.get(pos, 0) + 1
distinct = len(positions)
m1 = sum(1 for v in positions.values() if v == 1)
m2 = sum(1 for v in positions.values() if v == 2)
nrf = distinct / total if total > 0 else 0
pbc1 = m1 / distinct if distinct > 0 else 0
pbc2 = m1 / m2 if m2 > 0 else 0
return {'NRF': nrf, 'PBC1': pbc1, 'PBC2': pbc2}
deepTools QC
# Fingerprint plot (assesses enrichment)
plotFingerprint \
-b sample.bam \
--labels sample \
-o fingerprint.png
# Correlation between replicates
multiBamSummary bins \
-b sample1.bam sample2.bam \
-o results.npz
plotCorrelation \
-in results.npz \
--corMethod pearson \
--whatToPlot heatmap \
-o correlation.png
ATACseqQC (R)
library(ATACseqQC)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
# Read BAM
bamfile <- 'sample.bam'
# Fragment size distribution
fragSizeDist(bamfile, 'fragment_size.pdf')
# TSS enrichment
tsse <- TSSEscore(bamfile, TxDb.Hsapiens.UCSC.hg38.knownGene)
print(paste('TSS Enrichment:', round(tsse$TSSEscore, 2)))
# Nucleosome positioning
nucs <- nucleosomePositioningScore(bamfile, TxDb.Hsapiens.UCSC.hg38.knownGene)
Comprehensive QC Report
Goal: Generate a single QC summary combining all major ATAC-seq quality metrics.
Approach: Run samtools and bedtools commands to collect total reads, mapping rate, mitochondrial fraction, FRiP, and peak count, then write a consolidated report.
import subprocess
import pandas as pd
def atac_qc_report(bam_file, peaks_file, output_prefix):
'''Generate comprehensive ATAC-seq QC report.'''
metrics = {}
# Total reads
result = subprocess.check_output(f'samtools view -c -F 4 {bam_file}', shell=True)
metrics['total_reads'] = int(result.strip())
# Mapped reads
result = subprocess.check_output(f'samtools view -c -F 4 -F 256 {bam_file}', shell=True)
metrics['mapped_reads'] = int(result.strip())
# Mitochondrial reads
result = subprocess.check_output(f'samtools view -c {bam_file} chrM', shell=True)
metrics['mt_reads'] = int(result.strip())
metrics['mt_fraction'] = metrics['mt_reads'] / metrics['total_reads']
# Reads in peaks (FRiP)
result = subprocess.check_output(
f'bedtools intersect -a {bam_file} -b {peaks_file} -u | samtools view -c', shell=True)
metrics['reads_in_peaks'] = int(result.strip())
metrics['frip'] = metrics['reads_in_peaks'] / metrics['total_reads']
# Peak count
result = subprocess.check_output(f'wc -l < {peaks_file}', shell=True)
metrics['peak_count'] = int(result.strip())
# Write report
with open(f'{output_prefix}_qc.txt', 'w') as f:
for k, v in metrics.items():
if isinstance(v, float):
f.write(f'{k}: {v:.4f}\n')
else:
f.write(f'{k}: {v}\n')
return metrics
QC Thresholds
| Metric | Good | Acceptable | Poor |
|---|---|---|---|
| TSS Enrichment | >10 | 5-10 | <5 |
| FRiP | >0.3 | 0.1-0.3 | <0.1 |
| MT Fraction | <0.1 | 0.1-0.3 | >0.3 |
| NRF | >0.9 | 0.8-0.9 | <0.8 |
| PBC1 | >0.9 | 0.7-0.9 | <0.7 |
Related Skills
- atac-seq/atac-peak-calling - Peak calling
- alignment-files/bam-statistics - Alignment QC
- chip-seq/chipseq-visualization - Visualization approaches
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
sleep-analyzer
分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-hi-c-analysis-matrix-operations
Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.
Didn't find tool you were looking for?