Agent skill

bio-clip-seq-clip-peak-calling

Stars 2,009
Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-clip-seq-clip-peak-calling

SKILL.md


name: bio-clip-seq-clip-peak-calling description: Call protein-RNA binding site peaks from CLIP-seq data using CLIPper, PureCLIP, or Piranha. Use when identifying RBP binding sites from aligned CLIP reads. tool_type: cli primary_tool: CLIPper measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

  • read_file
  • run_shell_command

CLIP-seq Peak Calling

CLIPper (Recommended)

bash
# Basic peak calling
clipper \
    -b deduped.bam \
    -s hg38 \
    -o peaks.bed \
    --save-pickle

# With FDR threshold
clipper \
    -b deduped.bam \
    -s hg38 \
    -o peaks.bed \
    --FDR 0.05 \
    --superlocal

# Specify gene annotations
clipper \
    -b deduped.bam \
    -s hg38 \
    --gene genes.bed \
    -o peaks.bed

CLIPper Options

Option Description
-b Input BAM file
-s Species (hg38, mm10)
-o Output BED file
--FDR FDR threshold (default 0.05)
--superlocal Use superlocal background
--gene Custom gene annotation BED
--save-pickle Save intermediate data

PureCLIP (HMM-Based)

PureCLIP uses an HMM to model crosslink sites, incorporating enrichment and truncation signals.

bash
# Installation
conda install -c bioconda pureclip

# Basic peak calling
pureclip \
    -i deduped.bam \
    -bai deduped.bam.bai \
    -g genome.fa \
    -o crosslink_sites.bed \
    -or binding_regions.bed \
    -nt 4

# -nt 4: Number of threads. Adjust based on CPU cores.
# -o: Single-nucleotide crosslink sites
# -or: Broader binding regions

PureCLIP Options

Option Description
-i Input BAM file
-bai BAM index file
-g Reference genome FASTA
-o Crosslink sites output
-or Binding regions output
-nt Number of threads
-iv Interval file to restrict analysis
-dm Min distance for merging

PureCLIP with Input Control

bash
# With SMInput control BAM
pureclip \
    -i clip.bam \
    -bai clip.bam.bai \
    -g genome.fa \
    -ibam sminput.bam \
    -ibai sminput.bam.bai \
    -o crosslinks.bed \
    -or regions.bed \
    -nt 8

# -ibam/-ibai: Input control BAM for background modeling

PureCLIP Output

bash
# Crosslink sites BED contains:
# chr start end name score strand

# Score interpretation:
# Higher scores = more confident crosslink sites

# Filter by score
# score>=3: Medium confidence. Use 5+ for high confidence.
awk '$5 >= 3' crosslink_sites.bed > filtered_sites.bed

PureCLIP for Different CLIP Types

bash
# eCLIP (recommended settings)
pureclip -i eclip.bam -bai eclip.bam.bai -g genome.fa \
    -o sites.bed -or regions.bed -nt 4 -dm 8

# iCLIP (single-nucleotide resolution)
pureclip -i iclip.bam -bai iclip.bam.bai -g genome.fa \
    -o sites.bed -or regions.bed -nt 4

# PAR-CLIP (T-to-C transitions)
pureclip -i parclip.bam -bai parclip.bam.bai -g genome.fa \
    -o sites.bed -or regions.bed -nt 4

Piranha

bash
# Basic usage
Piranha -s deduped.bam -o peaks.bed

# With p-value threshold
Piranha -s deduped.bam -o peaks.bed -p 0.01

# Stranded analysis
Piranha -s deduped.bam -o peaks.bed -p 0.01 -u

# Zero-truncated negative binomial
Piranha -s deduped.bam -o peaks.bed -d ZeroTruncatedNegativeBinomial

PEAKachu (for PAR-CLIP)

bash
# PAR-CLIP specific caller
peakachu adaptive \
    -c control.bam \
    -t treatment.bam \
    -r reference.fa \
    -o peakachu_peaks.gff

MACS3 for CLIP (Alternative)

bash
# Use narrow peak calling mode
macs3 callpeak \
    -t deduped.bam \
    -f BAM \
    -g hs \
    -n clip_peaks \
    --nomodel \
    --extsize 50 \
    -q 0.01

Strand-Specific Peak Calling

bash
# Split BAM by strand
samtools view -h -F 16 deduped.bam | samtools view -Sb - > plus_strand.bam
samtools view -h -f 16 deduped.bam | samtools view -Sb - > minus_strand.bam

# Call peaks on each strand
clipper -b plus_strand.bam -s hg38 -o peaks_plus.bed
clipper -b minus_strand.bam -s hg38 -o peaks_minus.bed

# Combine
cat peaks_plus.bed peaks_minus.bed | sort -k1,1 -k2,2n > peaks_all.bed

Filter Peaks

bash
# By score
awk '$5 >= 10' peaks.bed > peaks_filtered.bed

# By size
awk '($3 - $2) >= 20 && ($3 - $2) <= 200' peaks.bed > peaks_sized.bed

# By read count (if in name field)
awk '$5 >= 5' peaks.bed > peaks_min5reads.bed

Merge Replicates

bash
# Use bedtools to find consensus peaks
bedtools intersect -a rep1_peaks.bed -b rep2_peaks.bed -wa | \
    sort -u > consensus_peaks.bed

# Require overlap in N replicates
bedtools multiinter -i rep1.bed rep2.bed rep3.bed | \
    awk '$4 >= 2' | \
    bedtools merge > consensus_peaks.bed

Peak Metrics

python
import pandas as pd

def load_clip_peaks(bed_path):
    peaks = pd.read_csv(bed_path, sep='\t', header=None,
                        names=['chrom', 'start', 'end', 'name', 'score', 'strand'])
    return peaks

def peak_stats(peaks):
    stats = {
        'n_peaks': len(peaks),
        'mean_width': (peaks['end'] - peaks['start']).mean(),
        'median_score': peaks['score'].median(),
        'peaks_per_chrom': peaks.groupby('chrom').size().to_dict()
    }
    return stats

peaks = load_clip_peaks('peaks.bed')
print(peak_stats(peaks))

Quality Metrics

Metric Good Value Description
Peak count 1,000-50,000 Depends on RBP
Peak width 20-100 nt Typical for RBP footprint
FRiP >0.1 Fraction reads in peaks

Calculate FRiP

bash
# Reads in peaks
reads_in_peaks=$(bedtools intersect -a deduped.bam -b peaks.bed -u | samtools view -c -)

# Total reads
total_reads=$(samtools view -c deduped.bam)

# FRiP
frip=$(echo "scale=4; $reads_in_peaks / $total_reads" | bc)
echo "FRiP: $frip"

Related Skills

  • clip-alignment - Generate aligned BAM
  • clip-preprocessing - UMI deduplication
  • binding-site-annotation - Annotate peaks with gene features
  • clip-motif-analysis - Find enriched motifs in peaks

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275
Explore
FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results