Agent skill

jaspar-database

Query JASPAR for transcription factor binding site (TFBS) profiles (PWMs/PFMs). Search by TF name, species, or class; scan DNA sequences for TF binding sites; compare matrices; essential for regulatory genomics, motif analysis, and GWAS regulatory variant interpretation.

View SKILL.md on GitHub Repository

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/jaspar-database

Metadata

Additional technical details for this skill

skill author: Kuan-lin Huang

SKILL.md

JASPAR Database

Overview

JASPAR (https://jaspar.elixir.no/) is the gold-standard open-access database of curated, non-redundant transcription factor (TF) binding profiles stored as position frequency matrices (PFMs). JASPAR 2024 contains 1,210 non-redundant TF binding profiles for 164 eukaryotic species. Each profile is experimentally derived (ChIP-seq, SELEX, HT-SELEX, protein binding microarray, etc.) and rigorously validated.

Key resources:

JASPAR portal: https://jaspar.elixir.no/
REST API: https://jaspar.elixir.no/api/v1/
API docs: https://jaspar.elixir.no/api/v1/docs/
Python package: jaspar (via Biopython) or direct API

When to Use This Skill

Use JASPAR when:

TF binding site prediction: Scan a DNA sequence for potential binding sites of a TF
Regulatory variant interpretation: Does a GWAS/eQTL variant disrupt a TF binding motif?
Promoter/enhancer analysis: What TFs are predicted to bind to a regulatory element?
Gene regulatory network construction: Link TFs to their target genes via motif scanning
TF family analysis: Compare binding profiles across a TF family (e.g., all homeobox factors)
ChIP-seq analysis: Find known TF motifs enriched in ChIP-seq peaks
ENCODE/ATAC-seq interpretation: Match open chromatin regions to TF binding profiles

Core Capabilities

1. JASPAR REST API

Base URL: https://jaspar.elixir.no/api/v1/

python

import requests

BASE_URL = "https://jaspar.elixir.no/api/v1"

def jaspar_get(endpoint, params=None):
    url = f"{BASE_URL}/{endpoint}"
    response = requests.get(url, params=params, headers={"Accept": "application/json"})
    response.raise_for_status()
    return response.json()

2. Search for TF Profiles

python

def search_jaspar(
    tf_name=None,
    species=None,
    collection="CORE",
    tf_class=None,
    tf_family=None,
    page=1,
    page_size=25
):
    """Search JASPAR for TF binding profiles."""
    params = {
        "collection": collection,
        "page": page,
        "page_size": page_size,
        "format": "json"
    }
    if tf_name:
        params["name"] = tf_name
    if species:
        params["species"] = species  # Use taxonomy ID or name, e.g., "9606" for human
    if tf_class:
        params["tf_class"] = tf_class
    if tf_family:
        params["tf_family"] = tf_family

    return jaspar_get("matrix", params)

# Examples:
# Search for human CTCF profile
ctcf = search_jaspar("CTCF", species="9606")
print(f"Found {ctcf['count']} CTCF profiles")

# Search for all homeobox TFs in human
hox_tfs = search_jaspar(tf_class="Homeodomain", species="9606")

# Search for a TF family
nfkb = search_jaspar(tf_family="NF-kappaB")

3. Fetch a Specific Matrix (PFM/PWM)

python

def get_matrix(matrix_id):
    """Fetch a specific JASPAR matrix by ID (e.g., 'MA0139.1' for CTCF)."""
    return jaspar_get(f"matrix/{matrix_id}/")

# Example: Get CTCF matrix
ctcf_matrix = get_matrix("MA0139.1")

# Matrix structure:
# {
#   "matrix_id": "MA0139.1",
#   "name": "CTCF",
#   "collection": "CORE",
#   "tax_group": "vertebrates",
#   "pfm": { "A": [...], "C": [...], "G": [...], "T": [...] },
#   "consensus": "CCGCGNGGNGGCAG",
#   "length": 19,
#   "species": [{"tax_id": 9606, "name": "Homo sapiens"}],
#   "class": ["C2H2 zinc finger factors"],
#   "family": ["BEN domain factors"],
#   "type": "ChIP-seq",
#   "uniprot_ids": ["P49711"]
# }

4. Download PFM/PWM as Matrix

python

import numpy as np

def get_pwm(matrix_id, pseudocount=0.8):
    """
    Fetch a PFM from JASPAR and convert to PWM (log-odds).
    Returns numpy array of shape (4, L) in order A, C, G, T.
    """
    matrix = get_matrix(matrix_id)
    pfm = matrix["pfm"]

    # Convert PFM to numpy
    pfm_array = np.array([pfm["A"], pfm["C"], pfm["G"], pfm["T"]], dtype=float)

    # Add pseudocount
    pfm_array += pseudocount

    # Normalize to get PPM
    ppm = pfm_array / pfm_array.sum(axis=0, keepdims=True)

    # Convert to PWM (log-odds relative to background 0.25)
    background = 0.25
    pwm = np.log2(ppm / background)

    return pwm, matrix["name"]

# Example
pwm, name = get_pwm("MA0139.1")  # CTCF
print(f"PWM for {name}: shape {pwm.shape}")
max_score = pwm.max(axis=0).sum()
print(f"Maximum possible score: {max_score:.2f} bits")

5. Scan a DNA Sequence for TF Binding Sites

python

import numpy as np
from typing import List, Tuple

NUCLEOTIDE_MAP = {'A': 0, 'C': 1, 'G': 2, 'T': 3,
                  'a': 0, 'c': 1, 'g': 2, 't': 3}

def scan_sequence(sequence: str, pwm: np.ndarray, threshold_pct: float = 0.8) -> List[dict]:
    """
    Scan a DNA sequence for TF binding sites using a PWM.

    Args:
        sequence: DNA sequence string
        pwm: PWM array (4 x L) in ACGT order
        threshold_pct: Fraction of max score to use as threshold (0-1)

    Returns:
        List of hits with position, score, and matched sequence
    """
    motif_len = pwm.shape[1]
    max_score = pwm.max(axis=0).sum()
    min_score = pwm.min(axis=0).sum()
    threshold = min_score + threshold_pct * (max_score - min_score)

    hits = []
    seq = sequence.upper()

    for i in range(len(seq) - motif_len + 1):
        subseq = seq[i:i + motif_len]
        # Skip if contains non-ACGT
        if any(c not in NUCLEOTIDE_MAP for c in subseq):
            continue

        score = sum(pwm[NUCLEOTIDE_MAP[c], j] for j, c in enumerate(subseq))

        if score >= threshold:
            relative_score = (score - min_score) / (max_score - min_score)
            hits.append({
                "position": i + 1,  # 1-based
                "score": score,
                "relative_score": relative_score,
                "sequence": subseq,
                "strand": "+"
            })

    return hits

# Example: Scan a promoter sequence for CTCF binding sites
promoter = "AGCCCGCGAGGNGGCAGTTGCCTGGAGCAGGATCAGCAGATC"
pwm, name = get_pwm("MA0139.1")
hits = scan_sequence(promoter, pwm, threshold_pct=0.75)
for hit in hits:
    print(f"  Position {hit['position']}: {hit['sequence']} (score: {hit['score']:.2f}, {hit['relative_score']:.0%})")

6. Scan Both Strands

python

def reverse_complement(seq: str) -> str:
    complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C', 'N': 'N'}
    return ''.join(complement.get(b, 'N') for b in reversed(seq.upper()))

def scan_both_strands(sequence: str, pwm: np.ndarray, threshold_pct: float = 0.8):
    """Scan forward and reverse complement strands."""
    fwd_hits = scan_sequence(sequence, pwm, threshold_pct)
    for h in fwd_hits:
        h["strand"] = "+"

    rev_seq = reverse_complement(sequence)
    rev_hits = scan_sequence(rev_seq, pwm, threshold_pct)
    seq_len = len(sequence)
    for h in rev_hits:
        h["strand"] = "-"
        h["position"] = seq_len - h["position"] - len(h["sequence"]) + 2  # Convert to fwd coords

    all_hits = fwd_hits + rev_hits
    return sorted(all_hits, key=lambda x: x["position"])

7. Variant Impact on TF Binding

python

def variant_tfbs_impact(ref_seq: str, alt_seq: str, pwm: np.ndarray,
                          tf_name: str, threshold_pct: float = 0.7):
    """
    Assess impact of a SNP on TF binding by comparing ref vs alt sequences.
    Both sequences should be centered on the variant with flanking context.
    """
    ref_hits = scan_both_strands(ref_seq, pwm, threshold_pct)
    alt_hits = scan_both_strands(alt_seq, pwm, threshold_pct)

    max_ref = max((h["score"] for h in ref_hits), default=None)
    max_alt = max((h["score"] for h in alt_hits), default=None)

    result = {
        "tf": tf_name,
        "ref_max_score": max_ref,
        "alt_max_score": max_alt,
        "ref_has_site": len(ref_hits) > 0,
        "alt_has_site": len(alt_hits) > 0,
    }
    if max_ref and max_alt:
        result["score_change"] = max_alt - max_ref
        result["effect"] = "gained" if max_alt > max_ref else "disrupted"
    elif max_ref and not max_alt:
        result["effect"] = "disrupted"
    elif not max_ref and max_alt:
        result["effect"] = "gained"
    else:
        result["effect"] = "no_site"

    return result

Query Workflows

Workflow 1: Find All TF Binding Sites in a Promoter

python

import requests, numpy as np

# 1. Get relevant TF matrices (e.g., all human TFs in CORE collection)
response = requests.get(
    "https://jaspar.elixir.no/api/v1/matrix/",
    params={"species": "9606", "collection": "CORE", "page_size": 500, "page": 1}
)
matrices = response.json()["results"]

# 2. For each matrix, compute PWM and scan promoter
promoter = "CCCGCCCGCCCGCCGCCCGCAGTTAATGAGCCCAGCGTGCC"  # Example

all_hits = []
for m in matrices[:10]:  # Limit for demo
    pwm_data = requests.get(f"https://jaspar.elixir.no/api/v1/matrix/{m['matrix_id']}/").json()
    pfm = pfm_data["pfm"]
    pfm_arr = np.array([pfm["A"], pfm["C"], pfm["G"], pfm["T"]], dtype=float) + 0.8
    ppm = pfm_arr / pfm_arr.sum(axis=0)
    pwm = np.log2(ppm / 0.25)

    hits = scan_sequence(promoter, pwm, threshold_pct=0.8)
    for h in hits:
        h["tf_name"] = m["name"]
        h["matrix_id"] = m["matrix_id"]
    all_hits.extend(hits)

print(f"Found {len(all_hits)} TF binding sites")
for h in sorted(all_hits, key=lambda x: -x["score"])[:5]:
    print(f"  {h['tf_name']} ({h['matrix_id']}): pos {h['position']}, score {h['score']:.2f}")

Workflow 2: SNP Impact on TF Binding (Regulatory Variant Analysis)

Retrieve the genomic sequence flanking the SNP (±20 bp each side)
Construct ref and alt sequences
Scan with all relevant TF PWMs
Report TFs whose binding is created or destroyed by the SNP

Workflow 3: Motif Enrichment Analysis

Identify a set of peak sequences (e.g., from ChIP-seq or ATAC-seq)
Scan all peaks with JASPAR PWMs
Compare hit rates in peaks vs. background sequences
Report significantly enriched motifs (Fisher's exact test or FIMO-style scoring)

Collections Available

Collection	Description	Profiles
`CORE`	Non-redundant, high-quality profiles	~1,210
`UNVALIDATED`	Experimentally derived but not validated	~500
`PHYLOFACTS`	Phylogenetically conserved sites	~50
`CNE`	Conserved non-coding elements	~30
`POLII`	RNA Pol II binding profiles	~20
`FAM`	TF family representative profiles	~170
`SPLICE`	Splice factor profiles	~20

Best Practices

Use CORE collection for most analyses — best validated and non-redundant
Threshold selection: 80% of max score is common for de novo prediction; 90% for high-confidence
Always scan both strands — TFs can bind in either orientation
Provide flanking context for variant analysis: at least (motif_length - 1) bp on each side
Consider background: PWM scores relative to uniform (0.25) background; adjust for actual GC content
Cross-validate with ChIP-seq data when available — motif scanning has many false positives
Use Biopython's motifs module for full-featured scanning: from Bio import motifs

Additional Resources

JASPAR portal: https://jaspar.elixir.no/
API documentation: https://jaspar.elixir.no/api/v1/docs/
JASPAR 2024 paper: Castro-Mondragon et al. (2022) Nucleic Acids Research. PMID: 34875674
Biopython motifs: https://biopython.org/docs/latest/Tutorial/chapter_motifs.html
FIMO tool (for large-scale scanning): https://meme-suite.org/meme/tools/fimo
HOMER (motif enrichment): http://homer.ucsd.edu/homer/
GitHub: https://github.com/wassermanlab/JASPAR-UCSC-tracks

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/jaspar-database
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量，并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

Metadata

SKILL.md

JASPAR Database

Overview

When to Use This Skill

Core Capabilities

1. JASPAR REST API

2. Search for TF Profiles

3. Fetch a Specific Matrix (PFM/PWM)

4. Download PFM/PWM as Matrix

5. Scan a DNA Sequence for TF Binding Sites

6. Scan Both Strands

7. Variant Impact on TF Binding

Query Workflows

Workflow 1: Find All TF Binding Sites in a Promoter

Workflow 2: SNP Impact on TF Binding (Regulatory Variant Analysis)

Workflow 3: Motif Enrichment Analysis

Collections Available

Best Practices

Additional Resources

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations