Agent skill

gnomad-database

Query gnomAD (Genome Aggregation Database) for population allele frequencies, variant constraint scores (pLI, LOEUF), and loss-of-function intolerance. Essential for variant pathogenicity interpretation, rare disease genetics, and identifying loss-of-function intolerant genes.

View SKILL.md on GitHub Repository

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/gnomad-database

Metadata

Additional technical details for this skill

skill author: Kuan-lin Huang

SKILL.md

gnomAD Database

Overview

The Genome Aggregation Database (gnomAD) is the largest publicly available collection of human genetic variation, aggregated from large-scale sequencing projects. gnomAD v4 contains exome sequences from 730,947 individuals and genome sequences from 76,215 individuals across diverse ancestries. It provides population allele frequencies, variant consequence annotations, and gene-level constraint metrics that are essential for interpreting the clinical significance of genetic variants.

Key resources:

gnomAD browser: https://gnomad.broadinstitute.org/
GraphQL API: https://gnomad.broadinstitute.org/api
Data downloads: https://gnomad.broadinstitute.org/downloads
Documentation: https://gnomad.broadinstitute.org/help

When to Use This Skill

Use gnomAD when:

Variant frequency lookup: Checking if a variant is rare, common, or absent in the general population
Pathogenicity assessment: Rare variants (MAF < 1%) are candidates for disease causation; gnomAD helps filter benign common variants
Loss-of-function intolerance: Using pLI and LOEUF scores to assess whether a gene tolerates protein-truncating variants
Population-stratified frequencies: Comparing allele frequencies across ancestries (African/African American, Admixed American, Ashkenazi Jewish, East Asian, Finnish, Middle Eastern, Non-Finnish European, South Asian)
ClinVar/ACMG variant classification: gnomAD frequency data feeds into BA1/BS1 evidence codes for variant classification
Constraint analysis: Identifying genes depleted of missense or loss-of-function variation (z-scores, pLI, LOEUF)

Core Capabilities

1. gnomAD GraphQL API

gnomAD uses a GraphQL API accessible at https://gnomad.broadinstitute.org/api. Most queries fetch variants by gene or specific genomic position.

Datasets available:

gnomad_r4 — gnomAD v4 exomes (recommended default, GRCh38)
gnomad_r4_genomes — gnomAD v4 genomes (GRCh38)
gnomad_r3 — gnomAD v3 genomes (GRCh38)
gnomad_r2_1 — gnomAD v2 exomes (GRCh37)

Reference genomes:

GRCh38 — default for v3/v4
GRCh37 — for v2

2. Querying Variants by Gene

python

import requests

def query_gnomad_gene(gene_symbol, dataset="gnomad_r4", reference_genome="GRCh38"):
    """Fetch variants in a gene from gnomAD."""
    url = "https://gnomad.broadinstitute.org/api"

    query = """
    query GeneVariants($gene_symbol: String!, $dataset: DatasetId!, $reference_genome: ReferenceGenomeId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        gene_id
        gene_symbol
        variants(dataset: $dataset) {
          variant_id
          pos
          ref
          alt
          consequence
          genome {
            af
            ac
            an
            ac_hom
            populations {
              id
              ac
              an
              af
            }
          }
          exome {
            af
            ac
            an
            ac_hom
          }
          lof
          lof_flags
          lof_filter
        }
      }
    }
    """

    variables = {
        "gene_symbol": gene_symbol,
        "dataset": dataset,
        "reference_genome": reference_genome
    }

    response = requests.post(url, json={"query": query, "variables": variables})
    return response.json()

# Example
result = query_gnomad_gene("BRCA1")
gene_data = result["data"]["gene"]
variants = gene_data["variants"]

# Filter to rare PTVs
rare_ptvs = [
    v for v in variants
    if v.get("lof") == "LC" or v.get("consequence") in ["stop_gained", "frameshift_variant"]
    and v.get("genome", {}).get("af", 1) < 0.001
]
print(f"Found {len(rare_ptvs)} rare PTVs in {gene_data['gene_symbol']}")

3. Querying a Specific Variant

python

import requests

def query_gnomad_variant(variant_id, dataset="gnomad_r4"):
    """Fetch details for a specific variant (e.g., '1-55516888-G-GA')."""
    url = "https://gnomad.broadinstitute.org/api"

    query = """
    query VariantDetails($variantId: String!, $dataset: DatasetId!) {
      variant(variantId: $variantId, dataset: $dataset) {
        variant_id
        chrom
        pos
        ref
        alt
        genome {
          af
          ac
          an
          ac_hom
          populations {
            id
            ac
            an
            af
          }
        }
        exome {
          af
          ac
          an
          ac_hom
          populations {
            id
            ac
            an
            af
          }
        }
        consequence
        lof
        rsids
        in_silico_predictors {
          id
          value
          flags
        }
        clinvar_variation_id
      }
    }
    """

    response = requests.post(
        url,
        json={"query": query, "variables": {"variantId": variant_id, "dataset": dataset}}
    )
    return response.json()

# Example: query a specific variant
result = query_gnomad_variant("17-43094692-G-A")  # BRCA1 missense
variant = result["data"]["variant"]

if variant:
    genome_af = variant.get("genome", {}).get("af", "N/A")
    exome_af = variant.get("exome", {}).get("af", "N/A")
    print(f"Variant: {variant['variant_id']}")
    print(f"  Consequence: {variant['consequence']}")
    print(f"  Genome AF: {genome_af}")
    print(f"  Exome AF: {exome_af}")
    print(f"  LoF: {variant.get('lof')}")

4. Gene Constraint Scores

gnomAD constraint scores assess how tolerant a gene is to variation relative to expectation:

python

import requests

def query_gnomad_constraint(gene_symbol, reference_genome="GRCh38"):
    """Fetch constraint scores for a gene."""
    url = "https://gnomad.broadinstitute.org/api"

    query = """
    query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        gene_id
        gene_symbol
        gnomad_constraint {
          exp_lof
          exp_mis
          exp_syn
          obs_lof
          obs_mis
          obs_syn
          oe_lof
          oe_mis
          oe_syn
          oe_lof_lower
          oe_lof_upper
          lof_z
          mis_z
          syn_z
          pLI
        }
      }
    }
    """

    response = requests.post(
        url,
        json={"query": query, "variables": {"gene_symbol": gene_symbol, "reference_genome": reference_genome}}
    )
    return response.json()

# Example
result = query_gnomad_constraint("KCNQ2")
gene = result["data"]["gene"]
constraint = gene["gnomad_constraint"]

print(f"Gene: {gene['gene_symbol']}")
print(f"  pLI:   {constraint['pLI']:.3f}  (>0.9 = LoF intolerant)")
print(f"  LOEUF: {constraint['oe_lof_upper']:.3f}  (<0.35 = highly constrained)")
print(f"  Obs/Exp LoF: {constraint['oe_lof']:.3f}")
print(f"  Missense Z:  {constraint['mis_z']:.3f}")

Constraint score interpretation:

Score	Range	Meaning
`pLI`	0–1	Probability of LoF intolerance; >0.9 = highly intolerant
`LOEUF`	0–∞	LoF observed/expected upper bound; <0.35 = constrained
`oe_lof`	0–∞	Observed/expected ratio for LoF variants
`mis_z`	−∞ to ∞	Missense constraint z-score; >3.09 = constrained
`syn_z`	−∞ to ∞	Synonymous z-score (control; should be near 0)

5. Population Frequency Analysis

python

import requests
import pandas as pd

def get_population_frequencies(variant_id, dataset="gnomad_r4"):
    """Extract per-population allele frequencies for a variant."""
    url = "https://gnomad.broadinstitute.org/api"

    query = """
    query PopFreqs($variantId: String!, $dataset: DatasetId!) {
      variant(variantId: $variantId, dataset: $dataset) {
        variant_id
        genome {
          populations {
            id
            ac
            an
            af
            ac_hom
          }
        }
      }
    }
    """

    response = requests.post(
        url,
        json={"query": query, "variables": {"variantId": variant_id, "dataset": dataset}}
    )
    data = response.json()
    populations = data["data"]["variant"]["genome"]["populations"]

    df = pd.DataFrame(populations)
    df = df[df["an"] > 0].copy()
    df["af"] = df["ac"] / df["an"]
    df = df.sort_values("af", ascending=False)
    return df

# Population IDs in gnomAD v4:
# afr = African/African American
# ami = Amish
# amr = Admixed American
# asj = Ashkenazi Jewish
# eas = East Asian
# fin = Finnish
# mid = Middle Eastern
# nfe = Non-Finnish European
# sas = South Asian
# remaining = Other

6. Structural Variants (gnomAD-SV)

gnomAD also contains a structural variant dataset:

python

import requests

def query_gnomad_sv(gene_symbol):
    """Query structural variants overlapping a gene."""
    url = "https://gnomad.broadinstitute.org/api"

    query = """
    query SVsByGene($gene_symbol: String!) {
      gene(gene_symbol: $gene_symbol, reference_genome: GRCh38) {
        structural_variants {
          variant_id
          type
          chrom
          pos
          end
          af
          ac
          an
        }
      }
    }
    """

    response = requests.post(url, json={"query": query, "variables": {"gene_symbol": gene_symbol}})
    return response.json()

Query Workflows

Workflow 1: Variant Pathogenicity Assessment

Check population frequency — Is the variant rare enough to be pathogenic?
- Use gnomAD AF < 1% for recessive, < 0.1% for dominant conditions
- Check ancestry-specific frequencies (a variant rare overall may be common in one population)
Assess functional impact — LoF variants have highest prior probability
- Check lof field: HC = high-confidence LoF, LC = low-confidence
- Check lof_flags for issues like "NAGNAG_SITE", "PHYLOCSF_WEAK"
Apply ACMG criteria:
- BA1: AF > 5% → Benign Stand-Alone
- BS1: AF > disease prevalence threshold → Benign Supporting
- PM2: Absent or very rare in gnomAD → Pathogenic Moderate

Workflow 2: Gene Prioritization in Rare Disease

Query constraint scores for candidate genes
Filter for pLI > 0.9 (haploinsufficient) or LOEUF < 0.35
Cross-reference with observed LoF variants in the gene
Integrate with ClinVar and disease databases

Workflow 3: Population Genetics Research

Identify variant of interest from GWAS or clinical data
Query per-population frequencies
Compare frequency differences across ancestries
Test for enrichment in specific founder populations

Best Practices

Use gnomAD v4 (gnomad_r4) for the most current data; use v2 (gnomad_r2_1) only for GRCh37 compatibility
Handle null responses: Variants not observed in gnomAD are not necessarily pathogenic — absence is informative
Distinguish exome vs. genome data: Genome data has more uniform coverage; exome data is larger but may have coverage gaps
Rate limit GraphQL queries: Add delays between requests; batch queries when possible
Homozygous counts (ac_hom) are relevant for recessive disease analysis
LOEUF is preferred over pLI for gene constraint (less sensitive to sample size)

Data Access

Browser: https://gnomad.broadinstitute.org/ — interactive variant and gene browsing
GraphQL API: https://gnomad.broadinstitute.org/api — programmatic access
Downloads: https://gnomad.broadinstitute.org/downloads — VCF, Hail tables, constraint tables
Google Cloud: gs://gcp-public-data--gnomad/

Additional Resources

gnomAD website: https://gnomad.broadinstitute.org/
gnomAD blog: https://gnomad.broadinstitute.org/news
Downloads: https://gnomad.broadinstitute.org/downloads
API explorer: https://gnomad.broadinstitute.org/api (interactive GraphiQL)
Constraint documentation: https://gnomad.broadinstitute.org/help/constraint
Citation: Karczewski KJ et al. (2020) Nature. PMID: 32461654; Chen S et al. (2024) Nature. PMID: 38conservation
GitHub: https://github.com/broadinstitute/gnomad-browser

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/gnomad-database
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量，并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

Metadata

SKILL.md

gnomAD Database

Overview

When to Use This Skill

Core Capabilities

1. gnomAD GraphQL API

2. Querying Variants by Gene

3. Querying a Specific Variant

4. Gene Constraint Scores

5. Population Frequency Analysis

6. Structural Variants (gnomAD-SV)

Query Workflows

Workflow 1: Variant Pathogenicity Assessment

Workflow 2: Gene Prioritization in Rare Disease

Workflow 3: Population Genetics Research

Best Practices

Data Access

Additional Resources

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations