Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/variant-interpretation-acmg/bioSkills/gatk-variant-calling

SKILL.md

name: bio-gatk-variant-calling
description: Variant calling with GATK HaplotypeCaller following best practices. Covers germline SNP/indel calling, GVCF workflow for cohorts, joint genotyping, and variant quality score recalibration (VQSR). Use when calling variants with GATK HaplotypeCaller.
tool_type: cli
primary_tool: gatk
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command

GATK Variant Calling

GATK HaplotypeCaller is a widely used caller for germline SNPs and indels. This skill covers the GATK Best Practices workflow for germline short variants.

Prerequisites

BAM files should be preprocessed:

Mark duplicates
Base quality score recalibration (BQSR) - optional but recommended
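
The prerequisites above can be spot-checked from the BAM header before calling. The sketch below is a heuristic and not part of the original skill: the function name `check_bam_header` is hypothetical, and it assumes the header has been dumped as text (for example with `samtools view -H`). Besides duplicate marking, it also checks for coordinate sorting and a read group carrying an `SM` tag, both of which HaplotypeCaller expects.

```bash
# Hypothetical helper, not a GATK or samtools command.
# Reads a SAM/BAM header as text on stdin; returns non-zero if a hard
# requirement (coordinate sort, read group with SM tag) is missing.
check_bam_header() {
    header=$(cat)
    status=0

    if printf '%s\n' "$header" | grep -q '^@HD.*SO:coordinate'; then
        echo "OK: coordinate-sorted"
    else
        echo "WARN: @HD line does not declare SO:coordinate"
        status=1
    fi

    if printf '%s\n' "$header" | grep -q '^@RG.*SM:'; then
        echo "OK: read group with SM tag"
    else
        echo "WARN: no @RG line with an SM tag"
        status=1
    fi

    # Heuristic only: looks for a duplicate-marking tool in the @PG history.
    if printf '%s\n' "$header" | grep -Eqi '^@PG.*(MarkDuplicates|markdup)'; then
        echo "OK: duplicate marking found in @PG lines"
    else
        echo "NOTE: no duplicate-marking step found in @PG lines"
    fi

    return "$status"
}

# Usage (assumes samtools is installed):
#   samtools view -H sample.bam | check_bam_header
```

A missing duplicate-marking entry is reported as a note rather than a failure, because the `@PG` history depends on which tool was used and whether it was preserved.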

Single-Sample Calling

Basic HaplotypeCaller

bash

gatk HaplotypeCaller \
    -R reference.fa \
    -I sample.bam \
    -O sample.vcf.gz

With Standard Annotations

bash

gatk HaplotypeCaller \
    -R reference.fa \
    -I sample.bam \
    -O sample.vcf.gz \
    -A Coverage \
    -A QualByDepth \
    -A FisherStrand \
    -A StrandOddsRatio \
    -A MappingQualityRankSumTest \
    -A ReadPosRankSumTest

Target Intervals (Exome/Panel)

bash

gatk HaplotypeCaller \
    -R reference.fa \
    -I sample.bam \
    -L targets.interval_list \
    -O sample.vcf.gz

Adjust Calling Confidence

bash

gatk HaplotypeCaller \
    -R reference.fa \
    -I sample.bam \
    -O sample.vcf.gz \
    --standard-min-confidence-threshold-for-calling 20
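
The confidence threshold is Phred-scaled, so a value Q corresponds to an error probability of 10^(-Q/10). As a quick arithmetic aid, the sketch below converts a threshold to that probability; `phred_to_prob` is a hypothetical helper name, not a GATK option.

```bash
# Convert a Phred-scaled confidence Q to an error probability 10^(-Q/10).
# exp/log is used instead of the ^ operator for portability across awk variants.
phred_to_prob() {
    awk -v q="$1" 'BEGIN { printf "%g\n", exp(-q / 10 * log(10)) }'
}

phred_to_prob 20   # the threshold used in the command above
phred_to_prob 30
```

Lowering the threshold admits more low-confidence calls, so it is usually paired with downstream filtering.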

GVCF Workflow (Recommended for Cohorts)

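In the GVCF workflow, each sample is called once into a GVCF that records genotype likelihoods at every site, and all samples are then genotyped together. Joint genotyping uses evidence across the cohort to improve calls at individual sites, and new samples can be added without re-running HaplotypeCaller on existing ones.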

Step 1: Generate GVCFs per Sample

bash

gatk HaplotypeCaller \
    -R reference.fa \
    -I sample.bam \
    -O sample.g.vcf.gz \
    -ERC GVCF

Step 2: Combine GVCFs (GenomicsDBImport)

bash

# Create sample map file
# sample_map.txt:
# sample1    /path/to/sample1.g.vcf.gz
# sample2    /path/to/sample2.g.vcf.gz

gatk GenomicsDBImport \
    --genomicsdb-workspace-path genomicsdb \
    --sample-name-map sample_map.txt \
    -L intervals.interval_list
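
The sample map described in the comments above is a two-column, tab-separated file. A minimal sketch for generating it follows; `make_sample_map` is a hypothetical helper, and it assumes each file is named `<sample>.g.vcf.gz` with the file-name prefix being the sample name you want to use.

```bash
# Hypothetical helper: print a GenomicsDBImport sample map
# (sample name <TAB> path) for the GVCF paths given as arguments.
# Assumption: the sample name is the file name without ".g.vcf.gz".
make_sample_map() {
    for gvcf in "$@"; do
        sample=$(basename "$gvcf" .g.vcf.gz)
        printf '%s\t%s\n' "$sample" "$gvcf"
    done
}

# Usage:
#   make_sample_map /path/to/gvcfs/*.g.vcf.gz > sample_map.txt
```

If the file names do not match the sample names, write the first column by hand instead.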

Alternative: CombineGVCFs (smaller cohorts)

bash

gatk CombineGVCFs \
    -R reference.fa \
    -V sample1.g.vcf.gz \
    -V sample2.g.vcf.gz \
    -V sample3.g.vcf.gz \
    -O cohort.g.vcf.gz

Step 3: Joint Genotyping

bash

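# Run ONE of the two options below; both write cohort.vcf.gz.

# Option A: from the GenomicsDB workspace
gatk GenotypeGVCFs \
    -R reference.fa \
    -V gendb://genomicsdb \
    -O cohort.vcf.gz

# Option B: from the combined GVCF
gatk GenotypeGVCFs \
    -R reference.fa \
    -V cohort.g.vcf.gz \
    -O cohort.vcf.gz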

Variant Quality Score Recalibration (VQSR)

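VQSR filters variants with a machine-learning model (a Gaussian mixture) trained on annotation values at known variant sites. The model needs many variants to train reliably, so whole-genome or large-cohort data is preferred.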

SNP Recalibration

bash

# Build SNP model
gatk VariantRecalibrator \
    -R reference.fa \
    -V cohort.vcf.gz \
    --resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap.vcf.gz \
    --resource:omni,known=false,training=true,truth=false,prior=12.0 omni.vcf.gz \
    --resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G.vcf.gz \
    --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp.vcf.gz \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
    -mode SNP \
    -O snp.recal \
    --tranches-file snp.tranches

# Apply SNP filter
gatk ApplyVQSR \
    -R reference.fa \
    -V cohort.vcf.gz \
    -O cohort.snp_recal.vcf.gz \
    --recal-file snp.recal \
    --tranches-file snp.tranches \
    --truth-sensitivity-filter-level 99.5 \
    -mode SNP

Indel Recalibration

bash

# Build Indel model
gatk VariantRecalibrator \
    -R reference.fa \
    -V cohort.snp_recal.vcf.gz \
    --resource:mills,known=false,training=true,truth=true,prior=12.0 Mills.vcf.gz \
    --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp.vcf.gz \
    -an QD -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
    -mode INDEL \
    --max-gaussians 4 \
    -O indel.recal \
    --tranches-file indel.tranches

# Apply Indel filter
gatk ApplyVQSR \
    -R reference.fa \
    -V cohort.snp_recal.vcf.gz \
    -O cohort.vqsr.vcf.gz \
    --recal-file indel.recal \
    --tranches-file indel.tranches \
    --truth-sensitivity-filter-level 99.0 \
    -mode INDEL

Hard Filtering (When VQSR Not Suitable)

Use hard filters for small datasets, exomes, or single samples, where VQSR does not have enough variants to train a reliable model.

Extract SNPs and Indels

bash

gatk SelectVariants \
    -R reference.fa \
    -V cohort.vcf.gz \
    --select-type-to-include SNP \
    -O snps.vcf.gz

gatk SelectVariants \
    -R reference.fa \
    -V cohort.vcf.gz \
    --select-type-to-include INDEL \
    -O indels.vcf.gz

Apply Hard Filters

bash

# Filter SNPs
gatk VariantFiltration \
    -R reference.fa \
    -V snps.vcf.gz \
    -O snps.filtered.vcf.gz \
    --filter-expression "QD < 2.0" --filter-name "QD2" \
    --filter-expression "FS > 60.0" --filter-name "FS60" \
    --filter-expression "MQ < 40.0" --filter-name "MQ40" \
    --filter-expression "MQRankSum < -12.5" --filter-name "MQRankSum-12.5" \
    --filter-expression "ReadPosRankSum < -8.0" --filter-name "ReadPosRankSum-8" \
    --filter-expression "SOR > 3.0" --filter-name "SOR3"

# Filter Indels
gatk VariantFiltration \
    -R reference.fa \
    -V indels.vcf.gz \
    -O indels.filtered.vcf.gz \
    --filter-expression "QD < 2.0" --filter-name "QD2" \
    --filter-expression "FS > 200.0" --filter-name "FS200" \
    --filter-expression "ReadPosRankSum < -20.0" --filter-name "ReadPosRankSum-20" \
    --filter-expression "SOR > 10.0" --filter-name "SOR10"

Merge Filtered Variants

bash

gatk MergeVcfs \
    -I snps.filtered.vcf.gz \
    -I indels.filtered.vcf.gz \
    -O cohort.filtered.vcf.gz
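
After merging, it can help to see how many records each filter caught. The sketch below tallies the FILTER column of a VCF read as text; `count_filters` is a hypothetical helper, not a GATK tool. Records that fail several filters appear under the combined value (for example `QD2;FS60`).

```bash
# Tally the FILTER column (7th tab-separated field) of a VCF on stdin,
# skipping header lines. Output: filter value <TAB> record count.
count_filters() {
    awk -F '\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' | sort
}

# Usage (bgzip output can be read with gzip):
#   gzip -dc cohort.filtered.vcf.gz | count_filters
```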

Base Quality Score Recalibration (BQSR)

Preprocessing step, run before HaplotypeCaller, that corrects systematic errors in base quality scores using known variant sites.

Step 1: BaseRecalibrator

bash

gatk BaseRecalibrator \
    -R reference.fa \
    -I sample.bam \
    --known-sites dbsnp.vcf.gz \
    --known-sites known_indels.vcf.gz \
    -O recal_data.table

Step 2: ApplyBQSR

bash

gatk ApplyBQSR \
    -R reference.fa \
    -I sample.bam \
    --bqsr-recal-file recal_data.table \
    -O sample.recal.bam

Parallel Processing

Scatter by Interval

bash

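# Split calling across intervals.
# Note: this starts all 24 jobs at once; shorten the interval list or
# throttle the loop if the machine lacks the cores and memory for that.
for interval in chr{1..22} chrX chrY; do
    gatk HaplotypeCaller \
        -R reference.fa \
        -I sample.bam \
        -L "$interval" \
        -O "sample.${interval}.g.vcf.gz" \
        -ERC GVCF &
done
wait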

# Gather GVCFs
gatk GatherVcfs \
    -I sample.chr1.g.vcf.gz \
    -I sample.chr2.g.vcf.gz \
    ... \
    -O sample.g.vcf.gz
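
GatherVcfs expects its inputs in genomic order, so the `-I` list should follow the same order as the reference. Rather than typing every file, the list can be generated. This is a sketch under the assumption that the per-interval files follow the `sample.chrN.g.vcf.gz` naming from the scatter loop above; `gather_inputs` is a hypothetical helper.

```bash
# Print one "-I <file>" per line for chr1..chr22, chrX, chrY, in that order.
# Assumes the file naming produced by the scatter loop above.
gather_inputs() {
    sample=$1
    for i in $(seq 1 22) X Y; do
        printf '%s %s.chr%s.g.vcf.gz\n' -I "$sample" "$i"
    done
}

# Usage: word splitting is intentional, so file names must not contain spaces.
#   gatk GatherVcfs $(gather_inputs sample) -O sample.g.vcf.gz
```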

Native PairHMM Multithreading

bash

gatk HaplotypeCaller \
    -R reference.fa \
    -I sample.bam \
    -O sample.vcf.gz \
    --native-pair-hmm-threads 4

CNN Score Variant Filter (Deep Learning)

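Alternative to VQSR that scores each variant with a convolutional neural network. GATK documents this route mainly for single-sample callsets, where VQSR lacks enough data to train.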

Score Variants

bash

gatk CNNScoreVariants \
    -R reference.fa \
    -V cohort.vcf.gz \
    -O cohort.cnn_scored.vcf.gz \
    --tensor-type reference

Filter by CNN Score

bash

gatk FilterVariantTranches \
    -V cohort.cnn_scored.vcf.gz \
    -O cohort.cnn_filtered.vcf.gz \
    --resource hapmap.vcf.gz \
    --resource mills.vcf.gz \
    --info-key CNN_1D \
    --snp-tranche 99.95 \
    --indel-tranche 99.4

Complete Single-Sample Pipeline

bash

#!/bin/bash
# Usage: pass the sample name as the first argument; expects <sample>.bam
# in the working directory.
set -euo pipefail

SAMPLE=${1:?usage: $0 <sample>}
REF=reference.fa
DBSNP=dbsnp.vcf.gz
KNOWN_INDELS=known_indels.vcf.gz

# BQSR
gatk BaseRecalibrator -R "$REF" -I "${SAMPLE}.bam" \
    --known-sites "$DBSNP" --known-sites "$KNOWN_INDELS" \
    -O "${SAMPLE}.recal.table"

gatk ApplyBQSR -R "$REF" -I "${SAMPLE}.bam" \
    --bqsr-recal-file "${SAMPLE}.recal.table" \
    -O "${SAMPLE}.recal.bam"

# Call variants
gatk HaplotypeCaller -R "$REF" -I "${SAMPLE}.recal.bam" \
    -O "${SAMPLE}.g.vcf.gz" -ERC GVCF

# Single-sample genotyping
gatk GenotypeGVCFs -R "$REF" -V "${SAMPLE}.g.vcf.gz" \
    -O "${SAMPLE}.vcf.gz"

# Hard filter. These are the SNP thresholds applied to all records;
# see "Apply Hard Filters" for indel-specific values.
gatk VariantFiltration -R "$REF" -V "${SAMPLE}.vcf.gz" \
    -O "${SAMPLE}.filtered.vcf.gz" \
    --filter-expression "QD < 2.0" --filter-name "LowQD" \
    --filter-expression "FS > 60.0" --filter-name "HighFS" \
    --filter-expression "MQ < 40.0" --filter-name "LowMQ"

Key Annotations

Annotation	Description	Passing Values (hard-filter thresholds)
QD	Quality by Depth	> 2.0
FS	Fisher Strand	< 60 (SNP), < 200 (Indel)
SOR	Strand Odds Ratio	< 3 (SNP), < 10 (Indel)
MQ	Mapping Quality	> 40
MQRankSum	MQ Rank Sum Test	> -12.5
ReadPosRankSum	Read Position Rank Sum	> -8.0 (SNP), > -20.0 (Indel)

Resource Files

Resource	Use
dbSNP	Known variants (prior=2.0)
HapMap	Training/truth SNPs (prior=15.0)
Omni	Training SNPs (prior=12.0)
1000G SNPs	Training SNPs (prior=10.0)
Mills Indels	Training/truth indels (prior=12.0)

Related Skills

variant-calling - bcftools alternative
alignment-files - BAM preprocessing
filtering-best-practices - Post-calling filtering
variant-normalization - Normalize before annotation
vep-snpeff-annotation - Annotate final calls

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/variant-interpretation-acmg/bioSkills/gatk-variant-calling
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw
