Agent skills
bio-workflows-gwas-pipeline

Agent skill

bio-workflows-gwas-pipeline

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-workflows-gwas-pipeline

SKILL.md

name: bio-workflows-gwas-pipeline description: End-to-end GWAS workflow from VCF to association results. Covers PLINK QC, population structure correction, and association testing for case-control or quantitative traits. Use when running genome-wide association studies. tool_type: mixed primary_tool: PLINK2 workflow: true depends_on:

population-genetics/plink-basics
population-genetics/population-structure
population-genetics/association-testing
population-genetics/linkage-disequilibrium qc_checkpoints:
after_qc: "Sample/variant call rates >95%, HWE p>1e-6"
after_structure: "No population stratification bias"
after_association: "Lambda ~1.0, expected QQ plot" measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
read_file
run_shell_command

GWAS Pipeline

Complete workflow for genome-wide association studies from genotype data to significant associations.

Workflow Overview

VCF/PLINK files
    |
    v
[1. QC Filtering] ------> Sample and variant QC
    |
    v
[2. LD Pruning] --------> Independent variants for PCA
    |
    v
[3. Population Structure] --> PCA for covariates
    |
    v
[4. Association Testing] --> Logistic/linear regression
    |
    v
[5. Results] -----------> Manhattan plot, QQ plot
    |
    v
Significant associations

Step 1: Data Import and QC

Convert VCF to PLINK

bash

# VCF to PLINK binary format
plink2 --vcf input.vcf.gz \
    --make-bed \
    --out study

# Or with phenotype/covariate files
plink2 --vcf input.vcf.gz \
    --pheno phenotypes.txt \
    --make-bed \
    --out study

Sample QC

bash

# Calculate sample statistics
plink2 --bfile study \
    --missing \
    --out study_stats

# Remove samples with high missing rate (>5%)
plink2 --bfile study \
    --mind 0.05 \
    --make-bed \
    --out study_sample_qc

# Check for sex discrepancies (if sex chromosome data available)
plink2 --bfile study_sample_qc \
    --check-sex \
    --out study_sex_check

# Remove related individuals (optional, requires IBD)
plink2 --bfile study_sample_qc \
    --king-cutoff 0.0884 \
    --make-bed \
    --out study_unrelated

Variant QC

bash

# Apply standard variant filters
plink2 --bfile study_sample_qc \
    --geno 0.05 \
    --maf 0.01 \
    --hwe 1e-6 \
    --make-bed \
    --out study_qc

# Summary
plink2 --bfile study_qc --freq --out study_qc

QC Checkpoint:

Sample call rate >95%
Variant call rate >95%
MAF >1%
HWE p-value >1e-6 (controls only for case-control)

Step 2: LD Pruning for PCA

bash

# Identify independent variants
plink2 --bfile study_qc \
    --indep-pairwise 50 5 0.2 \
    --out pruned

# Extract pruned variants
plink2 --bfile study_qc \
    --extract pruned.prune.in \
    --make-bed \
    --out study_pruned

Step 3: Population Structure (PCA)

bash

# Calculate principal components
plink2 --bfile study_pruned \
    --pca 10 \
    --out study_pca

# The eigenvec file contains PCs for use as covariates

Visualize PCA

library(ggplot2)

# Load PCA results
pca <- read.table('study_pca.eigenvec', header = FALSE)
colnames(pca) <- c('FID', 'IID', paste0('PC', 1:10))

# Load phenotype for coloring
pheno <- read.table('phenotypes.txt', header = TRUE)
pca <- merge(pca, pheno, by = c('FID', 'IID'))

# Plot
ggplot(pca, aes(x = PC1, y = PC2, color = as.factor(PHENO))) +
    geom_point(alpha = 0.5) +
    labs(title = 'PCA of Study Samples', color = 'Phenotype') +
    theme_minimal()
ggsave('pca_plot.pdf', width = 8, height = 6)

Step 4: Association Testing

Case-Control (Binary Trait)

bash

# Logistic regression with PCA covariates
plink2 --bfile study_qc \
    --pheno phenotypes.txt \
    --covar study_pca.eigenvec \
    --covar-col-nums 3-12 \
    --glm hide-covar \
    --out gwas_results

# Results in gwas_results.PHENO.glm.logistic

Quantitative Trait

bash

# Linear regression
plink2 --bfile study_qc \
    --pheno phenotypes.txt \
    --pheno-name BMI \
    --covar study_pca.eigenvec \
    --covar-col-nums 3-12 \
    --glm hide-covar \
    --out gwas_bmi

# Results in gwas_bmi.BMI.glm.linear

With Additional Covariates

bash

# Include age, sex, and PCs
plink2 --bfile study_qc \
    --pheno phenotypes.txt \
    --covar covariates.txt \
    --covar-name AGE,SEX,PC1-PC10 \
    --glm hide-covar \
    --out gwas_adjusted

Step 5: Results Visualization

Manhattan Plot

library(qqman)

# Load results
results <- read.table('gwas_results.PHENO.glm.logistic', header = TRUE)
results <- results[!is.na(results$P),]

# Manhattan plot
png('manhattan.png', width = 1200, height = 600)
manhattan(results, chr = 'X.CHROM', bp = 'POS', snp = 'ID', p = 'P',
          suggestiveline = -log10(1e-5), genomewideline = -log10(5e-8))
dev.off()

# QQ plot
png('qq_plot.png', width = 600, height = 600)
qq(results$P)
dev.off()

Calculate Genomic Inflation

# Lambda (genomic inflation factor)
chisq <- qchisq(1 - results$P, 1)
lambda <- median(chisq) / qchisq(0.5, 1)
cat('Lambda:', round(lambda, 3), '\n')
# Lambda should be close to 1.0 (1.0-1.1 acceptable)

Extract Significant Hits

bash

# Genome-wide significant (p < 5e-8)
awk '$12 < 5e-8' gwas_results.PHENO.glm.logistic > significant_hits.txt

# Suggestive (p < 1e-5)
awk '$12 < 1e-5' gwas_results.PHENO.glm.logistic > suggestive_hits.txt

Parameter Recommendations

Step	Parameter	Value
Sample QC	--mind	0.05
Variant QC	--geno	0.05
Variant QC	--maf	0.01
Variant QC	--hwe	1e-6
LD pruning	--indep-pairwise	50 5 0.2
PCA	--pca	10
Significance	p-value	5e-8

Troubleshooting

Issue	Likely Cause	Solution
High lambda (>1.1)	Population stratification	Add more PCs, check ancestry
No significant hits	Low power	Increase sample size, meta-analysis
Deflated lambda (<1)	Over-correction	Reduce PC covariates
QQ deviation at low end	Batch effects	Check for technical artifacts

Complete Pipeline Script

bash

#!/bin/bash
set -e

INPUT_VCF="genotypes.vcf.gz"
PHENO="phenotypes.txt"
OUTDIR="gwas_results"

mkdir -p ${OUTDIR}

# Step 1: Convert and QC
plink2 --vcf ${INPUT_VCF} --make-bed --out ${OUTDIR}/raw
plink2 --bfile ${OUTDIR}/raw --mind 0.05 --geno 0.05 --maf 0.01 --hwe 1e-6 \
    --make-bed --out ${OUTDIR}/qc

# Step 2: LD pruning
plink2 --bfile ${OUTDIR}/qc --indep-pairwise 50 5 0.2 --out ${OUTDIR}/pruned
plink2 --bfile ${OUTDIR}/qc --extract ${OUTDIR}/pruned.prune.in \
    --make-bed --out ${OUTDIR}/pruned

# Step 3: PCA
plink2 --bfile ${OUTDIR}/pruned --pca 10 --out ${OUTDIR}/pca

# Step 4: Association
plink2 --bfile ${OUTDIR}/qc --pheno ${PHENO} \
    --covar ${OUTDIR}/pca.eigenvec --covar-col-nums 3-12 \
    --glm hide-covar --out ${OUTDIR}/gwas

echo "=== GWAS Complete ==="
echo "Results: ${OUTDIR}/gwas.*.glm.*"

Related Skills

population-genetics/plink-basics - PLINK file formats and commands
population-genetics/population-structure - PCA and admixture
population-genetics/association-testing - Statistical models
population-genetics/linkage-disequilibrium - LD concepts

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/bio-workflows-gwas-pipeline
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量，并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

GWAS Pipeline

Workflow Overview

Step 1: Data Import and QC

Convert VCF to PLINK

Sample QC

Variant QC

Step 2: LD Pruning for PCA

Step 3: Population Structure (PCA)

Visualize PCA

Step 4: Association Testing

Case-Control (Binary Trait)

Quantitative Trait

With Additional Covariates

Step 5: Results Visualization

Manhattan Plot

Calculate Genomic Inflation

Extract Significant Hits

Parameter Recommendations

Troubleshooting

Complete Pipeline Script

Related Skills

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations