Agent skills
bio-population-genetics-plink-...

Agent skill

bio-population-genetics-plink-basics

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-population-genetics-plink-basics

SKILL.md

name: bio-population-genetics-plink-basics description: PLINK file formats, format conversion, and quality control filtering for population genetics. Convert between VCF, BED/BIM/FAM, and PED/MAP formats, apply MAF, genotyping rate, and HWE filters using PLINK 1.9 and 2.0. Use when working with PLINK format files or running QC. tool_type: cli primary_tool: plink measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

read_file
run_shell_command

PLINK Basics

File formats, conversion, and quality control filtering with PLINK 1.9 and 2.0.

File Formats

Binary Format (Recommended)

File	Contents
`.bed`	Binary genotype data
`.bim`	Variant information (chr, ID, cM, pos, A1, A2)
`.fam`	Sample information (FID, IID, father, mother, sex, pheno)

PLINK 2.0 Format

File	Contents
`.pgen`	Binary genotype data (compressed)
`.pvar`	Variant information
`.psam`	Sample information

Text Format (Legacy)

File	Contents
`.ped`	Genotypes (FID, IID, father, mother, sex, pheno, genotypes)
`.map`	Variant positions (chr, ID, cM, pos)

Format Conversion

VCF to PLINK Binary

bash

# PLINK 1.9
plink --vcf input.vcf.gz --make-bed --out output

# PLINK 2.0
plink2 --vcf input.vcf.gz --make-bed --out output

# With sample ID handling
plink2 --vcf input.vcf.gz --double-id --make-bed --out output

PLINK Binary to VCF

bash

# PLINK 1.9
plink --bfile input --recode vcf --out output

# PLINK 2.0
plink2 --bfile input --export vcf --out output

# Compressed VCF
plink2 --bfile input --export vcf bgz --out output

PED/MAP to Binary (PLINK 1.9 Only)

bash

# PLINK 1.9 (PLINK 2.0 doesn't support .ped/.map directly)
plink --file input --make-bed --out output

Binary to PED/MAP

bash

# PLINK 1.9
plink --bfile input --recode --out output

# PLINK 2.0
plink2 --bfile input --export ped --out output

PLINK 1.9 to 2.0 Format

bash

# Convert to PGEN format
plink2 --bfile input --make-pgen --out output

# Convert back to BED
plink2 --pfile input --make-bed --out output

Quality Control Filtering

MAF Filter (Minor Allele Frequency)

bash

# Remove variants with MAF < 0.01
plink --bfile input --maf 0.01 --make-bed --out output

# PLINK 2.0
plink2 --bfile input --maf 0.01 --make-bed --out output

# Remove rare variants (MAF < 0.05)
plink2 --bfile input --maf 0.05 --make-bed --out output

Genotyping Rate Filters

bash

# Per-variant missing rate (remove if >5% missing)
plink2 --bfile input --geno 0.05 --make-bed --out output

# Per-sample missing rate (remove if >5% missing)
plink2 --bfile input --mind 0.05 --make-bed --out output

Hardy-Weinberg Equilibrium Filter

bash

# Remove variants with HWE p-value < 1e-6
plink2 --bfile input --hwe 1e-6 --make-bed --out output

# Different threshold for cases vs controls
plink2 --bfile input --hwe 1e-6 --hwe-all --make-bed --out output

Combined QC Pipeline

bash

# Standard QC filtering
plink2 --bfile input \
    --maf 0.01 \
    --geno 0.05 \
    --mind 0.05 \
    --hwe 1e-6 \
    --make-bed --out qc_filtered

Sample and Variant Selection

Keep/Remove Samples

bash

# Keep specific samples (samples.txt: FID IID per line)
plink2 --bfile input --keep samples.txt --make-bed --out output

# Remove specific samples
plink2 --bfile input --remove samples.txt --make-bed --out output

# Keep single sample
plink2 --bfile input --keep-fam sample_id --make-bed --out output

Extract/Exclude Variants

bash

# Extract specific variants (variants.txt: variant IDs)
plink2 --bfile input --extract variants.txt --make-bed --out output

# Exclude specific variants
plink2 --bfile input --exclude variants.txt --make-bed --out output

# Extract by range
plink2 --bfile input --extract range chr1:1000000-2000000 --make-bed --out output

Chromosome Selection

bash

# Single chromosome
plink2 --bfile input --chr 22 --make-bed --out chr22

# Multiple chromosomes
plink2 --bfile input --chr 1-22 --make-bed --out autosomes

# Exclude chromosome
plink2 --bfile input --not-chr 23,24,25,26 --make-bed --out autosomes

Allele Frequency

bash

# PLINK 1.9 (MAF-based)
plink --bfile input --freq --out output

# PLINK 2.0 (ALT allele frequency - not MAF!)
plink2 --bfile input --freq --out output

# PLINK 2.0 with MAF
plink2 --bfile input --freq cols=+mac,+mafreq --out output

Missing Data Statistics

bash

# Per-sample and per-variant missing rates
plink2 --bfile input --missing --out output

# Output files:
# output.smiss - sample missing rates
# output.vmiss - variant missing rates

Sex Check

Verify reported sex matches X chromosome heterozygosity.

bash

# PLINK 1.9
plink --bfile input --check-sex --out sex_check

# PLINK 2.0
plink2 --bfile input --split-par hg38 --check-sex --out sex_check

Interpret Results

python

import pandas as pd

sex = pd.read_csv('sex_check.sexcheck', sep='\s+')

problems = sex[sex['STATUS'] == 'PROBLEM']
print(f'Sex mismatches: {len(problems)}')

# F statistic: <0.2 = female, >0.8 = male, between = ambiguous
# PEDSEX: reported sex (1=male, 2=female, 0=unknown)
# SNPSEX: inferred sex (1=male, 2=female, 0=undetermined)

Update or Remove

bash

# Update sex from check results
plink2 --bfile input --update-sex sex_check.sexcheck col-num=4 --make-bed --out updated

# Remove sex mismatches
awk '$5 == "PROBLEM" {print $1, $2}' sex_check.sexcheck > sex_problems.txt
plink2 --bfile input --remove sex_problems.txt --make-bed --out output

Sample Information

Update Phenotypes

bash

# phenotypes.txt: FID IID pheno (1=control, 2=case, -9=missing)
plink2 --bfile input --pheno phenotypes.txt --make-bed --out output

# Quantitative phenotype
plink2 --bfile input --pheno phenotypes.txt --make-bed --out output

Update Sex

bash

# sex.txt: FID IID sex (1=male, 2=female, 0=unknown)
plink2 --bfile input --update-sex sex.txt --make-bed --out output

Update Sample IDs

bash

# ids.txt: old_FID old_IID new_FID new_IID
plink2 --bfile input --update-ids ids.txt --make-bed --out output

Merging Datasets

bash

# Merge two datasets (PLINK 1.9)
plink --bfile data1 --bmerge data2.bed data2.bim data2.fam --make-bed --out merged

# Merge list of datasets
plink --bfile data1 --merge-list merge_list.txt --make-bed --out merged
# merge_list.txt contains: data2.bed data2.bim data2.fam (one set per line)

# Handle strand flips
plink --bfile data1 --bmerge data2 --make-bed --out merged
# If error: plink --bfile data2 --flip missnps.txt --make-bed --out data2_flipped

Variant Information

Set Variant IDs

bash

# Set ID based on position
plink2 --bfile input --set-all-var-ids @:#:\$r:\$a --make-bed --out output
# Format: chr:pos:ref:alt

Update Variant Names

bash

# update.txt: old_id new_id
plink2 --bfile input --update-name update.txt --make-bed --out output

PLINK 2.0 vs 1.9 Summary

Feature	PLINK 2.0	PLINK 1.9
Status	Current	Legacy
Command	`plink2`	`plink`
Format	`.pgen/.pvar/.psam`	`.bed/.bim/.fam`
Speed	Faster	Baseline
Memory	More efficient	Higher for large data
Export VCF	`--export vcf`	`--recode vcf`
Frequency output	ALT frequency	MAF
Missing output	`.smiss/.vmiss`	`.imiss/.lmiss`
PED/MAP support	No (convert via 1.9)	Yes (`--file`)

Related Skills

association-testing - GWAS with filtered data
population-structure - PCA after QC
variant-calling/vcf-basics - VCF format before conversion

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/bio-population-genetics-plink-basics
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量，并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

PLINK Basics

File Formats

Binary Format (Recommended)

PLINK 2.0 Format

Text Format (Legacy)

Format Conversion

VCF to PLINK Binary

PLINK Binary to VCF

PED/MAP to Binary (PLINK 1.9 Only)

Binary to PED/MAP

PLINK 1.9 to 2.0 Format

Quality Control Filtering

MAF Filter (Minor Allele Frequency)

Genotyping Rate Filters

Hardy-Weinberg Equilibrium Filter

Combined QC Pipeline

Sample and Variant Selection

Keep/Remove Samples

Extract/Exclude Variants

Chromosome Selection

Allele Frequency

Missing Data Statistics

Sex Check

Interpret Results

Update or Remove

Sample Information

Update Phenotypes

Update Sex

Update Sample IDs

Merging Datasets

Variant Information

Set Variant IDs

Update Variant Names

PLINK 2.0 vs 1.9 Summary

Related Skills

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations