Agent skill

bio-codon-usage

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-codon-usage

SKILL.md

name: bio-codon-usage description: Analyze codon usage, calculate CAI (Codon Adaptation Index), and examine synonymous codon bias using Biopython. Use when analyzing coding sequences for expression optimization or evolutionary analysis. tool_type: python primary_tool: Bio.SeqUtils.CodonUsage measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

read_file
run_shell_command

Codon Usage

Analyze codon usage patterns and calculate codon adaptation metrics using Biopython.

Required Imports

python

from Bio.Seq import Seq
from Bio.SeqUtils import GC123
from Bio.SeqUtils.CodonUsage import CodonAdaptationIndex
from Bio.Data import CodonTable
from collections import Counter

Basic Codon Counting

Count Codons in Sequence

python

from collections import Counter

def count_codons(seq):
    seq_str = str(seq).upper()
    codons = [seq_str[i:i+3] for i in range(0, len(seq_str) - 2, 3)]
    return Counter(codons)

seq = Seq('ATGCGATCGATCGATCGTAA')
codon_counts = count_codons(seq)

Codon Frequencies (Relative)

python

def codon_frequencies(seq):
    counts = count_codons(seq)
    total = sum(counts.values())
    return {codon: count / total for codon, count in counts.items()}

Codon Adaptation Index (CAI)

Using CodonUsage Module

python

from Bio.SeqUtils.CodonUsage import CodonAdaptationIndex

# Create CAI calculator with reference set
cai = CodonAdaptationIndex()

# Generate index from highly expressed genes
cai.generate_index('highly_expressed_genes.fasta')

# Calculate CAI for a sequence
seq = Seq('ATGCGATCGATCGATCGTAA')
cai_value = cai.cai_for_gene(str(seq))
print(f'CAI: {cai_value:.3f}')  # Range 0-1, higher = better adapted

CAI with Custom Codon Index

python

from Bio.SeqUtils.CodonUsage import CodonAdaptationIndex

cai = CodonAdaptationIndex()

# Set custom index (relative adaptiveness for each codon)
custom_index = {
    'TTT': 0.5, 'TTC': 1.0,  # Phe
    'TTA': 0.1, 'TTG': 0.5, 'CTT': 0.3, 'CTC': 1.0, 'CTA': 0.1, 'CTG': 1.0,  # Leu
    # ... define all 64 codons
}
cai.set_cai_index(custom_index)

Synonymous Codon Usage

RSCU (Relative Synonymous Codon Usage)

RSCU = (observed codon frequency) / (expected frequency if all synonymous codons were used equally)

python

from Bio.Data import CodonTable

def calculate_rscu(seq, table_id=1):
    codon_table = CodonTable.unambiguous_dna_by_id[table_id]
    counts = count_codons(seq)

    # Group codons by amino acid
    aa_to_codons = {}
    for codon in counts:
        if codon in codon_table.stop_codons:
            continue
        try:
            aa = codon_table.forward_table[codon]
            aa_to_codons.setdefault(aa, []).append(codon)
        except KeyError:
            continue

    # Calculate RSCU for each codon
    rscu = {}
    for aa, codons in aa_to_codons.items():
        total = sum(counts.get(c, 0) for c in codons)
        n_synonymous = len(codons)
        expected = total / n_synonymous if n_synonymous > 0 else 0
        for codon in codons:
            observed = counts.get(codon, 0)
            rscu[codon] = observed / expected if expected > 0 else 0
    return rscu

Identify Rare Codons

python

def find_rare_codons(seq, threshold=0.1):
    freq = codon_frequencies(seq)
    return {codon: f for codon, f in freq.items() if f < threshold}

Codon Bias by Position (GC123)

python

from Bio.SeqUtils import GC123

seq = Seq('ATGCGATCGATCGATCGATCGATCGATCGTAA')
gc_total, gc_pos1, gc_pos2, gc_pos3 = GC123(seq)

print(f'Total GC: {gc_total:.1f}%')
print(f'1st position GC: {gc_pos1:.1f}%')
print(f'2nd position GC: {gc_pos2:.1f}%')
print(f'3rd position GC: {gc_pos3:.1f}% (wobble position)')

Codon Tables

Access Codon Tables

python

from Bio.Data import CodonTable

# Get standard table
std_table = CodonTable.unambiguous_dna_by_id[1]

# List all available tables
for id, table in CodonTable.unambiguous_dna_by_id.items():
    print(f'{id}: {table.names[0]}')

Common Codon Tables

ID	Name	Organism
1	Standard	Most organisms
2	Vertebrate Mitochondrial	Human, mouse mito
4	Mold Mitochondrial	Fungi, protozoa mito
5	Invertebrate Mitochondrial	Insects, worms mito
11	Bacterial/Plastid	E. coli, chloroplasts

Codon Table Properties

python

table = CodonTable.unambiguous_dna_by_id[1]

print(f'Start codons: {table.start_codons}')
print(f'Stop codons: {table.stop_codons}')

# Forward table: codon -> amino acid
print(table.forward_table['ATG'])  # 'M'

# Back table: amino acid -> list of codons
back_table = {}
for codon, aa in table.forward_table.items():
    back_table.setdefault(aa, []).append(codon)
print(f'Leucine codons: {back_table["L"]}')

Code Patterns

Full Codon Usage Report

python

def codon_usage_report(seq, table_id=1):
    from Bio.Data import CodonTable

    table = CodonTable.unambiguous_dna_by_id[table_id]
    counts = count_codons(seq)
    total = sum(counts.values())

    # Group by amino acid
    aa_groups = {}
    for codon, aa in table.forward_table.items():
        aa_groups.setdefault(aa, []).append(codon)

    report = {}
    for aa, codons in sorted(aa_groups.items()):
        aa_total = sum(counts.get(c, 0) for c in codons)
        report[aa] = {
            'total': aa_total,
            'codons': {c: {'count': counts.get(c, 0),
                          'freq': counts.get(c, 0) / aa_total if aa_total > 0 else 0}
                      for c in codons}
        }
    return report

Compare Codon Usage Between Sequences

python

def compare_codon_usage(seq1, seq2):
    freq1 = codon_frequencies(seq1)
    freq2 = codon_frequencies(seq2)

    all_codons = set(freq1.keys()) | set(freq2.keys())
    comparison = {}
    for codon in sorted(all_codons):
        f1, f2 = freq1.get(codon, 0), freq2.get(codon, 0)
        comparison[codon] = {'seq1': f1, 'seq2': f2, 'diff': f1 - f2}
    return comparison

Optimize Codons for Expression

python

def optimize_codons(protein_seq, preferred_codons):
    '''Replace codons with preferred synonymous codons'''
    optimized = []
    for aa in str(protein_seq):
        if aa in preferred_codons:
            optimized.append(preferred_codons[aa])
        else:
            optimized.append('NNN')  # Unknown
    return Seq(''.join(optimized))

# E. coli preferred codons
ecoli_preferred = {
    'A': 'GCG', 'R': 'CGT', 'N': 'AAC', 'D': 'GAT', 'C': 'TGC',
    'Q': 'CAG', 'E': 'GAA', 'G': 'GGT', 'H': 'CAC', 'I': 'ATT',
    'L': 'CTG', 'K': 'AAA', 'M': 'ATG', 'F': 'TTC', 'P': 'CCG',
    'S': 'TCT', 'T': 'ACC', 'W': 'TGG', 'Y': 'TAC', 'V': 'GTT',
}

Codon Usage from FASTA File

python

from Bio import SeqIO

def analyze_fasta_codon_usage(filename):
    all_counts = Counter()
    for record in SeqIO.parse(filename, 'fasta'):
        all_counts.update(count_codons(record.seq))

    total = sum(all_counts.values())
    return {codon: count / total for codon, count in all_counts.items()}

Effective Number of Codons (Nc)

A measure of codon bias (lower = more biased, range 20-61):

python

import math

def effective_nc(seq, table_id=1):
    from Bio.Data import CodonTable
    table = CodonTable.unambiguous_dna_by_id[table_id]
    counts = count_codons(seq)

    # Group by degeneracy class
    aa_groups = {}
    for codon, aa in table.forward_table.items():
        aa_groups.setdefault(aa, []).append(codon)

    # Calculate F for each amino acid
    nc_sum = 0
    for aa, codons in aa_groups.items():
        n = sum(counts.get(c, 0) for c in codons)
        if n <= 1:
            continue
        pi_sq_sum = sum((counts.get(c, 0) / n) ** 2 for c in codons)
        F = (n * pi_sq_sum - 1) / (n - 1)
        nc_sum += 1 / F if F > 0 else len(codons)

    return nc_sum if nc_sum > 0 else 61

Property Reference

Metric	Range	Interpretation
CAI	0-1	Higher = better adapted to host
RSCU	0-N	1.0 = no bias, >1 = overused, <1 = underused
Nc	20-61	Lower = more biased
GC3	0-100%	GC at wobble position

Common Errors

Error	Cause	Solution
`KeyError`	Non-standard codon	Handle N-containing codons
Wrong counts	Sequence not in frame	Ensure length is multiple of 3
No index set	Called CAI without training	Call `generate_index()` first

Decision Tree

Need to analyze codon usage?
├── Count codon frequencies?
│   └── Use Counter on 3-mers
├── Calculate adaptation to host?
│   └── Use CodonAdaptationIndex (CAI)
├── Identify synonymous bias?
│   └── Calculate RSCU
├── Check wobble position bias?
│   └── Use GC123()
├── Measure overall bias?
│   └── Calculate Nc (effective number of codons)
└── Optimize for expression?
    └── Replace with preferred synonymous codons

Related Skills

transcription-translation - Translate sequences and understand codon tables
sequence-properties - GC123 for wobble position GC content
sequence-io/read-sequences - Parse CDS sequences from GenBank files
database-access/entrez-fetch - Fetch reference gene sets from NCBI for CAI training

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/bio-codon-usage
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量，并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Codon Usage

Required Imports

Basic Codon Counting

Count Codons in Sequence

Codon Frequencies (Relative)

Codon Adaptation Index (CAI)

Using CodonUsage Module

CAI with Custom Codon Index

Synonymous Codon Usage

RSCU (Relative Synonymous Codon Usage)

Identify Rare Codons

Codon Bias by Position (GC123)

Codon Tables

Access Codon Tables

Common Codon Tables

Codon Table Properties

Code Patterns

Full Codon Usage Report

Compare Codon Usage Between Sequences

Optimize Codons for Expression

Codon Usage from FASTA File

Effective Number of Codons (Nc)

Property Reference

Common Errors

Decision Tree

Related Skills

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations