Agent skills
tooluniverse-protein-therapeut...

Agent skill

tooluniverse-protein-therapeutic-design

Design novel protein therapeutics (binders, enzymes, scaffolds) using AI-guided de novo design. Uses RFdiffusion for backbone generation, ProteinMPNN for sequence design, ESMFold/AlphaFold2 for validation. Use when asked to design protein binders, therapeutic proteins, or engineer protein function.

View SKILL.md on GitHub Repository

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/tooluniverse-protein-therapeutic-design

SKILL.md

Therapeutic Protein Designer

AI-guided de novo protein design using RFdiffusion backbone generation, ProteinMPNN sequence optimization, and structure validation for therapeutic protein development.

KEY PRINCIPLES:

Structure-first design - Generate backbone geometry before sequence
Target-guided - Design binders with target structure in mind
Iterative validation - Predict structure to validate designs
Developability-aware - Consider aggregation, immunogenicity, expression
Evidence-graded - Grade designs by confidence metrics
Actionable output - Provide sequences ready for experimental testing
English-first queries - Always use English terms in tool calls (protein names, target names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language

When to Use

Apply when user asks:

"Design a protein binder for [target]"
"Create a therapeutic protein against [protein/epitope]"
"Design a protein scaffold with [property]"
"Optimize this protein sequence for [function]"
"Design a de novo enzyme for [reaction]"
"Generate protein variants for [target binding]"

Critical Workflow Requirements

1. Report-First Approach (MANDATORY)

Create the report file FIRST:
- File name: [TARGET]_protein_design_report.md
- Initialize with section headers
- Add placeholder: [Designing...]
Progressively update as designs are generated
Output separate files:
- [TARGET]_designed_sequences.fasta - All designed sequences
- [TARGET]_top_candidates.csv - Ranked candidates with metrics

2. Design Documentation (MANDATORY)

Every design MUST include:

markdown

### Design: Binder_001

**Sequence**: MVLSPADKTN...
**Length**: 85 amino acids
**Target**: PD-L1 (UniProt: Q9NZQ7)
**Method**: RFdiffusion → ProteinMPNN → ESMFold validation

**Quality Metrics**:
| Metric | Value | Interpretation |
|--------|-------|----------------|
| pLDDT | 88.5 | High confidence |
| pTM | 0.82 | Good fold |
| ProteinMPNN score | -2.3 | Favorable |
| Predicted binding | Strong | Based on interface pLDDT |

*Source: NVIDIA NIM via `NvidiaNIM_rfdiffusion`, `NvidiaNIM_proteinmpnn`, `NvidiaNIM_esmfold`*

Phase 0: Tool Verification

NVIDIA NIM Tools Required

Tool	Purpose	API Key Required
`NvidiaNIM_rfdiffusion`	Backbone generation	Yes
`NvidiaNIM_proteinmpnn`	Sequence design	Yes
`NvidiaNIM_esmfold`	Fast structure validation	Yes
`NvidiaNIM_alphafold2`	High-accuracy validation	Yes
`NvidiaNIM_esm2_650m`	Sequence embeddings	Yes

Parameter Verification

Tool	WRONG Parameter	CORRECT Parameter
`NvidiaNIM_rfdiffusion`	`num_steps`	`diffusion_steps`
`NvidiaNIM_proteinmpnn`	`pdb`	`pdb_string`
`NvidiaNIM_esmfold`	`seq`	`sequence`

Workflow Overview

Phase 1: Target Characterization
├── Get target structure (PDB, EMDB cryo-EM, or AlphaFold)
├── Identify binding epitope
├── Analyze existing binders
├── Check EMDB for membrane protein structures (NEW)
└── OUTPUT: Target profile
    ↓
Phase 2: Backbone Generation (RFdiffusion)
├── Define design constraints
├── Generate multiple backbones
├── Filter by geometry quality
└── OUTPUT: Candidate backbones
    ↓
Phase 3: Sequence Design (ProteinMPNN)
├── Design sequences for each backbone
├── Sample multiple sequences per backbone
├── Score by ProteinMPNN likelihood
└── OUTPUT: Designed sequences
    ↓
Phase 4: Structure Validation
├── Predict structure (ESMFold/AlphaFold2)
├── Compare to designed backbone
├── Assess fold quality (pLDDT, pTM)
└── OUTPUT: Validated designs
    ↓
Phase 5: Developability Assessment
├── Aggregation propensity
├── Expression likelihood
├── Immunogenicity prediction
└── OUTPUT: Developability scores
    ↓
Phase 6: Report Synthesis
├── Ranked candidate list
├── Experimental recommendations
├── Next steps
└── OUTPUT: Final report

Phase 1: Target Characterization

1.1 Get Target Structure

python

def get_target_structure(tu, target_id):
    """Get target structure from PDB, EMDB, or predict."""
    
    # Try PDB first (X-ray/NMR)
    pdb_results = tu.tools.PDB_search_by_uniprot(uniprot_id=target_id)
    
    if pdb_results:
        # Get highest resolution structure
        best_pdb = sorted(pdb_results, key=lambda x: x['resolution'])[0]
        structure = tu.tools.PDB_get_structure(pdb_id=best_pdb['pdb_id'])
        return {'source': 'PDB', 'pdb_id': best_pdb['pdb_id'], 
                'resolution': best_pdb['resolution'], 'structure': structure}
    
    # Try EMDB for cryo-EM structures (valuable for membrane proteins)
    protein_info = tu.tools.UniProt_get_protein_by_accession(accession=target_id)
    emdb_results = tu.tools.emdb_search(
        query=protein_info['proteinDescription']['recommendedName']['fullName']['value']
    )
    
    if emdb_results and len(emdb_results) > 0:
        # Get highest resolution cryo-EM entry
        best_emdb = sorted(emdb_results, key=lambda x: x.get('resolution', 99))[0]
        # Get associated PDB model if available
        emdb_details = tu.tools.emdb_get_entry(entry_id=best_emdb['emdb_id'])
        if emdb_details.get('pdb_ids'):
            structure = tu.tools.PDB_get_structure(pdb_id=emdb_details['pdb_ids'][0])
            return {'source': 'EMDB cryo-EM', 'emdb_id': best_emdb['emdb_id'],
                    'pdb_id': emdb_details['pdb_ids'][0], 
                    'resolution': best_emdb.get('resolution'), 'structure': structure}
    
    # Fallback to AlphaFold prediction
    sequence = tu.tools.UniProt_get_protein_sequence(accession=target_id)
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=sequence['sequence'],
        algorithm="mmseqs2"
    )
    return {'source': 'AlphaFold2 (predicted)', 'structure': structure}

1.1b EMDB for Membrane Proteins (NEW)

When to prioritize EMDB: Membrane proteins, large complexes, and targets where conformational states matter.

python

def get_cryoem_structures(tu, target_name):
    """Get cryo-EM structures for membrane proteins/complexes."""
    
    # Search EMDB
    emdb_results = tu.tools.emdb_search(
        query=f"{target_name} membrane OR receptor"
    )
    
    structures = []
    for entry in emdb_results[:5]:
        details = tu.tools.emdb_get_entry(entry_id=entry['emdb_id'])
        structures.append({
            'emdb_id': entry['emdb_id'],
            'resolution': entry.get('resolution', 'N/A'),
            'title': entry.get('title', 'N/A'),
            'conformational_state': details.get('state', 'Unknown'),
            'pdb_models': details.get('pdb_ids', [])
        })
    
    return structures

Output for Report:

markdown

### 1.1b Cryo-EM Structures (EMDB)

| EMDB ID | Resolution | PDB Model | Conformation |
|---------|------------|-----------|--------------|
| EMD-12345 | 2.8 Å | 7ABC | Active state |
| EMD-23456 | 3.1 Å | 8DEF | Inactive state |

**Note**: Cryo-EM structures capture physiologically relevant conformations for membrane protein targets.

*Source: EMDB*

1.2 Identify Binding Epitope

python

def identify_epitope(tu, target_structure, epitope_residues=None):
    """Identify or validate binding epitope."""
    
    if epitope_residues:
        # User-specified epitope
        return {'residues': epitope_residues, 'source': 'user-defined'}
    
    # Find surface-exposed regions
    # Use structural analysis to identify potential epitopes
    return analyze_surface(target_structure)

1.3 Output for Report

markdown

## 1. Target Characterization

### 1.1 Target Information

| Property | Value |
|----------|-------|
| **Target** | PD-L1 (Programmed death-ligand 1) |
| **UniProt** | Q9NZQ7 |
| **Structure source** | PDB: 4ZQK (2.0 Å resolution) |
| **Binding epitope** | IgV domain, residues 19-127 |
| **Known binders** | Atezolizumab, durvalumab, avelumab |

### 1.2 Epitope Analysis

| Residue Range | Type | Surface Area | Druggability |
|---------------|------|--------------|--------------|
| 54-68 | Loop | 850 Å² | High |
| 115-125 | Beta strand | 420 Å² | Medium |
| 19-30 | N-terminus | 380 Å² | Medium |

**Selected Epitope**: Residues 54-68 (PD-1 binding interface)

*Source: PDB 4ZQK, surface analysis*

Phase 2: Backbone Generation

2.1 RFdiffusion Design

python

def generate_backbones(tu, design_params):
    """Generate de novo backbones using RFdiffusion."""
    
    backbones = tu.tools.NvidiaNIM_rfdiffusion(
        diffusion_steps=design_params.get('steps', 50),
        # Additional parameters depending on design type
    )
    
    return backbones

2.2 Design Modes

Mode	Use Case	Key Parameters
Unconditional	De novo scaffold	`diffusion_steps` only
Binder design	Target-guided binder	`target_structure`, `hotspot_residues`
Motif scaffolding	Functional motif embedding	`motif_sequence`, `motif_structure`

2.3 Output for Report

markdown

## 2. Backbone Generation

### 2.1 Design Parameters

| Parameter | Value |
|-----------|-------|
| **Method** | RFdiffusion via NVIDIA NIM |
| **Design mode** | Unconditional scaffold generation |
| **Diffusion steps** | 50 |
| **Number generated** | 10 backbones |

### 2.2 Generated Backbones

| Backbone | Length | Topology | Quality |
|----------|--------|----------|---------|
| BB_001 | 85 aa | 3-helix bundle | Good |
| BB_002 | 92 aa | Beta sandwich | Good |
| BB_003 | 78 aa | Alpha-beta | Good |
| BB_004 | 88 aa | All-alpha | Moderate |
| BB_005 | 95 aa | Mixed | Good |

**Selected for sequence design**: BB_001, BB_002, BB_003, BB_005 (top 4)

*Source: NVIDIA NIM via `NvidiaNIM_rfdiffusion`*

Phase 3: Sequence Design

3.1 ProteinMPNN Design

python

def design_sequences(tu, backbone_pdb, num_sequences=8):
    """Design sequences for backbone using ProteinMPNN."""
    
    sequences = tu.tools.NvidiaNIM_proteinmpnn(
        pdb_string=backbone_pdb,
        num_sequences=num_sequences,
        temperature=0.1  # Lower = more conservative
    )
    
    return sequences

3.2 Sampling Parameters

Parameter	Conservative	Moderate	Diverse
Temperature	0.1	0.2	0.5
Sequences per backbone	4	8	16
Use case	Validated scaffold	Exploration	Diversity

3.3 Output for Report

markdown

## 3. Sequence Design

### 3.1 Design Parameters

| Parameter | Value |
|-----------|-------|
| **Method** | ProteinMPNN via NVIDIA NIM |
| **Temperature** | 0.1 (conservative) |
| **Sequences per backbone** | 8 |
| **Total sequences** | 32 |

### 3.2 Designed Sequences (Top 10 by Score)

| Rank | Backbone | Sequence ID | Length | MPNN Score | Predicted pI |
|------|----------|-------------|--------|------------|--------------|
| 1 | BB_001 | Seq_001_A | 85 | -1.89 | 6.2 |
| 2 | BB_002 | Seq_002_C | 92 | -1.95 | 5.8 |
| 3 | BB_001 | Seq_001_B | 85 | -2.01 | 7.1 |
| 4 | BB_003 | Seq_003_A | 78 | -2.08 | 6.5 |
| 5 | BB_005 | Seq_005_B | 95 | -2.12 | 5.4 |

### 3.3 Top Sequence: Seq_001_A

Seq_001_A (85 aa, MPNN score: -1.89) MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL


*Source: NVIDIA NIM via `NvidiaNIM_proteinmpnn`*

Phase 4: Structure Validation

4.1 ESMFold Validation

python

def validate_structure(tu, sequence):
    """Validate designed sequence by structure prediction."""
    
    # Fast validation with ESMFold
    predicted = tu.tools.NvidiaNIM_esmfold(sequence=sequence)
    
    # Extract quality metrics
    plddt = extract_plddt(predicted)
    ptm = extract_ptm(predicted)
    
    return {
        'structure': predicted,
        'mean_plddt': np.mean(plddt),
        'ptm': ptm,
        'passes': np.mean(plddt) > 70 and ptm > 0.7
    }

4.2 Validation Criteria

Metric	Threshold	Interpretation
Mean pLDDT	>70	Confident fold
pTM	>0.7	Good global topology
RMSD to backbone	<2 Å	Design recapitulated

4.3 Output for Report

markdown

## 4. Structure Validation

### 4.1 Validation Results

| Sequence | pLDDT | pTM | RMSD to Design | Status |
|----------|-------|-----|----------------|--------|
| Seq_001_A | 88.5 | 0.85 | 1.2 Å | ✓ PASS |
| Seq_002_C | 82.3 | 0.79 | 1.5 Å | ✓ PASS |
| Seq_001_B | 85.1 | 0.82 | 1.3 Å | ✓ PASS |
| Seq_003_A | 79.8 | 0.76 | 1.8 Å | ✓ PASS |
| Seq_005_B | 68.2 | 0.65 | 2.8 Å | ✗ FAIL |

### 4.2 Top Validated Design: Seq_001_A

| Region | Residues | pLDDT | Interpretation |
|--------|----------|-------|----------------|
| Helix 1 | 1-28 | 92.3 | Very high confidence |
| Loop 1 | 29-35 | 78.4 | Moderate confidence |
| Helix 2 | 36-58 | 91.8 | Very high confidence |
| Loop 2 | 59-65 | 75.2 | Moderate confidence |
| Helix 3 | 66-85 | 90.1 | Very high confidence |

**Overall**: Well-folded 3-helix bundle with high confidence core

*Source: NVIDIA NIM via `NvidiaNIM_esmfold`*

Phase 5: Developability Assessment

5.1 Aggregation Propensity

python

def assess_aggregation(sequence):
    """Assess aggregation propensity."""
    
    # Calculate hydrophobic patches
    # Calculate isoelectric point
    # Identify aggregation-prone motifs
    
    return {
        'aggregation_score': score,
        'hydrophobic_patches': patches,
        'risk_level': 'Low' if score < 0.5 else 'Medium' if score < 0.7 else 'High'
    }

5.2 Developability Metrics

Metric	Favorable	Marginal	Unfavorable
Aggregation score	<0.5	0.5-0.7	>0.7
Isoelectric point	5-9	4-5 or 9-10	<4 or >10
Hydrophobic patches	<3	3-5	>5
Cysteine count	0 or even	Odd	Multiple unpaired

5.3 Output for Report

markdown

## 5. Developability Assessment

### 5.1 Developability Scores

| Design | Aggregation | pI | Cysteines | Expression | Overall |
|--------|-------------|-----|-----------|------------|---------|
| Seq_001_A | 0.32 (Low) | 6.2 | 0 | High | ★★★ |
| Seq_002_C | 0.45 (Low) | 5.8 | 2 (paired) | Medium | ★★☆ |
| Seq_001_B | 0.38 (Low) | 7.1 | 0 | High | ★★★ |
| Seq_003_A | 0.58 (Med) | 6.5 | 0 | Medium | ★★☆ |

### 5.2 Recommendations

**Best candidate for expression**: Seq_001_A
- Low aggregation propensity
- Neutral pI (easy purification)
- No cysteines (no misfolding risk)
- Predicted high E. coli expression

*Source: Sequence analysis*

Report Template

markdown

# Therapeutic Protein Design Report: [TARGET]

**Generated**: [Date] | **Query**: [Original query] | **Status**: In Progress

---

## Executive Summary
[Designing...]

---

## 1. Target Characterization
### 1.1 Target Information
[Designing...]
### 1.2 Binding Epitope
[Designing...]

---

## 2. Backbone Generation
### 2.1 Design Parameters
[Designing...]
### 2.2 Generated Backbones
[Designing...]

---

## 3. Sequence Design
### 3.1 ProteinMPNN Results
[Designing...]
### 3.2 Top Sequences
[Designing...]

---

## 4. Structure Validation
### 4.1 ESMFold Validation
[Designing...]
### 4.2 Quality Metrics
[Designing...]

---

## 5. Developability Assessment
### 5.1 Scores
[Designing...]
### 5.2 Recommendations
[Designing...]

---

## 6. Final Candidates
### 6.1 Ranked List
[Designing...]
### 6.2 Sequences for Testing
[Designing...]

---

## 7. Experimental Recommendations
[Designing...]

---

## 8. Data Sources
[Will be populated...]

Evidence Grading

Tier	Symbol	Criteria
T1	★★★	pLDDT >85, pTM >0.8, low aggregation, neutral pI
T2	★★☆	pLDDT >75, pTM >0.7, acceptable developability
T3	★☆☆	pLDDT >70, pTM >0.65, developability concerns
T4	☆☆☆	Failed validation or major developability issues

Completeness Checklist

Phase 1: Target

Target structure obtained (PDB or predicted)
Binding epitope identified
Existing binders noted

Phase 2: Backbones

≥5 backbones generated
Top 3-5 selected for sequence design
Selection criteria documented

Phase 3: Sequences

≥8 sequences per backbone designed
MPNN scores reported
Top 10 sequences listed

Phase 4: Validation

All sequences validated by ESMFold
pLDDT and pTM reported
Pass/fail criteria applied
≥3 passing designs

Phase 5: Developability

Aggregation assessed
pI calculated
Expression prediction
Final ranking

Phase 6: Deliverables

Ranked candidate list
FASTA file with sequences
Experimental recommendations

Fallback Chains

Primary Tool	Fallback 1	Fallback 2
`NvidiaNIM_rfdiffusion`	Manual backbone design	Scaffold from PDB
`NvidiaNIM_proteinmpnn`	Rosetta ProteinMPNN	Manual sequence design
`NvidiaNIM_esmfold`	`NvidiaNIM_alphafold2`	AlphaFold DB
PDB structure	`NvidiaNIM_alphafold2`	AlphaFold DB

Tool Reference

See TOOLS_REFERENCE.md for complete tool documentation.

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/tooluniverse-protein-therapeutic-design
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量，并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Therapeutic Protein Designer

When to Use

Critical Workflow Requirements

1. Report-First Approach (MANDATORY)

2. Design Documentation (MANDATORY)

Phase 0: Tool Verification

NVIDIA NIM Tools Required

Parameter Verification

Workflow Overview

Phase 1: Target Characterization

1.1 Get Target Structure

1.1b EMDB for Membrane Proteins (NEW)

1.2 Identify Binding Epitope

1.3 Output for Report

Phase 2: Backbone Generation

2.1 RFdiffusion Design

2.2 Design Modes

2.3 Output for Report

Phase 3: Sequence Design

3.1 ProteinMPNN Design

3.2 Sampling Parameters

3.3 Output for Report

Phase 4: Structure Validation

4.1 ESMFold Validation

4.2 Validation Criteria

4.3 Output for Report

Phase 5: Developability Assessment

5.1 Aggregation Propensity

5.2 Developability Metrics

5.3 Output for Report

Report Template

Evidence Grading

Completeness Checklist

Phase 1: Target

Phase 2: Backbones

Phase 3: Sequences

Phase 4: Validation

Phase 5: Developability

Phase 6: Deliverables

Fallback Chains

Tool Reference

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations