bio-workflows-crispr-editing-pipeline

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-workflows-crispr-editing-pipeline

SKILL.md


name: bio-workflows-crispr-editing-pipeline
description: End-to-end CRISPR experiment design from target selection to delivery-ready constructs. Covers guide RNA design, off-target assessment, and specialized editing strategies including knockouts, base editing, and HDR knockins. Use when designing complete CRISPR editing experiments for gene knockout, correction, or tagging.
tool_type: mixed
primary_tool: crisprscan
workflow: true
depends_on:
  • genome-engineering/grna-design
  • genome-engineering/off-target-prediction
  • genome-engineering/base-editing-design
  • genome-engineering/prime-editing-design
  • genome-engineering/hdr-template-design
qc_checkpoints:
  • after_grna_design: "Activity score >0.6, no poly-T runs, GC 40-70%"
  • after_offtarget: "Specificity score >0.7, no coding off-targets with <3 mismatches"
  • after_template: "Homology arms verified, PAM disrupted in donor"
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
  • read_file
  • run_shell_command

CRISPR Editing Pipeline

Complete workflow for CRISPR experiment design: from target gene to delivery-ready constructs with branching paths for different editing strategies.

Workflow Overview

Target Gene/Position
        |
        v
[1. Guide RNA Design] --> CRISPRscan / Rule Set 2 / DeepCRISPR
        |
        v
[2. Off-Target Assessment] --> Cas-OFFinder + CFD scoring
        |
        v
    Decision Point: What type of edit?
        |
    +---+-------------------+--------------------+
    |                       |                    |
    v                       v                    v
[3a. Knockout]        [3b. Base Editing]   [3c. Knockin]
 Standard Cas9         CBE/ABE design       HDR template
 Frameshift            C>T or A>G           with homology arms
        |                   |                    |
        v                   v                    v
    Final Constructs with Validation Primers

Prerequisites

bash
pip install crisprscan biopython pandas numpy matplotlib

conda install -c bioconda primer3-py cas-offinder

# Python packages for scoring
pip install crisprtools  # if available
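
Step 2 shells out to the `cas-offinder` binary, so it can save a failed run to confirm the tool is on `PATH` before starting. The following is a minimal sketch using only the standard library; the helper name `missing_tools` is hypothetical and not part of any package listed above.

```python
import shutil

def missing_tools(tools):
    """Return the command-line tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == '__main__':
    missing = missing_tools(['cas-offinder'])
    if missing:
        print(f"Missing tools: {', '.join(missing)}")
    else:
        print('All external tools found')
```

An empty list means every tool was found; anything else should be installed before running the off-target step.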

Primary Path: Gene Knockout

Step 1: Guide RNA Design

python
from Bio import SeqIO
from Bio.Seq import Seq
import pandas as pd
import re

def find_guides(sequence, pam='NGG'):
    '''Find all potential gRNA target sites with NGG PAM.'''
    guides = []
    seq_str = str(sequence).upper()

    # Forward strand: 20bp + NGG
    for match in re.finditer(r'(?=([ATCG]{20}[ATCG]GG))', seq_str):
        pos = match.start()
        target = match.group(1)[:20]
        pam_seq = match.group(1)[20:23]
        guides.append({
            'sequence': target,
            'pam': pam_seq,
            'position': pos,
            'strand': '+',
            'full_target': match.group(1)
        })

    # Reverse strand: CCN + 20bp
    for match in re.finditer(r'(?=(CC[ATCG][ATCG]{20}))', seq_str):
        pos = match.start()
        full = match.group(1)
        target = str(Seq(full[3:23]).reverse_complement())
        pam_seq = str(Seq(full[0:3]).reverse_complement())
        guides.append({
            'sequence': target,
            'pam': pam_seq,
            'position': pos,
            'strand': '-',
            'full_target': full
        })

    return pd.DataFrame(guides)


def score_guide(guide_seq):
    '''Score a guide with simple sequence heuristics (a rough stand-in, not the Rule Set 2 model).'''
    score = 0.5  # Base score

    # GC content (optimal: 40-70%)
    gc = (guide_seq.count('G') + guide_seq.count('C')) / len(guide_seq)
    if 0.4 <= gc <= 0.7:
        score += 0.2
    elif gc < 0.3 or gc > 0.8:
        score -= 0.2

    # No poly-T run (4 or more consecutive T's can act as a Pol III terminator)
    if 'TTTT' in guide_seq:
        score -= 0.3

    # G at position 20 (adjacent to PAM) preferred
    if guide_seq[-1] == 'G':
        score += 0.1

    # Avoid GG at positions 19-20
    if guide_seq[-2:] == 'GG':
        score -= 0.1

    # Seed region (positions 12-20) GC
    seed = guide_seq[11:20]
    seed_gc = (seed.count('G') + seed.count('C')) / len(seed)
    if 0.4 <= seed_gc <= 0.7:
        score += 0.1

    return min(1.0, max(0.0, score))


# Example: Design guides for BRCA1 exon
gene_seq = '''ATGGATTTATCTGCTCTTCGCGTTGAAGAAGTACAAAATGTCATTAATGCTATGCAGAAAATCTTAGAGT
GTCCCATCTGTCTGGAGTTGATCAAGGAACCTGTCTCCACAAAGTGTGACCACATATTTTGCAAATTTTG'''

guides = find_guides(gene_seq.replace('\n', ''))
guides['activity_score'] = guides['sequence'].apply(score_guide)

# Filter high-scoring guides
# Activity score >0.6 is this pipeline's QC threshold (after_grna_design checkpoint)
good_guides = guides[guides['activity_score'] > 0.6].sort_values('activity_score', ascending=False)
print(f'Found {len(good_guides)} high-scoring guides')
print(good_guides[['sequence', 'position', 'strand', 'activity_score']].head(10))
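
The `after_grna_design` checkpoint in the skill metadata (activity score >0.6, no poly-T runs, GC 40-70%) can also be applied as an explicit gate that reports which check failed, rather than folding everything into one score. A minimal sketch, assuming the thresholds from the checkpoint; the function names and the failure labels are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G/C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count('G') + seq.count('C')) / len(seq) if seq else 0.0

def grna_qc_failures(guide_seq, activity_score,
                     min_activity=0.6, gc_range=(0.40, 0.70)):
    """Return the names of failed checks; an empty list means the guide passes."""
    failures = []
    if not activity_score > min_activity:
        failures.append('activity')
    if 'TTTT' in guide_seq.upper():
        failures.append('poly_t')
    if not gc_range[0] <= gc_fraction(guide_seq) <= gc_range[1]:
        failures.append('gc')
    return failures

print(grna_qc_failures('GACCTGAGTCCATGACGTCA', 0.8))
print(grna_qc_failures('ATTTTATATATTATAATTTA', 0.3))
```

Returning the failed check names makes it easier to tell whether a region lacks guides because of GC content or because of terminator-like T runs.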

Step 2: Off-Target Assessment

python
import subprocess
from pathlib import Path

def run_cas_offinder(guides_df, genome_fasta, output_dir, max_mismatches=4):
    '''Run Cas-OFFinder for off-target detection.'''
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Write input file
    input_file = output_dir / 'cas_offinder_input.txt'
    with open(input_file, 'w') as f:
        f.write(f'{genome_fasta}\n')
        f.write('NNNNNNNNNNNNNNNNNNNNNGG\n')  # 20bp + NGG pattern
        for _, row in guides_df.iterrows():
            f.write(f"{row['sequence']}NNN {max_mismatches}\n")

    # Run Cas-OFFinder
    output_file = output_dir / 'offtargets.txt'
    subprocess.run([
        'cas-offinder', str(input_file), 'C', str(output_file)  # C for CPU
    ], check=True)

    # Parse results
    offtargets = pd.read_csv(output_file, sep='\t', header=None,
                              names=['pattern', 'chromosome', 'position', 'target',
                                    'strand', 'mismatches'])
    return offtargets


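def calculate_specificity_score(guide_seq, offtargets_df):
    '''Calculate a simplified specificity score from mismatch counts (a rough stand-in for CFD).'''
    # An empty table (e.g. the placeholder below) has no 'pattern' column
    if offtargets_df.empty:
        return 1.0

    # Select this guide's hits by exact match: run_cas_offinder writes each query as guide + 'NNN'
    guide_offtargets = offtargets_df[offtargets_df['pattern'].str.upper() == f'{guide_seq}NNN']

    if len(guide_offtargets) == 0:
        return 1.0

    # Weight by mismatch count (more mismatches = lower penalty)
    # Note: a genome-wide search also returns the intended on-target site as a
    # 0-mismatch hit; remove that row first or it is penalized as an off-target
    penalty = 0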
    for _, ot in guide_offtargets.iterrows():
        mm = ot['mismatches']
        if mm == 0:  # Perfect match elsewhere (bad!)
            penalty += 1.0
        elif mm == 1:
            penalty += 0.5
        elif mm == 2:
            penalty += 0.2
        elif mm == 3:
            penalty += 0.1
        else:
            penalty += 0.05

    # Specificity score: higher is better
    # Score >0.7 is generally acceptable
    return max(0, 1 - penalty / 10)


# Filter by off-target profile
good_guides['specificity_score'] = good_guides['sequence'].apply(
    lambda x: calculate_specificity_score(x, pd.DataFrame())  # placeholder
)

# Combined score
good_guides['combined_score'] = (good_guides['activity_score'] * 0.5 +
                                  good_guides['specificity_score'] * 0.5)
final_guides = good_guides.sort_values('combined_score', ascending=False).head(5)
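
The specificity function above weights hits by mismatch count only. Mismatch position also matters: mismatches in the PAM-proximal seed (positions 12-20, as used in `score_guide`) are tolerated less than PAM-distal ones, so an off-target site whose mismatches are all PAM-distal is more likely to be cut. The sketch below illustrates that idea; the function names and the two multiplicative factors are illustrative assumptions, not values from the published CFD matrix.

```python
def mismatch_positions(guide_seq, site_seq):
    """1-based protospacer positions (1 = PAM-distal) where guide and site differ."""
    guide = guide_seq.upper()[:20]
    # Upper-casing handles output that marks mismatched bases in lowercase
    site = site_seq.upper()[:20]
    return [i + 1 for i, (g, s) in enumerate(zip(guide, site)) if g != s]

def cleavage_likelihood(guide_seq, site_seq, seed_start=12,
                        seed_factor=0.2, distal_factor=0.7):
    """Heuristic 0-1 estimate that an off-target site is still cut.

    Each mismatch multiplies the estimate by a factor: seed (PAM-proximal)
    mismatches reduce it strongly, PAM-distal mismatches only mildly.
    The factors are placeholders chosen for illustration.
    """
    likelihood = 1.0
    for pos in mismatch_positions(guide_seq, site_seq):
        likelihood *= seed_factor if pos >= seed_start else distal_factor
    return likelihood

guide = 'ACGTACGTACGTACGTACGT'
print(cleavage_likelihood(guide, 'tCGTACGTACGTACGTACGTAGG'))  # one PAM-distal mismatch
print(cleavage_likelihood(guide, 'ACGTACGTACGTACGTACGaAGG'))  # one seed mismatch
```

For production scoring, the dependency `genome-engineering/off-target-prediction` covers CFD proper; this sketch only shows why two sites with the same mismatch count should not receive the same penalty.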

Step 3a: Knockout Design (Frameshift)

python
def design_knockout(guide_row, target_sequence):
    '''Design a knockout and define the amplicon window in which to place validation primers.'''
    guide_seq = guide_row['sequence']
    position = guide_row['position']

    # Cas9 cuts 3bp upstream of PAM
    cut_site = position + 17 if guide_row['strand'] == '+' else position + 6

    # Validation primers flanking cut site (~200bp amplicon)
    # 200bp amplicon is optimal for detecting indels by gel or Sanger
    left_start = max(0, cut_site - 100)
    right_end = min(len(target_sequence), cut_site + 100)

    return {
        'guide_sequence': guide_seq,
        'pam': guide_row['pam'],
        'cut_site': cut_site,
        'expected_outcome': 'Frameshift indel',
        'validation_amplicon_start': left_start,
        'validation_amplicon_end': right_end
    }

ko_design = design_knockout(final_guides.iloc[0], gene_seq.replace('\n', ''))
print('Knockout Design:')
for k, v in ko_design.items():
    print(f'  {k}: {v}')
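
The `+17` and `+6` offsets used for `cut_site` follow from Cas9 cutting 3bp upstream of the PAM, combined with how `find_guides` reports positions (protospacer start on the + strand, start of `CCN` on the - strand). A small sketch with a synthetic sequence makes the arithmetic concrete; the standalone `cut_site` helper is hypothetical and simply restates the expression used above.

```python
def cut_site(position, strand):
    """0-based index of the first base to the right of the blunt cut.

    '+' strand: position is the protospacer start; the cut falls between
    protospacer bases 17 and 18, i.e. 3bp 5' of the NGG PAM.
    '-' strand: position is the start of CCN on the forward strand; the cut
    falls 3bp past the 3-base PAM, between indices position+5 and position+6.
    """
    return position + 17 if strand == '+' else position + 6

# Synthetic forward-strand site: 20bp protospacer followed by an AGG PAM
protospacer = 'ACGTACGTACGTACGTACGT'
site = protospacer + 'AGG'
cut = cut_site(0, '+')
left, right = site[:cut], site[cut:]
print(left, '|', right)  # 17 bases | last 3 protospacer bases + PAM
```

Indels that are not a multiple of 3bp at this position shift the reading frame, which is the outcome the knockout design relies on.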

Step 3b: Base Editing Design (CBE/ABE)

python
def design_base_edit(target_position, target_sequence, edit_type='CBE'):
    '''Design base editing experiment.
    CBE: C>T conversion (or G>A on opposite strand)
    ABE: A>G conversion (or T>C on opposite strand)

    target_position is a 0-based index into target_sequence.
    Editing window: positions 4-8 in the protospacer (counting from the PAM-distal end)
    '''
    guides = find_guides(target_sequence)
    # Base the deaminase acts on, read on the protospacer strand
    editable = {'CBE': 'C', 'ABE': 'A'}[edit_type]
    complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

    suitable_guides = []
    for _, guide in guides.iterrows():
        start = guide['position']

        # Window positions 4-8 are typical for BE3/BE4 (CBE) and ABE7.10/ABE8
        if guide['strand'] == '+':
            # Protospacer spans start..start+19; position 1 is at start
            window_start = start + 3   # Position 4
            window_end = start + 7     # Position 8
            expected_base = editable
        else:
            # 'position' is the start of CCN; the protospacer spans start+3..start+22
            # on the forward strand, with position 1 at start+22
            window_start = start + 15  # Position 8
            window_end = start + 19    # Position 4
            expected_base = complement[editable]

        if window_start <= target_position <= window_end:
            # The forward-strand base must be one this guide's strand can edit
            if target_sequence[target_position].upper() == expected_base:
                suitable_guides.append(guide)

    return pd.DataFrame(suitable_guides)


# Example: Design CBE to introduce stop codon
# C>T at specific position can create TAG/TAA/TGA stop
target_pos = 77  # 0-based index of a C inside the editing window of the + strand guide at position 74
cbe_guides = design_base_edit(target_pos, gene_seq.replace('\n', ''), 'CBE')
print(f'Found {len(cbe_guides)} CBE-compatible guides')
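
A guide that places the target base in the window may also place other editable bases there, which is the bystander-edit problem listed under Troubleshooting. The sketch below lists every editable base in positions 4-8 of a protospacer and reports the ones that are not the intended target. The function names are hypothetical, and the fixed 4-8 window is the assumption used throughout this skill; real editors vary.

```python
def editable_positions(protospacer, edit_type='CBE', window=(4, 8)):
    """1-based protospacer positions inside the window that the editor can modify.

    protospacer is the 20-nt guide sequence written 5'->3' (position 1 is
    PAM-distal). CBE acts on C, ABE acts on A.
    """
    base = {'CBE': 'C', 'ABE': 'A'}[edit_type]
    first, last = window
    return [pos for pos in range(first, last + 1)
            if protospacer[pos - 1].upper() == base]

def bystanders(protospacer, target_pos, edit_type='CBE', window=(4, 8)):
    """Editable window positions other than the intended target position."""
    return [pos for pos in editable_positions(protospacer, edit_type, window)
            if pos != target_pos]

guide = 'GACCTCAGTCCATGACGTCA'
print(editable_positions(guide, 'CBE'))
print(bystanders(guide, 4, 'CBE'))
```

Guides for which `bystanders` returns an empty list are the "single target base" designs recommended in the troubleshooting table.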

Step 3c: Knockin Design (HDR Template)

python
def design_hdr_template(guide_row, target_sequence, insert_sequence,
                         homology_arm_length=800):
    '''Design HDR donor template with homology arms.

    Homology arm length: 800bp is standard for plasmid donors.
    For ssODN, use 30-60bp arms.
    '''
    position = guide_row['position']
    cut_site = position + 17 if guide_row['strand'] == '+' else position + 6

    # Extract homology arms
    # Arms flank the cut site and are truncated at the ends of target_sequence
    left_arm_start = max(0, cut_site - homology_arm_length)
    left_arm = target_sequence[left_arm_start:cut_site]

    right_arm_end = min(len(target_sequence), cut_site + homology_arm_length)
    right_arm = target_sequence[cut_site:right_arm_end]

    # Full donor: left_arm + insert + right_arm
    donor = left_arm + insert_sequence + right_arm

    # Locate the PAM in the donor so it can be mutated to prevent re-cutting
    # (change NGG to NGA or NAG, silent if possible)
    # '+' strand: NGG starts 3bp after the cut, in the right arm after the insert
    # '-' strand: CCN starts 6bp before the cut, in the left arm
    if guide_row['strand'] == '+':
        pam_start_in_donor = len(left_arm) + len(insert_sequence) + 3
    else:
        pam_start_in_donor = len(left_arm) - 6

    return {
        'guide_sequence': guide_row['sequence'],
        'cut_site': cut_site,
        'left_arm': left_arm,
        'right_arm': right_arm,
        'insert': insert_sequence,
        'donor_template': donor,
        'donor_length': len(donor),
        'pam_start_in_donor': pam_start_in_donor,
        'note': 'Remember to mutate PAM in donor to prevent re-cutting'
    }


# Example: insert a FLAG tag with ssODN-scale 50bp arms
# (a longer tag such as GFP is passed the same way, with longer arms)
gfp_sequence = 'ATGGTGAGCAAGGGCGAGGAG...'  # Truncated for example; supply the full ORF before use
flag_sequence = 'GACTACAAAGACGATGACGACAAG'  # FLAG epitope (DYKDDDDK)
hdr_design = design_hdr_template(final_guides.iloc[0], gene_seq.replace('\n', ''), flag_sequence, 50)
print('HDR Design:')
print(f"  Left arm length: {len(hdr_design['left_arm'])}")
print(f"  Right arm length: {len(hdr_design['right_arm'])}")
print(f"  Total donor length: {hdr_design['donor_length']}")
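
The `after_template` checkpoint requires the PAM to be disrupted in the donor, which the design function only leaves as a note. A minimal sketch of that step follows, assuming the forward-strand index of the PAM within the donor is known. The function name `disrupt_pam` is hypothetical, and the substitution (NGG to NGA, or CCN to TCN for a minus-strand guide) is one of the options named above. It does not check whether the change is silent, which requires the reading frame.

```python
def disrupt_pam(donor, pam_start, strand):
    """Return the donor with the 3-base PAM at pam_start mutated.

    '+' strand PAM reads NGG on the forward strand and becomes NGA.
    '-' strand PAM reads CCN on the forward strand and becomes TCN
    (NGA when read on the guide strand).
    """
    pam = donor[pam_start:pam_start + 3].upper()
    if strand == '+':
        if pam[1:] != 'GG':
            raise ValueError(f'Expected NGG at {pam_start}, found {pam!r}')
        new_pam = pam[0] + 'GA'
    else:
        if pam[:2] != 'CC':
            raise ValueError(f'Expected CCN at {pam_start}, found {pam!r}')
        new_pam = 'TC' + pam[2]
    return donor[:pam_start] + new_pam + donor[pam_start + 3:]

print(disrupt_pam('AAATGGAAA', 3, '+'))
print(disrupt_pam('AAACCTAAA', 3, '-'))
```

Because a change such as TGG to TGA can create a stop codon in a coding region, the result should be checked against the reading frame and replaced with a synonymous alternative where one exists.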

Visualization

python
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np

def plot_guide_landscape(guides_df, gene_length, exon_coords=None):
    '''Visualize guide positions and scores along gene.'''
    fig, axes = plt.subplots(2, 1, figsize=(14, 6), gridspec_kw={'height_ratios': [1, 2]})

    # Top: Gene structure
    ax1 = axes[0]
    ax1.axhline(y=0.5, color='gray', linewidth=10, solid_capstyle='butt')

    if exon_coords:
        for start, end in exon_coords:
            ax1.axhline(y=0.5, xmin=start/gene_length, xmax=end/gene_length,
                       color='steelblue', linewidth=20, solid_capstyle='butt')

    ax1.set_xlim(0, gene_length)
    ax1.set_ylim(0, 1)
    ax1.set_ylabel('Gene')
    ax1.set_xticks([])
    ax1.set_yticks([])

    # Bottom: Guide scores
    ax2 = axes[1]
    colors = ['green' if s > 0.6 else 'orange' if s > 0.4 else 'red'
              for s in guides_df['activity_score']]

    ax2.scatter(guides_df['position'], guides_df['activity_score'],
                c=colors, s=50, alpha=0.7)
    ax2.axhline(y=0.6, color='green', linestyle='--', alpha=0.5, label='Threshold')
    ax2.set_xlim(0, gene_length)
    ax2.set_ylim(0, 1)
    ax2.set_xlabel('Position (bp)')
    ax2.set_ylabel('Activity Score')
    ax2.legend()

    plt.tight_layout()
    plt.savefig('guide_landscape.pdf')
    return fig


# Plot
plot_guide_landscape(guides, len(gene_seq.replace('\n', '')),
                     exon_coords=[(0, 50), (70, 130)])

Parameter Recommendations

| Step | Parameter | Value | Rationale |
|------|-----------|-------|-----------|
| Guide design | Activity score | >0.6 | Standard threshold for reliable editing |
| Guide design | GC content | 40-70% | Optimal for binding and Cas9 activity |
| Off-target | Max mismatches | 4 | Catches most relevant off-targets |
| Off-target | Specificity score | >0.7 | Acceptable off-target profile |
| Base editing | Window positions | 4-8 | Optimal for BE3/BE4, ABE7.10 |
| HDR | Homology arms | 800bp | Standard for plasmid donors |
| HDR (ssODN) | Homology arms | 30-60bp | For single-strand oligo donors |

Troubleshooting

| Issue | Likely Cause | Solution |
|-------|--------------|----------|
| No high-scoring guides | GC-poor region | Expand search region, consider Cas12a |
| Many off-targets | Repetitive sequence | Use high-fidelity Cas9 (eSpCas9, HiFi) |
| Low HDR efficiency | NHEJ dominant | Add NHEJ inhibitors, use ssODN |
| Base editing outside window | Guide position | Redesign with target in positions 4-8 |
| Bystander edits | Multiple C/A in window | Design guides with single target base |

Output Files

| File | Description |
|------|-------------|
| guides_ranked.tsv | All guides with activity and specificity scores |
| offtargets.txt | Cas-OFFinder results |
| knockout_design.json | KO guide and validation primers |
| base_edit_design.json | CBE/ABE design with editing window |
| hdr_template.fasta | Donor template sequence |
| guide_landscape.pdf | Visualization of guide positions |
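
The steps above build their results in memory; only `offtargets.txt` and `guide_landscape.pdf` are written to disk by the code shown. The sketch below writes three of the remaining files using only the standard library. The function name, argument layout, and FASTA record name are assumptions; `guides` is taken to be a list of dicts (for example from `DataFrame.to_dict('records')`).

```python
import csv
import json
from pathlib import Path

def _to_builtin(value):
    """Fallback for json.dump: NumPy scalars expose .item(); anything else becomes str."""
    return value.item() if hasattr(value, 'item') else str(value)

def write_outputs(guides, ko_design, donor_seq, out_dir, donor_name='hdr_donor'):
    """Write ranked guides (TSV), knockout design (JSON) and donor template (FASTA)."""
    if not guides:
        raise ValueError('No guides to write')
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # All guide dicts are assumed to share the keys of the first one
    with open(out_dir / 'guides_ranked.tsv', 'w', newline='') as fh:
        writer = csv.DictWriter(fh, fieldnames=list(guides[0].keys()), delimiter='\t')
        writer.writeheader()
        writer.writerows(guides)

    with open(out_dir / 'knockout_design.json', 'w') as fh:
        json.dump(ko_design, fh, indent=2, default=_to_builtin)

    # FASTA record wrapped at 60 characters per line
    with open(out_dir / 'hdr_template.fasta', 'w') as fh:
        fh.write(f'>{donor_name}\n')
        for i in range(0, len(donor_seq), 60):
            fh.write(donor_seq[i:i + 60] + '\n')

    return sorted(p.name for p in out_dir.iterdir())
```

The `default=_to_builtin` hook matters because values pulled from a DataFrame row, such as `cut_site`, are typically NumPy scalars that `json.dump` cannot serialize on its own.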

Related Skills

  • genome-engineering/grna-design - Detailed scoring algorithms
  • genome-engineering/off-target-prediction - Cas-OFFinder and CFD
  • genome-engineering/base-editing-design - CBE/ABE specifics
  • genome-engineering/prime-editing-design - pegRNA design
  • genome-engineering/hdr-template-design - Donor optimization
  • primer-design/primer-basics - Validation primer design
