Agent skill
bio-workflows-crispr-editing-pipeline
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-workflows-crispr-editing-pipeline
SKILL.md
name: bio-workflows-crispr-editing-pipeline description: End-to-end CRISPR experiment design from target selection to delivery-ready constructs. Covers guide RNA design, off-target assessment, and specialized editing strategies including knockouts, base editing, and HDR knockins. Use when designing complete CRISPR editing experiments for gene knockout, correction, or tagging. tool_type: mixed primary_tool: crisprscan workflow: true depends_on:
- genome-engineering/grna-design
- genome-engineering/off-target-prediction
- genome-engineering/base-editing-design
- genome-engineering/prime-editing-design
- genome-engineering/hdr-template-design qc_checkpoints:
- after_grna_design: "Activity score >0.6, no poly-T runs, GC 40-70%"
- after_offtarget: "Specificity score >0.7, no coding off-targets with <3 mismatches"
- after_template: "Homology arms verified, PAM disrupted in donor" measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
CRISPR Editing Pipeline
Complete workflow for CRISPR experiment design: from target gene to delivery-ready constructs with branching paths for different editing strategies.
Workflow Overview
Target Gene/Position
|
v
[1. Guide RNA Design] --> CRISPRscan / Rule Set 2 / DeepCRISPR
|
v
[2. Off-Target Assessment] --> Cas-OFFinder + CFD scoring
|
v
Decision Point: What type of edit?
|
+---+-------------------+--------------------+
| | |
v v v
[3a. Knockout] [3b. Base Editing] [3c. Knockin]
Standard Cas9 CBE/ABE design HDR template
Frameshift C>T or A>G with homology arms
| | |
v v v
Final Constructs with Validation Primers
Prerequisites
pip install crisprscan biopython pandas numpy matplotlib
conda install -c bioconda primer3-py cas-offinder
# Python packages for scoring
pip install crisprtools # if available
Primary Path: Gene Knockout
Step 1: Guide RNA Design
from Bio import SeqIO
from Bio.Seq import Seq
import pandas as pd
import re
def find_guides(sequence, pam='NGG'):
'''Find all potential gRNA target sites with NGG PAM.'''
guides = []
seq_str = str(sequence).upper()
# Forward strand: 20bp + NGG
for match in re.finditer(r'(?=([ATCG]{20}[ATCG]GG))', seq_str):
pos = match.start()
target = match.group(1)[:20]
pam_seq = match.group(1)[20:23]
guides.append({
'sequence': target,
'pam': pam_seq,
'position': pos,
'strand': '+',
'full_target': match.group(1)
})
# Reverse strand: CCN + 20bp
for match in re.finditer(r'(?=(CC[ATCG][ATCG]{20}))', seq_str):
pos = match.start()
full = match.group(1)
target = str(Seq(full[3:23]).reverse_complement())
pam_seq = str(Seq(full[0:3]).reverse_complement())
guides.append({
'sequence': target,
'pam': pam_seq,
'position': pos,
'strand': '-',
'full_target': full
})
return pd.DataFrame(guides)
def score_guide(guide_seq):
'''Score guide using Rule Set 2-like heuristics.'''
score = 0.5 # Base score
# GC content (optimal: 40-70%)
gc = (guide_seq.count('G') + guide_seq.count('C')) / len(guide_seq)
if 0.4 <= gc <= 0.7:
score += 0.2
elif gc < 0.3 or gc > 0.8:
score -= 0.2
# No poly-T (>4 T's is Pol III terminator)
if 'TTTT' in guide_seq:
score -= 0.3
# G at position 20 (adjacent to PAM) preferred
if guide_seq[-1] == 'G':
score += 0.1
# Avoid GG at positions 19-20
if guide_seq[-2:] == 'GG':
score -= 0.1
# Seed region (positions 12-20) GC
seed = guide_seq[11:20]
seed_gc = (seed.count('G') + seed.count('C')) / len(seed)
if 0.4 <= seed_gc <= 0.7:
score += 0.1
return min(1.0, max(0.0, score))
# Example: Design guides for BRCA1 exon
gene_seq = '''ATGGATTTATCTGCTCTTCGCGTTGAAGAAGTACAAAATGTCATTAATGCTATGCAGAAAATCTTAGAGT
GTCCCATCTGTCTGGAGTTGATCAAGGAACCTGTCTCCACAAAGTGTGACCACATATTTTGCAAATTTTG'''
guides = find_guides(gene_seq.replace('\n', ''))
guides['activity_score'] = guides['sequence'].apply(score_guide)
# Filter high-scoring guides
# Activity score >0.6 is standard threshold for reliable editing
good_guides = guides[guides['activity_score'] > 0.6].sort_values('activity_score', ascending=False)
print(f'Found {len(good_guides)} high-scoring guides')
print(good_guides[['sequence', 'position', 'strand', 'activity_score']].head(10))
Step 2: Off-Target Assessment
import subprocess
from pathlib import Path
def run_cas_offinder(guides_df, genome_fasta, output_dir, max_mismatches=4):
'''Run Cas-OFFinder for off-target detection.'''
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
# Write input file
input_file = output_dir / 'cas_offinder_input.txt'
with open(input_file, 'w') as f:
f.write(f'{genome_fasta}\n')
f.write('NNNNNNNNNNNNNNNNNNNNNGG\n') # 20bp + NGG pattern
for _, row in guides_df.iterrows():
f.write(f"{row['sequence']}NNN {max_mismatches}\n")
# Run Cas-OFFinder
output_file = output_dir / 'offtargets.txt'
subprocess.run([
'cas-offinder', str(input_file), 'C', str(output_file) # C for CPU
], check=True)
# Parse results
offtargets = pd.read_csv(output_file, sep='\t', header=None,
names=['pattern', 'chromosome', 'position', 'target',
'strand', 'mismatches'])
return offtargets
def calculate_specificity_score(guide_seq, offtargets_df):
'''Calculate CFD-based specificity score.'''
# Simplified: penalize based on mismatch count and position
guide_offtargets = offtargets_df[offtargets_df['pattern'].str.contains(guide_seq[:10])]
if len(guide_offtargets) == 0:
return 1.0
# Weight by mismatch count (more mismatches = lower penalty)
penalty = 0
for _, ot in guide_offtargets.iterrows():
mm = ot['mismatches']
if mm == 0: # Perfect match elsewhere (bad!)
penalty += 1.0
elif mm == 1:
penalty += 0.5
elif mm == 2:
penalty += 0.2
elif mm == 3:
penalty += 0.1
else:
penalty += 0.05
# Specificity score: higher is better
# Score >0.7 is generally acceptable
return max(0, 1 - penalty / 10)
# Filter by off-target profile
good_guides['specificity_score'] = good_guides['sequence'].apply(
lambda x: calculate_specificity_score(x, pd.DataFrame()) # placeholder
)
# Combined score
good_guides['combined_score'] = (good_guides['activity_score'] * 0.5 +
good_guides['specificity_score'] * 0.5)
final_guides = good_guides.sort_values('combined_score', ascending=False).head(5)
Step 3a: Knockout Design (Frameshift)
def design_knockout(guide_row, target_sequence):
'''Design knockout experiment with validation primers.'''
guide_seq = guide_row['sequence']
position = guide_row['position']
# Cas9 cuts 3bp upstream of PAM
cut_site = position + 17 if guide_row['strand'] == '+' else position + 6
# Validation primers flanking cut site (~200bp amplicon)
# 200bp amplicon is optimal for detecting indels by gel or Sanger
left_start = max(0, cut_site - 100)
right_end = min(len(target_sequence), cut_site + 100)
return {
'guide_sequence': guide_seq,
'pam': guide_row['pam'],
'cut_site': cut_site,
'expected_outcome': 'Frameshift indel',
'validation_amplicon_start': left_start,
'validation_amplicon_end': right_end
}
ko_design = design_knockout(final_guides.iloc[0], gene_seq.replace('\n', ''))
print('Knockout Design:')
for k, v in ko_design.items():
print(f' {k}: {v}')
Step 3b: Base Editing Design (CBE/ABE)
def design_base_edit(target_position, target_sequence, edit_type='CBE'):
'''Design base editing experiment.
CBE: C>T conversion (or G>A on opposite strand)
ABE: A>G conversion (or T>C on opposite strand)
Editing window: positions 4-8 in the protospacer (counting from PAM-distal)
'''
guides = find_guides(target_sequence)
suitable_guides = []
for _, guide in guides.iterrows():
guide_start = guide['position']
guide_end = guide_start + 20
# Check if target position falls in editing window (positions 4-8)
# Window position 4-8 is optimal for BE3/BE4 (CBE) and ABE7.10/ABE8
if guide['strand'] == '+':
window_start = guide_start + 3 # Position 4
window_end = guide_start + 8 # Position 8
else:
window_start = guide_end - 8
window_end = guide_end - 3
if window_start <= target_position <= window_end:
# Check if target base is appropriate
target_base = target_sequence[target_position].upper()
if edit_type == 'CBE' and target_base in ['C', 'G']:
suitable_guides.append(guide)
elif edit_type == 'ABE' and target_base in ['A', 'T']:
suitable_guides.append(guide)
return pd.DataFrame(suitable_guides)
# Example: Design CBE to introduce stop codon
# C>T at specific position can create TAG/TAA/TGA stop
target_pos = 45 # Example position with C
cbe_guides = design_base_edit(target_pos, gene_seq.replace('\n', ''), 'CBE')
print(f'Found {len(cbe_guides)} CBE-compatible guides')
Step 3c: Knockin Design (HDR Template)
def design_hdr_template(guide_row, target_sequence, insert_sequence,
homology_arm_length=800):
'''Design HDR donor template with homology arms.
Homology arm length: 800bp is standard for plasmid donors.
For ssODN, use 30-60bp arms.
'''
cut_site = guide_row['position'] + 17 if guide_row['strand'] == '+' else guide_row['position'] + 6
# Extract homology arms
# Arms flank the cut site
left_arm_start = max(0, cut_site - homology_arm_length)
left_arm = target_sequence[left_arm_start:cut_site]
right_arm_end = min(len(target_sequence), cut_site + homology_arm_length)
right_arm = target_sequence[cut_site:right_arm_end]
# Mutate PAM in donor to prevent re-cutting
# Change NGG to NGA or NAG (silent if possible)
guide_seq = guide_row['sequence']
pam_position_in_arms = cut_site - left_arm_start + 3
# Full donor: left_arm + insert + right_arm
donor = left_arm + insert_sequence + right_arm
return {
'guide_sequence': guide_seq,
'cut_site': cut_site,
'left_arm': left_arm,
'right_arm': right_arm,
'insert': insert_sequence,
'donor_template': donor,
'donor_length': len(donor),
'note': 'Remember to mutate PAM in donor to prevent re-cutting'
}
# Example: Insert GFP tag
gfp_sequence = 'ATGGTGAGCAAGGGCGAGGAG...' # Truncated for example
hdr_design = design_hdr_template(final_guides.iloc[0], gene_seq.replace('\n', ''), 'FLAG_TAG', 50)
print('HDR Design:')
print(f" Left arm length: {len(hdr_design['left_arm'])}")
print(f" Right arm length: {len(hdr_design['right_arm'])}")
print(f" Total donor length: {hdr_design['donor_length']}")
Visualization
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
def plot_guide_landscape(guides_df, gene_length, exon_coords=None):
'''Visualize guide positions and scores along gene.'''
fig, axes = plt.subplots(2, 1, figsize=(14, 6), gridspec_kw={'height_ratios': [1, 2]})
# Top: Gene structure
ax1 = axes[0]
ax1.axhline(y=0.5, color='gray', linewidth=10, solid_capstyle='butt')
if exon_coords:
for start, end in exon_coords:
ax1.axhline(y=0.5, xmin=start/gene_length, xmax=end/gene_length,
color='steelblue', linewidth=20, solid_capstyle='butt')
ax1.set_xlim(0, gene_length)
ax1.set_ylim(0, 1)
ax1.set_ylabel('Gene')
ax1.set_xticks([])
ax1.set_yticks([])
# Bottom: Guide scores
ax2 = axes[1]
colors = ['green' if s > 0.6 else 'orange' if s > 0.4 else 'red'
for s in guides_df['activity_score']]
ax2.scatter(guides_df['position'], guides_df['activity_score'],
c=colors, s=50, alpha=0.7)
ax2.axhline(y=0.6, color='green', linestyle='--', alpha=0.5, label='Threshold')
ax2.set_xlim(0, gene_length)
ax2.set_ylim(0, 1)
ax2.set_xlabel('Position (bp)')
ax2.set_ylabel('Activity Score')
ax2.legend()
plt.tight_layout()
plt.savefig('guide_landscape.pdf')
return fig
# Plot
plot_guide_landscape(guides, len(gene_seq.replace('\n', '')),
exon_coords=[(0, 50), (70, 130)])
Parameter Recommendations
| Step | Parameter | Value | Rationale |
|---|---|---|---|
| Guide design | Activity score | >0.6 | Standard threshold for reliable editing |
| Guide design | GC content | 40-70% | Optimal for binding and Cas9 activity |
| Off-target | Max mismatches | 4 | Catches most relevant off-targets |
| Off-target | Specificity score | >0.7 | Acceptable off-target profile |
| Base editing | Window | positions 4-8 | Optimal for BE3/BE4, ABE7.10 |
| HDR | Homology arms | 800bp | Standard for plasmid donors |
| HDR (ssODN) | Homology arms | 30-60bp | For single-strand oligo donors |
Troubleshooting
| Issue | Likely Cause | Solution |
|---|---|---|
| No high-scoring guides | GC-poor region | Expand search region, consider Cas12a |
| Many off-targets | Repetitive sequence | Use high-fidelity Cas9 (eSpCas9, HiFi) |
| Low HDR efficiency | NHEJ dominant | Add NHEJ inhibitors, use ssODN |
| Base editing outside window | Guide position | Redesign with target in positions 4-8 |
| Bystander edits | Multiple C/A in window | Design guides with single target base |
Output Files
| File | Description |
|---|---|
guides_ranked.tsv |
All guides with activity and specificity scores |
offtargets.txt |
Cas-OFFinder results |
knockout_design.json |
KO guide and validation primers |
base_edit_design.json |
CBE/ABE design with editing window |
hdr_template.fasta |
Donor template sequence |
guide_landscape.pdf |
Visualization of guide positions |
Related Skills
- genome-engineering/grna-design - Detailed scoring algorithms
- genome-engineering/off-target-prediction - Cas-OFFinder and CFD
- genome-engineering/base-editing-design - CBE/ABE specifics
- genome-engineering/prime-editing-design - pegRNA design
- genome-engineering/hdr-template-design - Donor optimization
- primer-design/primer-basics - Validation primer design
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
sleep-analyzer
分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-hi-c-analysis-matrix-operations
Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.
Didn't find tool you were looking for?