Agent skill
bio-sequence-slicing
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-sequence-slicing
SKILL.md
name: bio-sequence-slicing description: Slice, extract, and concatenate biological sequences using Biopython. Use when extracting subsequences, joining sequences, or manipulating sequence regions by position. tool_type: python primary_tool: Bio.Seq measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Sequence Slicing
Extract, slice, and concatenate sequences using Biopython's Seq objects.
Required Import
from Bio.Seq import Seq
Core Operations
Indexing (Single Position)
seq = Seq('ATGCGATCG')
seq[0] # 'A' - first base (0-indexed)
seq[-1] # 'G' - last base
seq[3] # 'C' - fourth base
Slicing (Extract Region)
seq = Seq('ATGCGATCGATCG')
seq[0:3] # Seq('ATG') - first 3 bases
seq[3:6] # Seq('CGA') - positions 3-5
seq[:5] # Seq('ATGCG') - first 5
seq[-5:] # Seq('GATCG') - last 5
seq[::2] # Seq('AGGTGTG') - every 2nd base
seq[::-1] # Seq('GCTAGCTAGCGTA') - reversed
Note: Slicing returns a Seq object, not a string.
Concatenation
seq1 = Seq('ATGC')
seq2 = Seq('GGGG')
combined = seq1 + seq2 # Seq('ATGCGGGG')
Can also concatenate with strings:
seq = Seq('ATGC')
extended = seq + 'NNNN' # Seq('ATGCNNNN')
Code Patterns
Extract CDS by Coordinates
genome = Seq('NNNNATGCGATCGATCGTAANNN')
cds_start, cds_end = 4, 21
cds = genome[cds_start:cds_end]
Extract with 1-Based Coordinates
Biology often uses 1-based coordinates. Convert to 0-based:
def extract_1based(seq, start, end):
'''Extract using 1-based inclusive coordinates'''
return seq[start - 1:end]
genome = Seq('ATGCGATCGATCG')
region = extract_1based(genome, 1, 3) # Seq('ATG')
Split Sequence into Codons
def split_codons(seq):
return [seq[i:i+3] for i in range(0, len(seq) - len(seq) % 3, 3)]
seq = Seq('ATGCGATCGATCG')
codons = split_codons(seq) # [Seq('ATG'), Seq('CGA'), ...]
Split into Fixed-Length Chunks
def chunk_sequence(seq, size):
return [seq[i:i+size] for i in range(0, len(seq), size)]
seq = Seq('ATGCGATCGATCGATCGATCG')
chunks = chunk_sequence(seq, 10)
Join Sequences with Linker
seqs = [Seq('ATGC'), Seq('GGGG'), Seq('TTTT')]
linker = Seq('NNN')
joined = linker.join(seqs) # Seq('ATGCNNNGGGGNNTTTT')
Or manually:
linker = 'NNN'
joined = Seq(linker.join(str(s) for s in seqs))
Extract Multiple Regions
def extract_regions(seq, regions):
'''Extract and concatenate multiple regions'''
return sum((seq[start:end] for start, end in regions), Seq(''))
exon_coords = [(0, 50), (100, 150), (200, 250)]
mrna = extract_regions(genomic_seq, exon_coords)
Extract Flanking Regions
def get_flanking(seq, position, flank_size):
'''Get sequence around a position'''
start = max(0, position - flank_size)
end = min(len(seq), position + flank_size + 1)
return seq[start:end]
seq = Seq('ATGCGATCGATCGATCGATCG')
flanking = get_flanking(seq, 10, 5) # 5 bp on each side of position 10
Tile Sequence into Overlapping Windows
def sliding_windows(seq, window_size, step=1):
for i in range(0, len(seq) - window_size + 1, step):
yield seq[i:i + window_size]
seq = Seq('ATGCGATCGATCG')
for window in sliding_windows(seq, 5, 2):
print(window)
Extract Feature from SeqRecord
from Bio import SeqIO
for record in SeqIO.parse('sequence.gb', 'genbank'):
for feature in record.features:
if feature.type == 'CDS':
cds_seq = feature.extract(record.seq)
print(f'{feature.qualifiers.get("gene", ["?"])[0]}: {cds_seq[:30]}...')
Create New SeqRecord from Slice
from Bio.SeqRecord import SeqRecord
original = SeqRecord(Seq('ATGCGATCGATCGATCG'), id='full', description='Full sequence')
subset = SeqRecord(original.seq[5:15], id='subset', description=f'Positions 5-15 of {original.id}')
Coordinate Systems
| System | Position 1 | Example |
|---|---|---|
| 0-based (Python) | Index 0 | seq[0:3] gets positions 0, 1, 2 |
| 1-based (Biology) | Index 1 | Position 1-3 = seq[0:3] |
| 0-based half-open | Start inclusive, end exclusive | Standard Python slicing |
Common Errors
| Error | Cause | Solution |
|---|---|---|
IndexError |
Index out of range | Check sequence length first |
| Unexpected length | Off-by-one error | Remember end index is exclusive |
| Empty result | Start >= end | Check coordinate order |
| Wrong positions | 1-based vs 0-based confusion | Convert coordinates explicitly |
Decision Tree
Need to extract or combine sequences?
├── Single position?
│ └── Use indexing: seq[i]
├── Contiguous region?
│ └── Use slicing: seq[start:end]
├── Multiple non-contiguous regions?
│ └── Extract each, concatenate with +
├── Join sequences?
│ ├── No linker: seq1 + seq2
│ └── With linker: linker.join(seqs)
├── Split into parts?
│ └── List comprehension with slicing
└── From GenBank features?
└── Use feature.extract(record.seq)
Related Skills
- seq-objects - Create Seq and SeqRecord objects
- sequence-io/read-sequences - Parse GenBank files with features to extract
- transcription-translation - Translate extracted CDS regions
- alignment-files - Extract sequences from BAM using samtools fasta/fastq
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
sleep-analyzer
分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-hi-c-analysis-matrix-operations
Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.
Didn't find tool you were looking for?