Agent skill
bio-sequence-similarity
Install this agent skill to your Project
npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-sequence-similarity
SKILL.md
name: bio-sequence-similarity description: Find homologous sequences using iterative BLAST (PSI-BLAST), profile HMMs (HMMER), and reciprocal best hit analysis. Use when identifying orthologs, distant homologs, or protein family members where standard BLAST is not sensitive enough. tool_type: mixed primary_tool: BLAST+ measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Sequence Similarity Searches
Advanced methods for finding homologous sequences beyond standard BLAST.
PSI-BLAST (Position-Specific Iterated BLAST)
Builds a position-specific scoring matrix (PSSM) through iterations to find distant homologs.
Basic PSI-BLAST
psiblast -query protein.fasta -db nr -out results.txt -num_iterations 3
Save PSSM for Reuse
psiblast -query protein.fasta -db nr \
-out results.txt \
-out_pssm pssm.asn \
-out_ascii_pssm pssm.txt \
-num_iterations 5
Use Existing PSSM
psiblast -in_pssm pssm.asn -db nr -out results.txt
Output Format
psiblast -query protein.fasta -db nr \
-out results.txt \
-outfmt 6 \
-num_iterations 3 \
-inclusion_ethresh 0.001
Key Parameters
psiblast -query protein.fasta -db nr \
-num_iterations 5 \
-inclusion_ethresh 0.001 \
-evalue 0.01 \
-num_threads 8 \
-out results.txt
PSI-BLAST Parameters
| Parameter | Default | Description |
|---|---|---|
| -num_iterations | 1 | Number of iterations |
| -inclusion_ethresh | 0.002 | E-value for PSSM inclusion |
| -evalue | 10 | E-value threshold for reporting |
| -num_threads | 1 | CPU threads |
HMMER for Profile Searches
HMMER uses profile hidden Markov models for sensitive sequence searches.
Search with Single Sequence
jackhmmer -o results.txt -A aligned.sto --cpu 8 query.fasta database.fasta
Build Profile from Alignment
hmmbuild profile.hmm alignment.sto
Search Database with Profile
hmmsearch -o results.txt --tblout hits.tbl profile.hmm database.fasta
hmmsearch -o results.txt --domtblout domains.tbl profile.hmm database.fasta
Download Pfam Profiles
wget https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
gunzip Pfam-A.hmm.gz
hmmpress Pfam-A.hmm
Scan Sequence Against Pfam
hmmscan --tblout pfam_hits.tbl --domtblout domains.tbl Pfam-A.hmm query.fasta
Parse HMMER Output
grep -v "^#" hits.tbl | head
awk '$5 < 1e-10' hits.tbl
HMMER Output Columns (--tblout)
| Column | Description |
|---|---|
| 1 | Target name |
| 2 | Accession |
| 3 | Query name |
| 4 | Query accession |
| 5 | E-value (full sequence) |
| 6 | Score (full sequence) |
| 7 | Bias |
| 8 | E-value (best domain) |
| 9 | Score (best domain) |
Reciprocal Best Hit (RBH) Analysis
Find orthologs using bidirectional best hits.
Create BLAST Databases
makeblastdb -in species_A.fasta -dbtype prot -out species_A_db
makeblastdb -in species_B.fasta -dbtype prot -out species_B_db
Bidirectional BLAST
blastp -query species_A.fasta -db species_B_db -outfmt 6 -evalue 1e-5 -max_target_seqs 1 > A_vs_B.txt
blastp -query species_B.fasta -db species_A_db -outfmt 6 -evalue 1e-5 -max_target_seqs 1 > B_vs_A.txt
Find Reciprocal Best Hits
awk 'FNR==NR {a[$1]=$2; next} $2 in a && a[$2]==$1 {print $1"\t"$2}' \
A_vs_B.txt B_vs_A.txt > reciprocal_best_hits.txt
Python RBH Script
def find_rbh(forward_blast, reverse_blast):
'''Find reciprocal best hits from BLAST results'''
forward = {}
with open(forward_blast) as f:
for line in f:
parts = line.strip().split('\t')
query, subject = parts[0], parts[1]
if query not in forward:
forward[query] = subject
reverse = {}
with open(reverse_blast) as f:
for line in f:
parts = line.strip().split('\t')
query, subject = parts[0], parts[1]
if query not in reverse:
reverse[query] = subject
rbh = []
for a, b in forward.items():
if b in reverse and reverse[b] == a:
rbh.append((a, b))
return rbh
rbh_pairs = find_rbh('A_vs_B.txt', 'B_vs_A.txt')
for a, b in rbh_pairs:
print(f'{a}\t{b}')
Delta-BLAST
Uses conserved domain database for more sensitive initial search.
deltablast -query protein.fasta -db nr -rpsdb cdd_delta -out results.txt
PHI-BLAST (Pattern-Hit Initiated)
Search with a pattern plus sequence.
phi_pattern="G-x(2)-[ST]-x-[RK]"
phiblast -query protein.fasta -db nr -pattern "$phi_pattern" -out results.txt
Iterative Search with Biopython
from Bio.Blast import NCBIWWW, NCBIXML
with open('query.fasta') as f:
query = f.read()
result_handle = NCBIWWW.qblast('psiblast', 'nr', query, expect=0.001, word_size=3)
with open('psiblast_result.xml', 'w') as out:
out.write(result_handle.read())
result_handle.close()
with open('psiblast_result.xml') as f:
records = NCBIXML.parse(f)
for record in records:
for alignment in record.alignments:
for hsp in alignment.hsps:
if hsp.expect < 1e-10:
print(f'{alignment.hit_def[:50]}: E={hsp.expect}')
HMMER with Biopython
from Bio import SearchIO
results = SearchIO.parse('hmmsearch_output.txt', 'hmmer3-text')
for query_result in results:
print(f'Query: {query_result.id}')
for hit in query_result:
print(f' Hit: {hit.id}, E-value: {hit.evalue}')
for hsp in hit:
print(f' Domain: {hsp.bitscore} bits')
Jackhmmer (Iterative HMMER)
Similar to PSI-BLAST but uses HMM profiles.
jackhmmer -N 5 -o results.txt --tblout hits.tbl query.fasta database.fasta
jackhmmer -N 5 -A iterations.sto --chkhmm checkpoint query.fasta database.fasta
OrthoFinder for Multi-Species Orthologs
orthofinder -f proteomes/ -t 8
orthofinder -f proteomes/ -t 8 -M msa
Prepare Input
mkdir proteomes
cp species_*.fasta proteomes/
Output Files
| File | Content |
|---|---|
| Orthogroups.tsv | All orthogroups |
| Orthogroups_SingleCopyOrthologues.txt | 1:1 orthologs |
| Species_Tree/ | Inferred species tree |
| Gene_Trees/ | Individual gene trees |
E-value vs Bit Score
| E-value | Interpretation |
|---|---|
| < 1e-50 | Highly significant, likely homolog |
| 1e-50 to 1e-10 | Significant, probable homolog |
| 1e-10 to 1e-3 | Marginal, possible remote homolog |
| > 0.01 | Not significant |
Complete Ortholog Finding Pipeline
#!/bin/bash
SPECIES_A=$1
SPECIES_B=$2
EVALUE=1e-10
THREADS=8
echo "Building databases..."
makeblastdb -in $SPECIES_A -dbtype prot -out db_A
makeblastdb -in $SPECIES_B -dbtype prot -out db_B
echo "Running forward BLAST..."
blastp -query $SPECIES_A -db db_B -outfmt 6 -evalue $EVALUE \
-max_target_seqs 1 -num_threads $THREADS > forward.txt
echo "Running reverse BLAST..."
blastp -query $SPECIES_B -db db_A -outfmt 6 -evalue $EVALUE \
-max_target_seqs 1 -num_threads $THREADS > reverse.txt
echo "Finding reciprocal best hits..."
awk 'FNR==NR {best[$1]=$2; next}
$2 in best && best[$2]==$1 {print $1"\t"$2}' \
forward.txt reverse.txt > orthologs.txt
echo "Found $(wc -l < orthologs.txt) ortholog pairs"
rm -f db_A.* db_B.*
Related Skills
- blast-searches - Basic remote BLAST
- local-blast - Local BLAST databases
- entrez-fetch - Download sequences
- alignment - Align identified homologs
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
chemist-analyst
Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.
bio-alignment-io
Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
sleep-analyzer
分析睡眠数据、识别睡眠模式、评估睡眠质量,并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。
metabolomics-workbench-database
Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.
bio-hi-c-analysis-matrix-operations
Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.
Didn't find tool you were looking for?