Agent skill
semantic-search
Use this when deciding between semantic search and grep/glob for code discovery. Apply it to concept-based queries (find payment processing), intent-based searches (how is auth implemented), or cases where the user doesn't know exact class names. Use grep for exact matches such as specific function names.
Install this agent skill into your project:
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/semantic-search
SKILL.md
Semantic Search Technique
Purpose
Decision framework and execution guide for using semantic search effectively in CodeCompass.
When to Use Semantic Search
✅ Use Semantic Search When
1. Concept-based Queries
- "Find code that handles payment processing"
- "Where do we validate email addresses?"
- "Show me error handling patterns"
2. Intent-based Queries
- "How is user authentication implemented?"
- "What code calculates shipping costs?"
- "Find business rules for order approval"
3. Cross-language/Cross-file
- Searching across PHP, TypeScript, config files
- Pattern discovery across multiple modules
- Finding similar implementations
4. Fuzzy/Exploratory
- User doesn't know exact class/function names
- Exploring unfamiliar codebase
- "Code that does something like X"
5. Natural Language
- "Show me all database migrations"
- "Find controllers that handle file uploads"
- "Where are API rate limits defined?"
❌ Use Grep/Glob When
1. Exact Matches
- "Find class named PaymentController"
- "Where is processPayment function defined?"
- "Find all imports of UserService"
2. Syntax Patterns
- "Find all functions starting with get"
- "Show me all @Injectable() decorators"
- "Find TypeScript interfaces"
3. Performance Critical
- Quick lookups in known files
- Repeated searches in tight loops
- When you know exact location
4. Structural Queries
- "Find all .ts files in src/modules"
- "List all test files"
- "Show directory structure"
Execution Guide
Step 1: Formulate Effective Query
❌ Bad Queries (too vague):
- "payment"
- "code"
- "function"
✅ Good Queries (specific context):
- "business logic for processing customer payments and updating order status"
- "validation rules for user email and password requirements"
- "error handling patterns for database connection failures"
Why: More context = better semantic matching
Formula:
[Action/Purpose] for [Specific Entity] with [Context/Constraints]
Examples:
- "Extract business capabilities from Yii2 controllers"
- "Validation logic for user registration with email verification"
- "Database migration patterns for schema versioning"
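The formula can be sketched as a tiny helper that assembles the three parts into one query string and flags obviously vague input. This is an illustration only: `buildSemanticQuery`, `isTooVague`, the `QueryParts` shape, and the three-word vagueness cutoff are hypothetical and not part of CodeCompass.

```typescript
// Hypothetical shape for: [Action/Purpose] for [Specific Entity] with [Context/Constraints]
interface QueryParts {
  purpose: string;   // e.g. "Validation logic"
  entity: string;    // e.g. "user registration"
  context?: string;  // optional constraint, e.g. "email verification"
}

// Assemble a descriptive query; an empty or missing context is simply skipped.
function buildSemanticQuery({ purpose, entity, context }: QueryParts): string {
  const parts = [purpose.trim(), "for", entity.trim()];
  if (context && context.trim().length > 0) {
    parts.push("with", context.trim());
  }
  return parts.join(" ");
}

// Heuristic (assumed cutoff): fewer than three words, like "payment", is probably too vague.
function isTooVague(query: string): boolean {
  return query.trim().split(/\s+/).filter(Boolean).length < 3;
}
```

For example, `buildSemanticQuery({ purpose: "Validation logic", entity: "user registration", context: "email verification" })` reproduces the second example above.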
Step 2: Verify Indexing
Before searching, make sure the codebase is indexed:
# Check if indexed
curl http://localhost:8081/v1/schema
# Should show collections like:
# - CodeContext
# - AtlasCode
If not indexed:
codecompass batch:index <path-to-codebase>
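The schema check can also be scripted. Weaviate's `GET /v1/schema` returns JSON with a `classes` array whose entries carry a `class` name. The helper below is a sketch that assumes both `CodeContext` and `AtlasCode` are expected after indexing. It only inspects an already-parsed response, so it needs no network access.

```typescript
// Minimal subset of the JSON returned by Weaviate's GET /v1/schema endpoint.
interface WeaviateSchema {
  classes?: Array<{ class: string }>;
}

// Collections this guide expects after indexing (assumed to be required).
const REQUIRED_COLLECTIONS = ["CodeContext", "AtlasCode"];

// Return the expected collections that are missing from the schema.
function missingCollections(
  schema: WeaviateSchema,
  required: string[] = REQUIRED_COLLECTIONS,
): string[] {
  const present = new Set((schema.classes ?? []).map((c) => c.class));
  return required.filter((name) => !present.has(name));
}
```

Feed it the parsed body of `curl http://localhost:8081/v1/schema`. A non-empty result suggests `codecompass batch:index <path-to-codebase>` still needs to run.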
Step 3: Execute Search
codecompass search:semantic "business logic for payment processing"
Alternative (if using CodeCompass as a library):
const results = await searchService.semanticSearch({
query: "business logic for payment processing",
limit: 10,
certainty: 0.7 // Minimum relevance score
});
Step 4: Interpret Results
Check relevance scores:
- >0.8: Highly relevant (strong match)
- 0.7-0.8: Good match (related)
- 0.6-0.7: Moderate match (possibly relevant)
- <0.6: Weak match (may be noise)
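A small helper can apply these bands consistently, labeling each hit and dropping weak ones. This is a sketch under assumptions. The `SearchHit` shape (`filePath`, `score`) is hypothetical. The 0.7 cutoff mirrors the advice under Common Pitfalls and is not a confirmed CodeCompass default.

```typescript
type Relevance = "high" | "good" | "moderate" | "weak";

// Hypothetical result shape; CodeCompass's real result type may differ.
interface SearchHit {
  filePath: string;
  score: number;
}

// Bands follow the list above: >0.8, 0.7-0.8, 0.6-0.7, <0.6 (0.8 itself counts as "good").
function classifyScore(score: number): Relevance {
  if (score > 0.8) return "high";
  if (score >= 0.7) return "good";
  if (score >= 0.6) return "moderate";
  return "weak";
}

// Keep hits at or above the cutoff, best first, without mutating the input array.
function keepUseful(hits: SearchHit[], minScore = 0.7): SearchHit[] {
  return hits.filter((h) => h.score >= minScore).sort((a, b) => b.score - a.score);
}
```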
Verify context:
- Does the returned code actually match the intent?
- Are results from the expected modules?
- Are multiple related files returned? (good signal)
- Or only isolated, unrelated matches? (refine the query)
Step 5: Refine if Needed
Too many results (>50):
- Add more specific context to query
- Increase certainty threshold
- Add domain constraints ("in authentication module")
Too few results (<3):
- Broaden query (less specific)
- Lower certainty threshold
- Check if area is actually indexed
- Try related terms/synonyms
Wrong results:
- Rephrase query with different terminology
- Add negative constraints
- Try breaking into multiple specific queries
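The count-based rules above can be expressed as a function that proposes the next certainty threshold plus a hint. This is a hypothetical sketch. The `SearchSettings` shape, the 0.05 step, and the 0.5/0.95 clamps are assumptions chosen for illustration, not CodeCompass behavior.

```typescript
// Current search settings (hypothetical shape, not a CodeCompass type).
interface SearchSettings {
  query: string;
  certainty: number;
}

interface Refinement extends SearchSettings {
  advice: string;
}

// Round to two decimals so repeated adjustments don't accumulate float noise.
const round2 = (x: number): number => Math.round(x * 100) / 100;

// Suggest the next settings based on how many results came back.
function suggestRefinement(resultCount: number, settings: SearchSettings, step = 0.05): Refinement {
  if (resultCount > 50) {
    return {
      ...settings,
      certainty: round2(Math.min(0.95, settings.certainty + step)),
      advice: "Too many results: add specific context or a domain constraint",
    };
  }
  if (resultCount < 3) {
    return {
      ...settings,
      certainty: round2(Math.max(0.5, settings.certainty - step)),
      advice: "Too few results: broaden the query, try synonyms, or check indexing",
    };
  }
  return { ...settings, advice: "Result count looks reasonable: review relevance scores" };
}
```

Only the threshold is adjusted automatically. Rephrasing the query is left to the caller, because that step needs judgment about terminology.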
Behind the Scenes
Architecture
Query Text
↓
Ollama Embedding (mxbai-embed-large)
↓
1024-dimensional vector
↓
Weaviate Vector Search (cosine similarity)
↓
Ranked Results
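The ranking step in the diagram boils down to cosine similarity between the query vector and each stored vector. The sketch below computes it directly over small arrays. In practice, Weaviate does this inside its HNSW index over 1024-dimensional mxbai-embed-large vectors, so these functions are illustrative only. `toCertainty` uses Weaviate's relation for cosine distance: distance = 1 − similarity, and certainty = 1 − distance / 2.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Weaviate-style certainty derived from cosine similarity: (1 + similarity) / 2.
function toCertainty(similarity: number): number {
  return (1 + similarity) / 2;
}

// Rank candidate vectors against a query vector, highest similarity first.
function rankBySimilarity(query: number[], docs: Array<{ id: string; vector: number[] }>) {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score);
}
```

Under that mapping, a certainty threshold of 0.7 corresponds to a raw cosine similarity of 0.4. This assumes CodeCompass passes `certainty` through to Weaviate unchanged, which the guide does not confirm.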
Key Components
From .ai/capabilities.json:
- Modules: search, vectorizer, weaviate
- Embedding: Ollama mxbai-embed-large (1024 dimensions)
- Vector DB: Weaviate with HNSW indexing
- Collections: CodeContext, AtlasCode
Configuration (from .env):
EMBEDDING_SERVICE=ollama
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
OLLAMA_URL=http://localhost:11434
CODECOMPASS_WEAVIATE_URL=http://localhost:8081
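If you script around CodeCompass, the same keys can be read into a typed object. This is a sketch: `loadConfig` is hypothetical, and the fallbacks are simply the example values above, not confirmed CodeCompass defaults. The environment is passed in explicitly, for example `loadConfig(process.env)` under Node.

```typescript
// Settings corresponding to the .env keys listed above.
interface CodeCompassConfig {
  embeddingService: string;
  ollamaModel: string;
  ollamaUrl: string;
  weaviateUrl: string;
}

type Env = Record<string, string | undefined>;

// Read config, falling back to the example values from this guide when a key is unset.
function loadConfig(env: Env): CodeCompassConfig {
  return {
    embeddingService: env.EMBEDDING_SERVICE ?? "ollama",
    ollamaModel: env.OLLAMA_EMBEDDING_MODEL ?? "mxbai-embed-large",
    ollamaUrl: env.OLLAMA_URL ?? "http://localhost:11434",
    weaviateUrl: env.CODECOMPASS_WEAVIATE_URL ?? "http://localhost:8081",
  };
}
```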
Advanced Patterns
Pattern 1: Multi-Query Exploration
For complex questions, break into multiple searches:
# Instead of:
"authentication and authorization and session management"
# Do:
codecompass search:semantic "user authentication login process"
codecompass search:semantic "authorization and access control"
codecompass search:semantic "session management and tokens"
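When several focused queries return overlapping files, the result sets can be merged so each file appears once with its best score. This is a sketch; the `Hit` shape (`filePath`, `score`, `query`) is hypothetical and may not match CodeCompass's output.

```typescript
// Hypothetical hit shape; `query` records which focused search produced it.
interface Hit {
  filePath: string;
  score: number;
  query: string;
}

// Merge hits from several queries, keeping the highest-scoring hit per file.
function mergeHits(resultSets: Hit[][]): Hit[] {
  const best = new Map<string, Hit>();
  for (const hits of resultSets) {
    for (const hit of hits) {
      const current = best.get(hit.filePath);
      if (!current || hit.score > current.score) best.set(hit.filePath, hit);
    }
  }
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}
```

Keeping the originating `query` on each hit shows which facet (authentication, authorization, or sessions) a file was found under.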
Pattern 2: Iterative Refinement
# 1. Broad search
codecompass search:semantic "payment processing"
# 2. Review results, identify specific module
# 3. Narrow search
codecompass search:semantic "payment gateway integration in PaymentController"
# 4. Pinpoint implementation
codecompass search:semantic "Stripe API call for processing credit cards"
Pattern 3: Cross-Domain Search
Search across different aspects:
# Code implementation
codecompass search:semantic "email validation logic"
# Tests
codecompass search:semantic "test cases for email validation"
# Configuration
codecompass search:semantic "email service configuration"
Common Pitfalls
❌ Pitfall 1: Searching Before Indexing
Symptom: No results or error
Solution: Run codecompass batch:index first
❌ Pitfall 2: Too Vague Queries
Symptom: Returns everything or nothing useful
Solution: Add specific context and intent
❌ Pitfall 3: Expecting Exact Matches
Symptom: "Why didn't it find function processPayment?"
Reason: Semantic search is for concepts, not exact names
Solution: Use grep for exact matches
❌ Pitfall 4: Ignoring Relevance Scores
Symptom: Reading irrelevant results
Solution: Filter by score >0.7 and ignore weak matches
❌ Pitfall 5: Single Query for Complex Questions
Symptom: Poor results for multi-faceted questions
Solution: Break into multiple targeted queries
Decision Tree
I need to find code that...
1. Do I know the exact class/function name?
   YES → Use Grep
   NO → go to 2
2. Am I searching by meaning/purpose (concept search)?
   YES → Use Semantic Search
   NO → go to 3
3. Am I looking for a code pattern (pattern match)?
   YES → Use Glob
   NO → Use both (Glob + Semantic)
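The tree can be approximated by a crude string heuristic, for example when an agent routes queries automatically. This is a hypothetical sketch. The regexes and the three-word threshold are assumptions that will misroute some queries. For instance, a natural-language request for file listings still goes to semantic search.

```typescript
// Possible outcomes of the decision tree.
type Tool = "grep" | "glob" | "semantic" | "both";

// Rough, hypothetical heuristic mirroring the decision tree.
function chooseTool(query: string): Tool {
  const q = query.trim();
  const words = q.split(/\s+/).filter(Boolean);

  // Exact name: one identifier-like token, e.g. PaymentController or @Injectable().
  if (/^@?[A-Za-z_$][\w$]*(\(\))?$/.test(q)) return "grep";

  // Path or file pattern: one token containing a wildcard or slash, e.g. src/modules/*.ts.
  if (words.length === 1 && /[*\/]/.test(q)) return "glob";

  // Several words read as a description of meaning or purpose.
  if (words.length >= 3) return "semantic";

  // Short, ambiguous queries: combine both approaches.
  return "both";
}
```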
Performance Considerations
Speed
- Grep: Milliseconds (fast, synchronous)
- Semantic Search: 100-500ms (embedding + vector search)
Tradeoff: Semantic is slower but finds conceptually related code
Token Cost (Embeddings)
- Each query → 1 embedding generation
- Ollama local → No API cost
- But consumes local compute
Scaling
- Small codebase (<1K files): Either method fine
- Medium codebase (1K-10K files): Semantic search advantage grows
- Large codebase (>10K files): Semantic search essential
Integration with Other Tools
With Yii2 Analysis
# 1. Analyze Yii2 project
codecompass analyze:yii2 <path>
# 2. Index results
codecompass batch:index <path>
# 3. Explore with semantic search
codecompass search:semantic "Yii2 controller actions for user management"
With Requirements Extraction
# 1. Extract requirements
codecompass requirements:extract
# 2. Search extracted requirements
codecompass search:semantic "business rules for order validation"
With Weaviate Direct Query
# Alternative: Query Weaviate GraphQL API directly
curl -X POST http://localhost:8081/v1/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ Get { CodeContext(nearText: { concepts: [\"payment processing\"] }, limit: 10) { content filePath } } }"}'
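Hand-escaping GraphQL inside JSON is error-prone, because raw line breaks are not allowed inside a JSON string. A safer approach is to build the request body programmatically. `buildNearTextBody` is a hypothetical helper. The `content` and `filePath` fields come from the example above.

```typescript
// Build the JSON body for Weaviate's POST /v1/graphql endpoint.
// JSON.stringify handles quoting, so concepts containing quotes stay valid.
function buildNearTextBody(
  collection: string,
  concepts: string[],
  limit: number,
  fields: string[],
): string {
  const gql =
    `{ Get { ${collection}(nearText: { concepts: ${JSON.stringify(concepts)} }, ` +
    `limit: ${limit}) { ${fields.join(" ")} } } }`;
  return JSON.stringify({ query: gql });
}
```

`nearText` only works if the Weaviate collection has a text vectorizer module configured. If CodeCompass stores vectors it computed itself via Ollama, you may instead need `nearVector` with an embedding you generate first. That depends on the setup and is not confirmed here.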
Related Skills
- 0-discover-capabilities.md - How to discover modules
- analyze-yii2-project.md - Uses semantic search in workflow
Related Modules
From .ai/capabilities.json:
- search - SearchService, IntegratedSearchService
- vectorizer - Ollama embedding generation
- weaviate - Vector database client
- indexing - File indexing pipeline
Remember: Semantic search finds code by meaning, not by name. Choose the right tool for the job.