Agent skill

rag-retrieval

Retrieval-Augmented Generation patterns for grounded LLM responses. Use when building RAG pipelines, embedding documents, implementing hybrid search, contextual retrieval, HyDE, agentic RAG, multimodal RAG, query decomposition, reranking, or pgvector search.

Install this agent skill to your Project

npx add-skill https://github.com/yonatangross/orchestkit/tree/main/plugins/ork/skills/rag-retrieval

Metadata

Additional technical details for this skill

category: mcp-enhancement

SKILL.md

RAG Retrieval

Comprehensive patterns for building production RAG systems. Each category has its own rule files in `rules/`, loaded on demand.

Quick Reference

Category	Rules	Impact	When to Use
Core RAG	4	CRITICAL	Basic RAG, citations, hybrid search, context management
Embeddings	3	HIGH	Model selection, chunking, batch/cache optimization
Contextual Retrieval	3	HIGH	Context-prepending, hybrid BM25+vector, pipeline
HyDE	3	HIGH	Vocabulary mismatch, hypothetical document generation
Agentic RAG	4	HIGH	Self-RAG, CRAG, knowledge graphs, adaptive routing
Multimodal RAG	3	MEDIUM	Image+text retrieval, PDF chunking, cross-modal search
Query Decomposition	3	MEDIUM	Multi-concept queries, parallel retrieval, RRF fusion
Reranking	3	MEDIUM	Cross-encoder, LLM scoring, combined signals
PGVector	4	HIGH	PostgreSQL hybrid search, HNSW indexes, schema design

Total: 30 rules across 9 categories

Core RAG

Fundamental patterns for retrieval, generation, and pipeline composition.

Rule	File	Key Pattern
Basic RAG	`rules/core-basic-rag.md`	Retrieve + context + generate with citations
Hybrid Search	`rules/core-hybrid-search.md`	RRF fusion (k=60) for semantic + keyword
Context Management	`rules/core-context-management.md`	Token budgeting + sufficiency check
Pipeline Composition	`rules/core-pipeline-composition.md`	Composable Decompose → HyDE → Retrieve → Rerank
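The RRF fusion named in the Hybrid Search rule can be sketched as follows. This is a minimal illustration, not the contents of `rules/core-hybrid-search.md`: the function name `rrf_fuse` and the sample document IDs are assumptions, and only the constant k=60 comes from the table above.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a semantic and a keyword retriever.
semantic = ["doc_a", "doc_b", "doc_c"]
keyword = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([semantic, keyword])
# doc_b ranks first because it sits near the top of both lists.
```

RRF uses only ranks, so the two retrievers do not need comparable score scales.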

Embeddings

Embedding models, chunking strategies, and production optimization.

Rule	File	Key Pattern
Models & API	`rules/embeddings-models.md`	Model selection, batch API, similarity
Chunking	`rules/embeddings-chunking.md`	Semantic boundary splitting, 512 token sweet spot
Advanced	`rules/embeddings-advanced.md`	Redis cache, Matryoshka dims, batch processing
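A rough sketch of boundary-aware chunking with a token ceiling is shown below. It assumes paragraph breaks are the semantic boundaries and approximates tokens by whitespace-separated words; a real pipeline would count tokens with the embedding model's tokenizer. The name `chunk_paragraphs` is hypothetical.

```python
def chunk_paragraphs(text: str, max_tokens: int = 512) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most max_tokens.

    Token counts are approximated by word counts (an assumption).
    A single paragraph longer than max_tokens is kept whole rather than split.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        # Close the current chunk if this paragraph would overflow it.
        if current and current_tokens + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "a b c\n\nd e\n\nf g h i"
small_chunks = chunk_paragraphs(sample, max_tokens=5)
```

With `max_tokens=5`, the first two paragraphs fit together and the third starts a new chunk.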

Contextual Retrieval

Anthropic's context-prepending technique. Anthropic reports up to 67% fewer retrieval failures when contextual embeddings and contextual BM25 are combined with reranking.

Rule	File	Key Pattern
Context Prepending	`rules/contextual-prepend.md`	LLM-generated context + prompt caching
Hybrid Search	`rules/contextual-hybrid.md`	40% BM25 / 60% vector weight split
Complete Pipeline	`rules/contextual-pipeline.md`	End-to-end indexing + hybrid retrieval
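Context prepending can be sketched as below, under stated assumptions: `generate` is a hypothetical callable wrapping whatever LLM client is in use, and the prompt wording is illustrative rather than taken from `rules/contextual-prepend.md`. The full document is repeated verbatim in every prompt, which is why prompt caching pays off here.

```python
from typing import Callable

# Illustrative prompt; the wording is an assumption, not the skill's own text.
CONTEXT_PROMPT = (
    "<document>\n{document}\n</document>\n"
    "Here is a chunk from that document:\n<chunk>\n{chunk}\n</chunk>\n"
    "Write one or two sentences situating this chunk within the document "
    "to improve search retrieval. Answer with the context only."
)


def contextualize_chunks(
    document: str,
    chunks: list[str],
    generate: Callable[[str], str],
) -> list[str]:
    """Prepend LLM-written context to each chunk before indexing."""
    contextualized = []
    for chunk in chunks:
        prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
        context = generate(prompt).strip()
        # Fall back to the bare chunk if the model returns nothing.
        contextualized.append(f"{context}\n\n{chunk}" if context else chunk)
    return contextualized


# Stub generator standing in for a real LLM call.
demo = contextualize_chunks(
    document="Q2 report for ACME Corp ...",
    chunks=["Revenue grew 3% over the previous quarter."],
    generate=lambda prompt: " From ACME Corp's Q2 report. ",
)
```

The contextualized text is what gets embedded and indexed for BM25; the original chunk can still be stored separately for display.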

HyDE

Hypothetical Document Embeddings for bridging vocabulary gaps.

Rule	File	Key Pattern
Generation	`rules/hyde-generation.md`	Embed hypothetical doc, not query
Per-Concept	`rules/hyde-per-concept.md`	Parallel HyDE for multi-topic queries
Fallback	`rules/hyde-fallback.md`	2-3s timeout → direct embedding fallback

Agentic RAG

Self-correcting retrieval with LLM-driven decision making.

Rule	File	Key Pattern
Self-RAG	`rules/agentic-self-rag.md`	Binary document grading for relevance
Corrective RAG	`rules/agentic-corrective-rag.md`	CRAG workflow with web fallback
Knowledge Graph	`rules/agentic-knowledge-graph.md`	KG + vector hybrid for entity-rich domains
Adaptive Retrieval	`rules/agentic-adaptive-retrieval.md`	Query routing to optimal strategy
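A compact sketch of a corrective loop with binary grading, a rewrite limit, and a web fallback follows. All four callables (`retrieve`, `is_relevant`, `rewrite`, `web_search`) are hypothetical placeholders for real components, and `max_rewrites=2` is an arbitrary default.

```python
from typing import Callable


def corrective_retrieve(
    query: str,
    retrieve: Callable[[str], list[str]],
    is_relevant: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
    web_search: Callable[[str], list[str]],
    max_rewrites: int = 2,
) -> tuple[list[str], str]:
    """Return (documents, route), where route is 'retrieval' or 'web_fallback'."""
    current = query
    for attempt in range(max_rewrites + 1):
        docs = retrieve(current)
        # Binary grading: keep only documents judged relevant to the ORIGINAL query.
        relevant = [d for d in docs if is_relevant(query, d)]
        if relevant:
            return relevant, "retrieval"
        if attempt < max_rewrites:
            current = rewrite(current)
    # Rewrite budget exhausted: take the fallback path instead of looping.
    return web_search(query), "web_fallback"


# Toy components so the control flow can be exercised without an LLM.
corpus = {"pgvector index": ["HNSW builds a graph index."]}
docs, route = corrective_retrieve(
    "how do I index vectors?",
    retrieve=lambda q: corpus.get(q, []),
    is_relevant=lambda q, d: "index" in d,
    rewrite=lambda q: "pgvector index",
    web_search=lambda q: ["web result"],
)
```

The bounded `for` loop is what prevents the infinite rewrite cycles that agentic workflows can otherwise fall into.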

Multimodal RAG

Image + text retrieval with cross-modal search.

Rule	File	Key Pattern
Embeddings	`rules/multimodal-embeddings.md`	CLIP, SigLIP 2, Voyage multimodal-3
Chunking	`rules/multimodal-chunking.md`	PDF extraction preserving images
Pipeline	`rules/multimodal-pipeline.md`	Dedup + hybrid retrieval + generation

Query Decomposition

Breaking complex queries into concepts for parallel retrieval.

Rule	File	Key Pattern
Detection	`rules/query-detection.md`	Heuristic indicators (<1ms fast path)
Decompose + RRF	`rules/query-decompose.md`	LLM concept extraction + parallel retrieval
HyDE Combo	`rules/query-hyde-combo.md`	Decompose + HyDE for maximum coverage
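The heuristic fast path could be as simple as the sketch below. The indicator list is an assumption chosen for illustration, not the list used in `rules/query-detection.md`, and it will produce some false positives (for example "pros and cons"), which is acceptable when the only cost is an extra LLM decomposition call.

```python
import re

# Hypothetical indicators of a compound question.
_MULTI_CONCEPT = re.compile(
    r"\b(and|or|vs\.?|versus|compared (?:to|with)|as well as|difference between)\b",
    re.IGNORECASE,
)


def looks_multi_concept(query: str) -> bool:
    """Cheap check for compound queries; no LLM call involved."""
    if query.count("?") > 1:
        return True
    return bool(_MULTI_CONCEPT.search(query))
```

Only queries that pass this check would be sent on to LLM-based concept extraction.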

Reranking

Post-retrieval re-scoring for higher precision.

Rule	File	Key Pattern
Cross-Encoder	`rules/reranking-cross-encoder.md`	ms-marco-MiniLM (~50ms, free)
LLM Reranking	`rules/reranking-llm.md`	Batch scoring + Cohere API
Combined	`rules/reranking-combined.md`	Multi-signal weighted scoring
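Multi-signal weighted scoring might be sketched like this. The three signals, their weights, and the `Candidate` type are assumptions made for the example; all scores are expected to be pre-scaled to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    retrieval_score: float  # e.g. fused hybrid score, scaled to [0, 1]
    rerank_score: float     # e.g. cross-encoder or LLM relevance, scaled to [0, 1]
    recency: float          # 1.0 = newest, 0.0 = oldest


def combined_rerank(
    candidates: list[Candidate],
    weights: tuple[float, float, float] = (0.3, 0.6, 0.1),
    top_n: int = 10,
) -> list[Candidate]:
    """Order candidates by a weighted sum of signals and keep the top_n."""
    w_retrieval, w_rerank, w_recency = weights

    def score(c: Candidate) -> float:
        return (
            w_retrieval * c.retrieval_score
            + w_rerank * c.rerank_score
            + w_recency * c.recency
        )

    return sorted(candidates, key=score, reverse=True)[:top_n]


candidates = [
    Candidate("a", retrieval_score=0.9, rerank_score=0.2, recency=0.5),
    Candidate("b", retrieval_score=0.5, rerank_score=0.9, recency=0.1),
    Candidate("c", retrieval_score=0.7, rerank_score=0.6, recency=1.0),
]
ranked = combined_rerank(candidates, top_n=2)
```

Here "b" wins despite a weaker retrieval score because the reranker signal carries the largest weight.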

PGVector

Production hybrid search with PostgreSQL.

Rule	File	Key Pattern
Schema	`rules/pgvector-schema.md`	HNSW index + pre-computed tsvector
Hybrid Search	`rules/pgvector-hybrid-search.md`	SQLAlchemy RRF with FULL OUTER JOIN
Indexing	`rules/pgvector-indexing.md`	HNSW (17x faster) vs IVFFlat
Metadata	`rules/pgvector-metadata.md`	Filtering, boosting, Redis 8 comparison
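The schema and RRF query patterns can be sketched as SQL held in Python strings. This is an illustration under assumptions: the `chunks` table, its column names, the 1536-dimension vector, and the psycopg-style named placeholders are all hypothetical, and the rule files use SQLAlchemy rather than raw SQL.

```python
# Assumed schema: one row per chunk, with a stored tsvector for keyword search.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id          bigserial PRIMARY KEY,
    content     text NOT NULL,
    embedding   vector(1536),
    content_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX IF NOT EXISTS chunks_tsv_gin
    ON chunks USING gin (content_tsv);
"""


def hybrid_rrf_sql(candidates: int = 50, limit: int = 10, k: int = 60) -> str:
    """Build an RRF hybrid query; vector and text inputs stay as bind parameters."""
    # Coerce to int so only numbers are ever interpolated into the SQL text.
    candidates, limit, k = int(candidates), int(limit), int(k)
    return f"""
WITH semantic AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> %(query_embedding)s::vector) AS rank
    FROM chunks
    ORDER BY embedding <=> %(query_embedding)s::vector
    LIMIT {candidates}
),
keyword AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank_cd(content_tsv, q) DESC) AS rank
    FROM chunks, plainto_tsquery('english', %(query_text)s) AS q
    WHERE content_tsv @@ q
    ORDER BY ts_rank_cd(content_tsv, q) DESC
    LIMIT {candidates}
)
SELECT COALESCE(s.id, kw.id) AS id,
       COALESCE(1.0 / ({k} + s.rank), 0.0)
     + COALESCE(1.0 / ({k} + kw.rank), 0.0) AS rrf_score
FROM semantic AS s
FULL OUTER JOIN keyword AS kw ON s.id = kw.id
ORDER BY rrf_score DESC
LIMIT {limit};
"""
```

The `FULL OUTER JOIN` keeps documents found by only one of the two searches, and `COALESCE` gives the missing side a zero contribution.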

Quick Start Example

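```python
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def rag_query(question: str, vector_db, top_k: int = 5) -> dict:
    """Basic RAG with citations.

    `vector_db` is any async client whose `search(query, limit=...)` returns
    objects with `.text` and `.metadata["source"]`.
    """
    docs = await vector_db.search(question, limit=top_k)
    # Number each chunk so the model can cite it as [1], [2], ...
    context = "\n\n".join(f"[{i + 1}] {doc.text}" for i, doc in enumerate(docs))

    response = await client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: substitute any chat-capable model
        temperature=0.2,  # low temperature for factual answers
        messages=[
            {"role": "system", "content": "Answer with inline citations [1], [2]. Use ONLY provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )

    return {
        "answer": response.choices[0].message.content,
        "sources": [doc.metadata["source"] for doc in docs],
    }
```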

Key Decisions

Decision	Recommendation
Embedding model	`text-embedding-3-small` (general), `voyage-3` (production)
Chunk size	256-1024 tokens (512 typical)
Hybrid weight	40% BM25 / 60% vector
Top-k	3-10 documents
Temperature	0.1-0.3 (factual)
Context budget	4K-8K tokens
Reranking	Retrieve 50, rerank to 10
Vector index	HNSW (production), IVFFlat (high-volume)
HyDE timeout	2-3 seconds with fallback
Query decomposition	Heuristic first, LLM only if multi-concept
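The context budget row can be illustrated with a small packing function. It is a sketch under assumptions: chunks arrive sorted best-first, and the default `count_tokens` approximates tokens by word count, which should be replaced with the target model's tokenizer in practice.

```python
from typing import Callable


def fit_to_budget(
    chunks: list[str],
    budget_tokens: int = 4000,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> list[str]:
    """Keep the highest-ranked chunks that fit within budget_tokens.

    Stops at the first chunk that would overflow, so ranking order is preserved.
    """
    selected: list[str] = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected


kept = fit_to_budget(["a b c", "d e", "f g h"], budget_tokens=5)
```

Stopping at the first overflow is a deliberate simplification; skipping the oversized chunk and continuing would fill the budget more tightly at the cost of reordering relevance.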

Common Mistakes

No citation tracking (unverifiable answers)
Context too large (dilutes relevance)
Single retrieval method (misses keyword matches)
Not chunking long documents (context gets lost)
Embedding queries with a different model or preprocessing than documents
No fallback path in agentic RAG (workflow hangs)
Infinite rewrite loops (no retry limit)
Using a similarity metric that does not match the embedding model or index (cosine vs. Euclidean)
Not caching embeddings (recomputing unchanged content)
Missing image captions in multimodal RAG (limits text search)
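To avoid recomputing embeddings for unchanged content, a cache keyed by a content hash is one option. The sketch below uses an in-memory dict as a stand-in for Redis; `EmbeddingCache` and the `embed` callable are hypothetical names.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Cache embeddings by (model, content hash) so unchanged text is embedded once."""

    def __init__(self, embed: Callable[[str], list[float]], model: str) -> None:
        self._embed = embed
        self._model = model
        self._store: dict[str, list[float]] = {}  # stand-in for Redis
        self.misses = 0

    def _key(self, text: str) -> str:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # Include the model name so switching models never returns stale vectors.
        return f"{self._model}:{digest}"

    def get(self, text: str) -> list[float]:
        key = self._key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed(text)
        return self._store[key]


# Toy embedding function so the cache behaviour can be observed.
cache = EmbeddingCache(embed=lambda text: [float(len(text))], model="demo-model")
```

Calling `cache.get` twice with the same text triggers a single call to `embed`; any edit to the text changes the hash and forces a fresh embedding.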

Evaluations

See test-cases.json for 30 test cases across all categories.

Related Skills

ork:langgraph - LangGraph workflow patterns (for agentic RAG workflows)
caching - Cache RAG responses for repeated queries
ork:golden-dataset - Evaluate retrieval quality
ork:llm-integration - Local embeddings with nomic-embed-text
vision-language-models - Image analysis for multimodal RAG
ork:database-patterns - Schema design for vector search

Capability Details

retrieval-patterns

Keywords: retrieval, context, chunks, relevance, rag

Solves:

Retrieve relevant context for LLM
Implement RAG pipeline with citations
Optimize retrieval quality

hybrid-search

Keywords: hybrid, bm25, vector, fusion, rrf

Solves:

Combine keyword and semantic search
Implement reciprocal rank fusion
Balance precision and recall

embeddings

Keywords: embedding, text to vector, vectorize, chunk, similarity

Solves:

Convert text to vector embeddings
Choose embedding models and dimensions
Implement chunking strategies

contextual-retrieval

Keywords: contextual, anthropic, context-prepend, bm25

Solves:

Prepend context to chunks for better retrieval
Reduce retrieval failures by up to 67% (Anthropic's reported figure, with reranking)
Implement hybrid BM25+vector search

hyde

Keywords: hyde, hypothetical, vocabulary mismatch

Solves:

Bridge vocabulary gaps in semantic search
Generate hypothetical documents for embedding
Handle abstract or conceptual queries

agentic-rag

Keywords: self-rag, crag, corrective, adaptive, grading

Solves:

Build self-correcting RAG workflows
Grade document relevance
Implement web search fallback

multimodal-rag

Keywords: multimodal, image, clip, vision, pdf

Solves:

Build RAG with images and text
Cross-modal search (text → image)
Process PDFs with mixed content

query-decomposition

Keywords: decompose, multi-concept, complex query

Solves:

Break complex queries into concepts
Parallel retrieval per concept
Improve coverage for compound questions

reranking

Keywords: rerank, cross-encoder, precision, scoring

Solves:

Improve search precision post-retrieval
Score relevance with cross-encoder or LLM
Combine multiple scoring signals

pgvector-search

Keywords: pgvector, postgresql, hnsw, tsvector, hybrid

Solves:

Production hybrid search with PostgreSQL
HNSW vs IVFFlat index selection
SQL-based RRF fusion

Maintainer

yonatangross Core maintainer

Source details

Full Name: yonatangross/orchestkit
Branch: main
Path in repo: plugins/ork/skills/rag-retrieval
License: MIT License
Topics: claude-code mcp typescript agents llm react ai-development security rag langgraph testing claude-plugin fastapi
