Agent skill

qmd

Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking. Supports CLI and MCP integration.

View SKILL.md on GitHub Repository

Stars 56,643

Forks 7,481

Install this agent skill to your Project

npx add-skill https://github.com/NousResearch/hermes-agent/tree/main/optional-skills/research/qmd

Metadata

Additional technical details for this skill

hermes: { "tags": [ "Search", "Knowledge-Base", "RAG", "Notes", "MCP", "Local-AI" ], "related_skills": [ "obsidian", "native-mcp", "arxiv" ] }

SKILL.md

QMD — Query Markup Documents

Local, on-device search engine for personal knowledge bases. Indexes markdown notes, meeting transcripts, documentation, and any text-based files, then provides hybrid search combining keyword matching, semantic understanding, and LLM-powered reranking — all running locally with no cloud dependencies.

Created by Tobi Lütke. MIT licensed.

When to Use

User asks to search their notes, docs, knowledge base, or meeting transcripts
User wants to find something across a large collection of markdown/text files
User wants semantic search ("find notes about X concept") not just keyword grep
User has already set up qmd collections and wants to query them
User asks to set up a local knowledge base or document search system
Keywords: "search my notes", "find in my docs", "knowledge base", "qmd"

Prerequisites

Node.js >= 22 (required)

bash

# Check version
node --version  # must be >= 22

# macOS — install or upgrade via Homebrew
brew install node@22

# Linux — use NodeSource or nvm
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt-get install -y nodejs
# or with nvm:
nvm install 22 && nvm use 22

SQLite with Extension Support (macOS only)

macOS system SQLite lacks extension loading. Install via Homebrew:

bash

brew install sqlite

Install qmd

bash

npm install -g @tobilu/qmd
# or with Bun:
bun install -g @tobilu/qmd

First run auto-downloads 3 local GGUF models (~2GB total):

Model	Purpose	Size
embeddinggemma-300M-Q8_0	Vector embeddings	~300MB
qwen3-reranker-0.6b-q8_0	Result reranking	~640MB
qmd-query-expansion-1.7B	Query expansion	~1.1GB

Verify Installation

bash

qmd --version
qmd status

Quick Reference

Command	What It Does	Speed
`qmd search "query"`	BM25 keyword search (no models)	~0.2s
`qmd vsearch "query"`	Semantic vector search (1 model)	~3s
`qmd query "query"`	Hybrid + reranking (all 3 models)	~2-3s warm, ~19s cold
`qmd get <docid>`	Retrieve full document content	instant
`qmd multi-get "glob"`	Retrieve multiple files	instant
`qmd collection add <path> --name <n>`	Add a directory as a collection	instant
`qmd context add <path> "description"`	Add context metadata to improve retrieval	instant
`qmd embed`	Generate/update vector embeddings	varies
`qmd status`	Show index health and collection info	instant
`qmd mcp`	Start MCP server (stdio)	persistent
`qmd mcp --http --daemon`	Start MCP server (HTTP, warm models)	persistent

Setup Workflow

1. Add Collections

Point qmd at directories containing your documents:

bash

# Add a notes directory
qmd collection add ~/notes --name notes

# Add project docs
qmd collection add ~/projects/myproject/docs --name project-docs

# Add meeting transcripts
qmd collection add ~/meetings --name meetings

# List all collections
qmd collection list

2. Add Context Descriptions

Context metadata helps the search engine understand what each collection contains. This significantly improves retrieval quality:

bash

qmd context add qmd://notes "Personal notes, ideas, and journal entries"
qmd context add qmd://project-docs "Technical documentation for the main project"
qmd context add qmd://meetings "Meeting transcripts and action items from team syncs"

3. Generate Embeddings

bash

qmd embed

This processes all documents in all collections and generates vector embeddings. Re-run after adding new documents or collections.

4. Verify

bash

qmd status   # shows index health, collection stats, model info

Search Patterns

Fast Keyword Search (BM25)

Best for: exact terms, code identifiers, names, known phrases. No models loaded — near-instant results.

bash

qmd search "authentication middleware"
qmd search "handleError async"

Semantic Vector Search

Best for: natural language questions, conceptual queries. Loads embedding model (~3s first query).

bash

qmd vsearch "how does the rate limiter handle burst traffic"
qmd vsearch "ideas for improving onboarding flow"

Hybrid Search with Reranking (Best Quality)

Best for: important queries where quality matters most. Uses all 3 models — query expansion, parallel BM25+vector, reranking.

bash

qmd query "what decisions were made about the database migration"

Structured Multi-Mode Queries

Combine different search types in a single query for precision:

bash

# BM25 for exact term + vector for concept
qmd query $'lex: rate limiter\nvec: how does throttling work under load'

# With query expansion
qmd query $'expand: database migration plan\nlex: "schema change"'

Query Syntax (lex/BM25 mode)

Syntax	Effect	Example
`term`	Prefix match	`perf` matches "performance"
`"phrase"`	Exact phrase	`"rate limiter"`
`-term`	Exclude term	`performance -sports`

HyDE (Hypothetical Document Embeddings)

For complex topics, write what you expect the answer to look like:

bash

qmd query $'hyde: The migration plan involves three phases. First, we add the new columns without dropping the old ones. Then we backfill data. Finally we cut over and remove legacy columns.'

Scoping to Collections

bash

qmd search "query" --collection notes
qmd query "query" --collection project-docs

Output Formats

bash

qmd search "query" --json        # JSON output (best for parsing)
qmd search "query" --limit 5     # Limit results
qmd get "#abc123"                # Get by document ID
qmd get "path/to/file.md"       # Get by file path
qmd get "file.md:50" -l 100     # Get specific line range
qmd multi-get "journals/*.md" --json  # Batch retrieve by glob

MCP Integration (Recommended)

qmd exposes an MCP server that provides search tools directly to Hermes Agent via the native MCP client. This is the preferred integration — once configured, the agent gets qmd tools automatically without needing to load this skill.

Option A: Stdio Mode (Simple)

Add to ~/.hermes/config.yaml:

yaml

mcp_servers:
  qmd:
    command: "qmd"
    args: ["mcp"]
    timeout: 30
    connect_timeout: 45

This registers tools: mcp_qmd_search, mcp_qmd_vsearch, mcp_qmd_deep_search, mcp_qmd_get, mcp_qmd_status.

Tradeoff: Models load on first search call (~19s cold start), then stay warm for the session. Acceptable for occasional use.

Option B: HTTP Daemon Mode (Fast, Recommended for Heavy Use)

Start the qmd daemon separately — it keeps models warm in memory:

bash

# Start daemon (persists across agent restarts)
qmd mcp --http --daemon

# Runs on http://localhost:8181 by default

Then configure Hermes Agent to connect via HTTP:

yaml

mcp_servers:
  qmd:
    url: "http://localhost:8181/mcp"
    timeout: 30

Tradeoff: Uses ~2GB RAM while running, but every query is fast (~2-3s). Best for users who search frequently.

Keeping the Daemon Running

macOS (launchd)

bash

cat > ~/Library/LaunchAgents/com.qmd.daemon.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.qmd.daemon</string>
  <key>ProgramArguments</key>
  <array>
    <string>qmd</string>
    <string>mcp</string>
    <string>--http</string>
    <string>--daemon</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/qmd-daemon.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/qmd-daemon.log</string>
</dict>
</plist>
EOF

launchctl load ~/Library/LaunchAgents/com.qmd.daemon.plist

Linux (systemd user service)

bash

mkdir -p ~/.config/systemd/user

cat > ~/.config/systemd/user/qmd-daemon.service << 'EOF'
[Unit]
Description=QMD MCP Daemon
After=network.target

[Service]
ExecStart=qmd mcp --http --daemon
Restart=on-failure
RestartSec=10
Environment=PATH=/usr/local/bin:/usr/bin:/bin

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable --now qmd-daemon
systemctl --user status qmd-daemon

MCP Tools Reference

Once connected, these tools are available as mcp_qmd_*:

MCP Tool	Maps To	Description
`mcp_qmd_search`	`qmd search`	BM25 keyword search
`mcp_qmd_vsearch`	`qmd vsearch`	Semantic vector search
`mcp_qmd_deep_search`	`qmd query`	Hybrid search + reranking
`mcp_qmd_get`	`qmd get`	Retrieve document by ID or path
`mcp_qmd_status`	`qmd status`	Index health and stats

The MCP tools accept structured JSON queries for multi-mode search:

json

{
  "searches": [
    {"type": "lex", "query": "authentication middleware"},
    {"type": "vec", "query": "how user login is verified"}
  ],
  "collections": ["project-docs"],
  "limit": 10
}

CLI Usage (Without MCP)

When MCP is not configured, use qmd directly via terminal:

terminal(command="qmd query 'what was decided about the API redesign' --json", timeout=30)

For setup and management tasks, always use terminal:

terminal(command="qmd collection add ~/Documents/notes --name notes")
terminal(command="qmd context add qmd://notes 'Personal research notes and ideas'")
terminal(command="qmd embed")
terminal(command="qmd status")

How the Search Pipeline Works

Understanding the internals helps choose the right search mode:

Query Expansion — A fine-tuned 1.7B model generates 2 alternative queries. The original gets 2x weight in fusion.
Parallel Retrieval — BM25 (SQLite FTS5) and vector search run simultaneously across all query variants.
RRF Fusion — Reciprocal Rank Fusion (k=60) merges results. Top-rank bonus: #1 gets +0.05, #2-3 get +0.02.
LLM Reranking — qwen3-reranker scores top 30 candidates (0.0-1.0).
Position-Aware Blending — Ranks 1-3: 75% retrieval / 25% reranker. Ranks 4-10: 60/40. Ranks 11+: 40/60 (trusts reranker more for long tail).

Smart Chunking: Documents are split at natural break points (headings, code blocks, blank lines) targeting ~900 tokens with 15% overlap. Code blocks are never split mid-block.

Best Practices

Always add context descriptions — qmd context add dramatically improves retrieval accuracy. Describe what each collection contains.
Re-embed after adding documents — qmd embed must be re-run when new files are added to collections.
Use qmd search for speed — when you need fast keyword lookup (code identifiers, exact names), BM25 is instant and needs no models.
Use qmd query for quality — when the question is conceptual or the user needs the best possible results, use hybrid search.
Prefer MCP integration — once configured, the agent gets native tools without needing to load this skill each time.
Daemon mode for frequent users — if the user searches their knowledge base regularly, recommend the HTTP daemon setup.
First query in structured search gets 2x weight — put the most important/certain query first when combining lex and vec.

Troubleshooting

"Models downloading on first run"

Normal — qmd auto-downloads ~2GB of GGUF models on first use. This is a one-time operation.

Cold start latency (~19s)

This happens when models aren't loaded in memory. Solutions:

Use HTTP daemon mode (qmd mcp --http --daemon) to keep warm
Use qmd search (BM25 only) when models aren't needed
MCP stdio mode loads models on first search, stays warm for session

macOS: "unable to load extension"

Install Homebrew SQLite: brew install sqlite Then ensure it's on PATH before system SQLite.

"No collections found"

Run qmd collection add <path> --name <name> to add directories, then qmd embed to index them.

Embedding model override (CJK/multilingual)

Set QMD_EMBED_MODEL environment variable for non-English content:

bash

export QMD_EMBED_MODEL="your-multilingual-model"

Data Storage

Index & vectors: ~/.cache/qmd/index.sqlite
Models: Auto-downloaded to local cache on first run
No cloud dependencies — everything runs locally

References

Maintainer

NousResearch Core maintainer

Source details

Full Name: NousResearch/hermes-agent
Branch: main
Path in repo: optional-skills/research/qmd
License: MIT License
Topics: ai claude-code anthropic claude ai-agents clawdbot llm openclaw ai-agent codex chatgpt moltbot openai hermes hermes-agent nous-research

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

NousResearch/hermes-agent

agentmail

Give the agent its own dedicated email inbox via AgentMail. Send, receive, and manage email autonomously using agent-owned email addresses (e.g. hermes-agent@agentmail.to).

56,643 7,481

Explore

NousResearch/hermes-agent

base

Query Base (Ethereum L2) blockchain data with USD pricing — wallet balances, token info, transaction details, gas analysis, contract inspection, whale detection, and live network stats. Uses Base RPC + CoinGecko. No API key required.

56,643 7,481

Explore

NousResearch/hermes-agent

solana

Query Solana blockchain data with USD pricing — wallet balances, token portfolios with values, transaction details, NFTs, whale detection, and live network stats. Uses Solana RPC + CoinGecko. No API key required.

56,643 7,481

Explore

NousResearch/hermes-agent

one-three-one-rule

Structured decision-making framework for technical proposals and trade-off analysis. When the user faces a choice between multiple approaches (architecture decisions, tool selection, refactoring strategies, migration paths), this skill produces a 1-3-1 format: one clear problem statement, three distinct options with pros/cons, and one concrete recommendation with definition of done and implementation plan. Use when the user asks for a "1-3-1", says "give me options", or needs help choosing between competing approaches.

56,643 7,481

Explore

NousResearch/hermes-agent

fastmcp

Build, test, inspect, install, and deploy MCP servers with FastMCP in Python. Use when creating a new MCP server, wrapping an API or database as MCP tools, exposing resources or prompts, or preparing a FastMCP server for Claude Code, Cursor, or HTTP deployment.

56,643 7,481

Explore

NousResearch/hermes-agent

qdrant-vector-search

High-performance vector similarity search engine for RAG and semantic search. Use when building production RAG systems requiring fast nearest neighbor search, hybrid search with filtering, or scalable vector storage with Rust-powered performance.

56,643 7,481

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

Metadata

SKILL.md

QMD — Query Markup Documents

When to Use

Prerequisites

Node.js >= 22 (required)

SQLite with Extension Support (macOS only)

Install qmd

Verify Installation

Quick Reference

Setup Workflow

1. Add Collections

2. Add Context Descriptions

3. Generate Embeddings

4. Verify

Search Patterns

Fast Keyword Search (BM25)

Semantic Vector Search

Hybrid Search with Reranking (Best Quality)

Structured Multi-Mode Queries

Query Syntax (lex/BM25 mode)

HyDE (Hypothetical Document Embeddings)

Scoping to Collections

Output Formats

MCP Integration (Recommended)

Option A: Stdio Mode (Simple)

Option B: HTTP Daemon Mode (Fast, Recommended for Heavy Use)

Keeping the Daemon Running

macOS (launchd)

Linux (systemd user service)

MCP Tools Reference

CLI Usage (Without MCP)

How the Search Pipeline Works

Best Practices

Troubleshooting

"Models downloading on first run"

Cold start latency (~19s)

macOS: "unable to load extension"

"No collections found"

Embedding model override (CJK/multilingual)

Data Storage

References

Recommended Agent Skills

agentmail

base

solana

one-three-one-rule

fastmcp

qdrant-vector-search