Agent skill
langfuse
Debug AI traces, find exceptions, analyze sessions, and manage prompts via Langfuse MCP. Also handles MCP setup and configuration.
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/langfuse
Metadata
Additional technical details for this skill
- compatibility
- claude-code, codex-cli
- short description
- Langfuse observability via MCP
SKILL.md
Langfuse Skill
Debug your AI systems through Langfuse observability.
Triggers: langfuse, traces, debug AI, find exceptions, set up langfuse, what went wrong, why is it slow, datasets, evaluation sets
Setup
Step 1: Get credentials from https://cloud.langfuse.com → Settings → API Keys
If self-hosted, use your instance URL for LANGFUSE_HOST and create keys there.
Step 2: Install MCP (pick one):
# Claude Code (project-scoped, shared via .mcp.json)
claude mcp add \
--scope project \
--env LANGFUSE_PUBLIC_KEY=pk-... \
--env LANGFUSE_SECRET_KEY=sk-... \
--env LANGFUSE_HOST=https://cloud.langfuse.com \
langfuse -- uvx --python 3.11 langfuse-mcp
# Codex CLI (user-scoped, stored in ~/.codex/config.toml)
codex mcp add langfuse \
--env LANGFUSE_PUBLIC_KEY=pk-... \
--env LANGFUSE_SECRET_KEY=sk-... \
--env LANGFUSE_HOST=https://cloud.langfuse.com \
-- uvx --python 3.11 langfuse-mcp
Step 3: Restart CLI, verify with /mcp (Claude) or codex mcp list (Codex)
Step 4: Test: fetch_traces(age=60)
Read-Only Mode
For safer observability without risk of modifying prompts or datasets, enable read-only mode:
# CLI flag
langfuse-mcp --read-only
# Or environment variable
LANGFUSE_MCP_READ_ONLY=true
This disables write tools: create_text_prompt, create_chat_prompt, update_prompt_labels, create_dataset, create_dataset_item, delete_dataset_item.
For manual .mcp.json setup or troubleshooting, see references/setup.md.
Playbooks
"Where are the errors?"
find_exceptions(age=1440, group_by="file")
→ Shows error counts by file. Pick the worst offender.
find_exceptions_in_file(filepath="src/ai/chat.py", age=1440)
→ Lists specific exceptions. Grab a trace_id.
get_exception_details(trace_id="...")
→ Full stacktrace and context.
"What happened in this interaction?"
fetch_traces(age=60, user_id="...")
→ Find the trace. Note the trace_id.
If you don't know the user_id, start with:
fetch_traces(age=60)
fetch_trace(trace_id="...", include_observations=true)
→ See all LLM calls in the trace.
fetch_observation(observation_id="...")
→ Inspect a specific generation's input/output.
"Why is it slow?"
fetch_observations(age=60, type="GENERATION")
→ Find recent LLM calls. Look for high latency.
fetch_observation(observation_id="...")
→ Check token counts, model, timing.
"What's this user experiencing?"
get_user_sessions(user_id="...", age=1440)
→ List their sessions.
get_session_details(session_id="...")
→ See all traces in the session.
"Manage datasets"
list_datasets()
→ See all datasets.
get_dataset(name="evaluation-set-v1")
→ Get dataset details.
list_dataset_items(dataset_name="evaluation-set-v1", page=1, limit=10)
→ Browse items in the dataset.
create_dataset(name="qa-test-cases", description="QA evaluation set")
→ Create a new dataset.
create_dataset_item(
dataset_name="qa-test-cases",
input={"question": "What is 2+2?"},
expected_output={"answer": "4"}
)
→ Add test cases.
create_dataset_item(
dataset_name="qa-test-cases",
item_id="item_123",
input={"question": "What is 3+3?"},
expected_output={"answer": "6"}
)
→ Upsert: updates existing item by id or creates if missing.
"Manage prompts"
list_prompts()
→ See all prompts with labels.
get_prompt(name="...", label="production")
→ Fetch current production version.
create_text_prompt(name="...", prompt="...", labels=["staging"])
→ Create new version in staging.
update_prompt_labels(name="...", version=N, labels=["production"])
→ Promote to production. (Rollback = re-apply label to older version)
Quick Reference
| Task | Tool |
|---|---|
| List traces | fetch_traces(age=N) |
| Get trace details | fetch_trace(trace_id="...", include_observations=true) |
| List LLM calls | fetch_observations(age=N, type="GENERATION") |
| Get observation | fetch_observation(observation_id="...") |
| Error count | get_error_count(age=N) |
| Find exceptions | find_exceptions(age=N, group_by="file") |
| List sessions | fetch_sessions(age=N) |
| User sessions | get_user_sessions(user_id="...", age=N) |
| List prompts | list_prompts() |
| Get prompt | get_prompt(name="...", label="production") |
| List datasets | list_datasets() |
| Get dataset | get_dataset(name="...") |
| List dataset items | list_dataset_items(dataset_name="...", limit=N) |
| Create/update dataset item | create_dataset_item(dataset_name="...", item_id="...") |
age = minutes to look back (max 10080 = 7 days)
References
references/tool-reference.md— Full parameter docs, filter semantics, response schemasreferences/setup.md— Manual setup, troubleshooting, advanced configuration
Didn't find tool you were looking for?