Agent skill

senior-prompt-engineer

This skill should be used when the user asks to "optimize prompts", "design prompt templates", "evaluate LLM outputs", "build agentic systems", "implement RAG", "create few-shot examples", "analyze token usage", or "design AI workflows". Use for prompt engineering patterns, LLM evaluation frameworks, agent architectures, and structured output design.


Install this agent skill into your project:

npx add-skill https://github.com/borghei/Claude-Skills/tree/main/engineering/senior-prompt-engineer

Metadata


tags
prompt-optimization llm-evaluation agents prompt-engineering
author
borghei
domain
prompt-engineering
updated
1774915200 (Unix timestamp: 2026-03-31 00:00 UTC)
version
1.0.0
category
engineering

SKILL.md

Senior Prompt Engineer

Prompt engineering patterns, LLM evaluation frameworks, and agentic system design.

Table of Contents

  • Quick Start
  • Tools Overview
    • Prompt Optimizer
    • RAG Evaluator
    • Agent Orchestrator
  • Prompt Engineering Workflows
    • Prompt Optimization Workflow
    • Few-Shot Example Design
    • Structured Output Design
  • Reference Documentation
  • Common Patterns Quick Reference

Quick Start

bash
# Analyze and optimize a prompt file
python scripts/prompt_optimizer.py prompts/my_prompt.txt --analyze

# Evaluate RAG retrieval quality
python scripts/rag_evaluator.py --contexts contexts.json --questions questions.json

# Visualize agent workflow from definition
python scripts/agent_orchestrator.py agent_config.yaml --visualize

Tools Overview

1. Prompt Optimizer

Analyzes prompts for token efficiency, clarity, and structure, and can write a whitespace-normalized version of the prompt.

Input: Prompt text file or string
Output: Analysis report with optimization suggestions

Usage:

bash
# Analyze a prompt file
python scripts/prompt_optimizer.py prompt.txt --analyze

# Output:
# Token count: 847
# Estimated cost: $0.0025 (GPT-4)
# Clarity score: 72/100
# Issues found:
#   - Ambiguous instruction at line 3
#   - Missing output format specification
#   - Redundant context (lines 12-15 repeat lines 5-8)
# Suggestions:
#   1. Add explicit output format: "Respond in JSON with keys: ..."
#   2. Remove redundant context to save 89 tokens
#   3. Clarify "analyze" -> "list the top 3 issues with severity ratings"

# Generate optimized version
python scripts/prompt_optimizer.py prompt.txt --optimize --output optimized.txt

# Count tokens for cost estimation
python scripts/prompt_optimizer.py prompt.txt --tokens --model gpt-4

# Extract and manage few-shot examples
python scripts/prompt_optimizer.py prompt.txt --extract-examples --output examples.json
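The Troubleshooting section describes the token count as a character-based estimate: about 3.5 characters per token for Claude models and 4.0 for GPT models. The sketch below implements that heuristic so you can sanity-check the numbers.

- It assumes only those two ratios.
- The per-1K-token price is a parameter you supply. It is not the script's internal price table.
- `estimate_tokens` and `estimate_cost` are illustrative names, not functions exported by `prompt_optimizer.py`.

```python
import math

# Assumed ratios taken from the Troubleshooting notes; real tokenizers vary
# with language, code, and special characters.
CHARS_PER_TOKEN = {"claude": 3.5, "gpt": 4.0}


def estimate_tokens(text: str, model: str) -> int:
    """Rough token estimate: character count divided by a per-family ratio."""
    family = "claude" if model.startswith("claude") else "gpt"
    return math.ceil(len(text) / CHARS_PER_TOKEN[family])


def estimate_cost(tokens: int, price_per_1k: float) -> float:
    """Input cost at a caller-supplied (hypothetical) price per 1K tokens."""
    return tokens / 1000 * price_per_1k
```

For exact counts, use the tokenizer that belongs to your target model. A ratio-based estimate is only good for budgeting.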

2. RAG Evaluator

Evaluates Retrieval-Augmented Generation quality by measuring context relevance and answer faithfulness.

Input: Retrieved contexts (JSON) and questions/answers
Output: Evaluation metrics and quality report

Usage:

bash
# Evaluate retrieval quality
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json

# Output:
# === RAG Evaluation Report ===
# Questions evaluated: 50
#
# Retrieval Metrics:
#   Context Relevance: 0.78 (target: >0.80)
#   Retrieval Precision@5: 0.72
#   Coverage: 0.85
#
# Generation Metrics:
#   Answer Faithfulness: 0.91
#   Groundedness: 0.88
#
# Issues Found:
#   - 8 questions had no relevant context in top-5
#   - 3 answers contained information not in context
#
# Recommendations:
#   1. Improve chunking strategy for technical documents
#   2. Add metadata filtering for date-sensitive queries

# Evaluate with custom metrics
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json \
    --metrics relevance,faithfulness,coverage

# Export detailed results
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json \
    --output report.json --verbose
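The report above includes Precision@5, and the Scope section also lists MRR and NDCG. The sketch below shows how those ranking metrics are conventionally computed.

- It assumes each retrieved context already has a binary relevance label (1 = relevant, 0 = not), in rank order.
- It does not reproduce how `rag_evaluator.py` derives relevance from lexical overlap.

```python
import math
from typing import Sequence


def precision_at_k(relevant: Sequence[int], k: int) -> float:
    """Fraction of the top-k retrieved contexts labeled relevant."""
    return sum(list(relevant)[:k]) / k if k > 0 else 0.0


def reciprocal_rank(relevant: Sequence[int]) -> float:
    """1 / rank of the first relevant context, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[int]]) -> float:
    """MRR: reciprocal rank averaged over all evaluated questions."""
    return sum(reciprocal_rank(r) for r in runs) / len(runs) if runs else 0.0


def ndcg_at_k(relevant: Sequence[int], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the actual ranking over the ideal DCG."""
    def dcg(labels: Sequence[int]) -> float:
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(list(labels)[:k]))

    ideal = dcg(sorted(relevant, reverse=True))
    return dcg(relevant) / ideal if ideal > 0 else 0.0
```

Precision@K ignores order within the top K. MRR rewards only the first hit. NDCG discounts every relevant hit by its position, which is why the three can disagree on the same run.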

3. Agent Orchestrator

Parses agent definitions and visualizes execution flows. Validates tool configurations.

Input: Agent configuration (YAML/JSON)
Output: Workflow visualization, validation report

Usage:

bash
# Validate agent configuration
python scripts/agent_orchestrator.py agent.yaml --validate

# Output:
# === Agent Validation Report ===
# Agent: research_assistant
# Pattern: ReAct
#
# Tools (4 registered):
#   [OK] web_search - API key configured
#   [OK] calculator - No config needed
#   [WARN] file_reader - Missing allowed_paths
#   [OK] summarizer - Prompt template valid
#
# Flow Analysis:
#   Max depth: 5 iterations
#   Estimated tokens/run: 2,400-4,800
#   Potential infinite loop: No
#
# Recommendations:
#   1. Add allowed_paths to file_reader for security
#   2. Consider adding early exit condition for simple queries

# Visualize agent workflow (ASCII)
python scripts/agent_orchestrator.py agent.yaml --visualize

# Output:
# ┌─────────────────────────────────────────┐
# │            research_assistant           │
# │              (ReAct Pattern)            │
# └─────────────────┬───────────────────────┘
#                   │
#          ┌────────▼────────┐
#          │   User Query    │
#          └────────┬────────┘
#                   │
#          ┌────────▼────────┐
#          │     Think       │◄──────┐
#          └────────┬────────┘       │
#                   │                │
#          ┌────────▼────────┐       │
#          │   Select Tool   │       │
#          └────────┬────────┘       │
#                   │                │
#     ┌─────────────┼─────────────┐  │
#     ▼             ▼             ▼  │
# [web_search] [calculator] [file_reader]
#     │             │             │  │
#     └─────────────┼─────────────┘  │
#                   │                │
#          ┌────────▼────────┐       │
#          │    Observe      │───────┘
#          └────────┬────────┘
#                   │
#          ┌────────▼────────┐
#          │  Final Answer   │
#          └─────────────────┘

# Export workflow as Mermaid diagram
python scripts/agent_orchestrator.py agent.yaml --visualize --format mermaid
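The ASCII diagram describes a ReAct control loop: Think, Select Tool, call the tool, Observe, then repeat until a Final Answer. The validation report bounds the loop at a maximum depth of 5 iterations.

The sketch below shows that loop under clear simplifying assumptions:

- `think` is a hypothetical deterministic stand-in for the LLM call.
- The two tools in `TOOLS` are toy stubs that merely reuse names from the diagram.
- `run_react` is an illustrative name, not part of `agent_orchestrator.py`, which only parses and visualizes configs.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy tool registry; names mirror the diagram, implementations are stubs.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
    "web_search": lambda q: f"(stub search results for {q!r})",
}


def think(query: str, observations: List[str]) -> Tuple[str, Optional[str]]:
    """Hypothetical stand-in for the LLM 'Think' step.

    Returns ("final", answer) or ("tool:<name>", tool_input).
    """
    if observations:
        return "final", observations[-1]
    return "tool:calculator", query


def run_react(query: str, max_depth: int = 5) -> str:
    observations: List[str] = []
    for _ in range(max_depth):  # hard bound on Think/Observe cycles
        action, payload = think(query, observations)
        if action == "final":
            return payload or ""
        tool_name = action.split(":", 1)[1]                   # Select Tool
        observations.append(TOOLS[tool_name](payload or ""))  # call tool, then Observe
    return "Stopped: max depth reached without a final answer"
```

The `max_depth` bound is what the validator's "Potential infinite loop" check is concerned with. Without it, a model that never emits a final answer would loop forever.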

Prompt Engineering Workflows

Prompt Optimization Workflow

Use when improving an existing prompt's performance or reducing token costs.

Step 1: Baseline current prompt

bash
python scripts/prompt_optimizer.py current_prompt.txt --analyze --json --output baseline.json

Step 2: Identify issues

Review the analysis report for:

  • Token waste (redundant instructions, verbose examples)
  • Ambiguous instructions (unclear output format, vague verbs)
  • Missing constraints (no length limits, no format specification)
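The Troubleshooting notes say the clarity check relies on a vague-pattern detector that flags words such as "analyze" or "some". The sketch below is a line-by-line detector of that kind.

- `VAGUE_TERMS` is an assumed example list, not the script's actual pattern set.
- `find_vague_lines` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Assumed example list of vague terms; extend it for your own prompts.
VAGUE_TERMS = ["analyze", "some", "various", "etc", "appropriate", "good"]
VAGUE_RE = re.compile(r"\b(" + "|".join(VAGUE_TERMS) + r")\b", re.IGNORECASE)


def find_vague_lines(prompt: str) -> List[Tuple[int, str]]:
    """Return (1-indexed line number, matched term) pairs for manual review."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(prompt.splitlines(), start=1):
        for match in VAGUE_RE.finditer(line):
            hits.append((lineno, match.group(1).lower()))
    return hits
```

Every hit is only a candidate. "Analyze the logs" is vague, but "some" inside a quoted example may be fine, so review each flagged line before rewriting it.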

Step 3: Apply optimization patterns

| Issue | Pattern to Apply |
|---|---|
| Ambiguous output | Add explicit format specification |
| Too verbose | Extract to few-shot examples |
| Inconsistent results | Add role/persona framing |
| Missing edge cases | Add constraint boundaries |

Step 4: Generate optimized version

bash
python scripts/prompt_optimizer.py current_prompt.txt --optimize --output optimized.txt

Step 5: Compare results

bash
python scripts/prompt_optimizer.py optimized.txt --analyze --compare baseline.json
# Shows: token reduction, clarity improvement, issues resolved

Step 6: Validate with test cases

Run both prompts against your evaluation set and compare outputs.
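The Step 5 comparison can also be reproduced from two `--json` analysis files. According to the Tool Reference, those files include `token_count` and `clarity_score`. The sketch below computes the token reduction and clarity change.

- It assumes both files were written with `--analyze --json`.
- `compare_analyses` and `compare_files` are hypothetical helpers, not the script's `--compare` implementation.

```python
import json
from pathlib import Path
from typing import Dict


def compare_analyses(baseline: Dict, candidate: Dict) -> Dict[str, float]:
    """Token reduction (%) and clarity-score change between two analysis dicts."""
    base_tokens = baseline["token_count"]
    saved = base_tokens - candidate["token_count"]
    reduction = 100.0 * saved / base_tokens if base_tokens else 0.0
    return {
        "token_reduction_pct": round(reduction, 1),
        "clarity_delta": candidate["clarity_score"] - baseline["clarity_score"],
    }


def compare_files(baseline_path: str, candidate_path: str) -> Dict[str, float]:
    def load(path: str) -> Dict:
        return json.loads(Path(path).read_text(encoding="utf-8"))

    return compare_analyses(load(baseline_path), load(candidate_path))
```

A `token_reduction_pct` of 30 or more lines up with the token-efficiency target in Success Criteria.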


Few-Shot Example Design Workflow

Use when creating examples for in-context learning.

Step 1: Define the task clearly

Task: Extract product entities from customer reviews
Input: Review text
Output: JSON with {product_name, sentiment, features_mentioned}

Step 2: Select diverse examples (3-5 recommended)

| Example Type | Purpose |
|---|---|
| Simple case | Shows basic pattern |
| Edge case | Handles ambiguity |
| Complex case | Multiple entities |
| Negative case | What NOT to extract |

Step 3: Format consistently

Example 1:
Input: "Love my new iPhone 15, the camera is amazing!"
Output: {"product_name": "iPhone 15", "sentiment": "positive", "features_mentioned": ["camera"]}

Example 2:
Input: "The laptop was okay but battery life is terrible."
Output: {"product_name": "laptop", "sentiment": "mixed", "features_mentioned": ["battery life"]}
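One way to keep Step 3's formatting consistent is to render every example from data instead of typing it by hand. The sketch below produces the same `Example N:` / `Input:` / `Output:` layout shown above. `build_few_shot_prompt` and the task wording are illustrative, not part of the skill's scripts.

```python
import json
from typing import Dict, List


def build_few_shot_prompt(task: str, examples: List[Dict], query: str) -> str:
    """Render examples in one fixed Example N / Input / Output layout, then the query."""
    parts = [task, ""]
    for i, ex in enumerate(examples, start=1):
        parts.append(f"Example {i}:")
        parts.append(f'Input: "{ex["input"]}"')
        parts.append(f"Output: {json.dumps(ex['output'])}")
        parts.append("")
    parts.append(f'Input: "{query}"')
    parts.append("Output:")
    return "\n".join(parts)


examples = [
    {"input": "Love my new iPhone 15, the camera is amazing!",
     "output": {"product_name": "iPhone 15", "sentiment": "positive", "features_mentioned": ["camera"]}},
    {"input": "The laptop was okay but battery life is terrible.",
     "output": {"product_name": "laptop", "sentiment": "mixed", "features_mentioned": ["battery life"]}},
]
```

Because every output goes through `json.dumps`, key order, quoting, and spacing are identical across examples. That uniformity is what the model picks up on.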

Step 4: Validate example quality

bash
python scripts/prompt_optimizer.py prompt_with_examples.txt --validate-examples
# Checks: consistency, coverage, format alignment
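A simple consistency check you can run yourself: every example's output should parse as JSON and use the same key set. The sketch below covers only that one check. It is a hypothetical helper, not the logic behind `--validate-examples`.

```python
import json
from typing import List, Optional, Set


def check_output_consistency(outputs: List[str]) -> List[str]:
    """Report outputs that are not JSON objects or whose keys differ from example 1."""
    problems: List[str] = []
    expected_keys: Optional[Set[str]] = None
    for i, raw in enumerate(outputs, start=1):
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"Example {i}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(obj, dict):
            problems.append(f"Example {i}: expected a JSON object")
            continue
        keys = set(obj)
        if expected_keys is None:
            expected_keys = keys
        elif keys != expected_keys:
            problems.append(f"Example {i}: keys {sorted(keys)} differ from {sorted(expected_keys)}")
    return problems
```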

Step 5: Test with held-out cases

Confirm that the model generalizes beyond your examples by testing it on inputs that do not appear in the prompt.


Structured Output Design Workflow

Use when you need reliable JSON/XML/structured responses.

Step 1: Define schema

json
{
  "type": "object",
  "properties": {
    "summary": {"type": "string", "maxLength": 200},
    "sentiment": {"enum": ["positive", "negative", "neutral"]},
    "confidence": {"type": "number", "minimum": 0, "maximum": 1}
  },
  "required": ["summary", "sentiment"]
}

Step 2: Include schema in prompt

Respond with JSON matching this schema:
- summary (string, max 200 chars): Brief summary of the content
- sentiment (enum): One of "positive", "negative", "neutral"
- confidence (number 0-1): Your confidence in the sentiment

Step 3: Add format enforcement

IMPORTANT: Respond ONLY with valid JSON. No markdown, no explanation.
Start your response with { and end with }

Step 4: Validate outputs

bash
python scripts/prompt_optimizer.py structured_prompt.txt --validate-schema schema.json
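Model responses can also be checked in application code against the Step 1 schema. The sketch below uses only the standard library:

- It tolerates an accidental `` ```json `` wrapper, which models sometimes add despite Step 3's instruction.
- It parses the response, then checks exactly the constraints in that schema: required `summary` and `sentiment`, a 200-character limit, the sentiment enum, and the 0 to 1 confidence range.

`parse_model_json` and `validate_summary` are hypothetical helpers. For arbitrary schemas, a full JSON Schema validator library (such as the `jsonschema` package) is the more general choice.

```python
import json
from typing import Any, List

SENTIMENTS = ("positive", "negative", "neutral")


def parse_model_json(raw: str) -> Any:
    """Parse a model response, stripping an accidental ```json ... ``` wrapper."""
    text = raw.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit("```", 1)[0]
    return json.loads(text)


def validate_summary(obj: Any) -> List[str]:
    """Check the Step 1 schema by hand; returns a list of error messages."""
    if not isinstance(obj, dict):
        return ["response is not a JSON object"]
    errors: List[str] = []
    for key in ("summary", "sentiment"):
        if key not in obj:
            errors.append(f"missing required key: {key}")
    if "summary" in obj and (not isinstance(obj["summary"], str) or len(obj["summary"]) > 200):
        errors.append("summary must be a string of at most 200 characters")
    if "sentiment" in obj and obj["sentiment"] not in SENTIMENTS:
        errors.append(f"sentiment must be one of {list(SENTIMENTS)}")
    if "confidence" in obj:
        conf = obj["confidence"]
        if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
            errors.append("confidence must be a number between 0 and 1")
    return errors
```

On failure, a common pattern is to re-prompt once with the error list appended, rather than silently accepting malformed output.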

Reference Documentation

| File | Contains | Load when user asks about |
|---|---|---|
| references/prompt_engineering_patterns.md | 10 prompt patterns with input/output examples | "which pattern?", "few-shot", "chain-of-thought", "role prompting" |
| references/llm_evaluation_frameworks.md | Evaluation metrics, scoring methods, A/B testing | "how to evaluate?", "measure quality", "compare prompts" |
| references/agentic_system_design.md | Agent architectures (ReAct, Plan-Execute, Tool Use) | "build agent", "tool calling", "multi-agent" |

Common Patterns Quick Reference

| Pattern | When to Use | Example |
|---|---|---|
| Zero-shot | Simple, well-defined tasks | "Classify this email as spam or not spam" |
| Few-shot | Complex tasks, consistent format needed | Provide 3-5 examples before the task |
| Chain-of-Thought | Reasoning, math, multi-step logic | "Think step by step..." |
| Role Prompting | Expertise needed, specific perspective | "You are an expert tax accountant..." |
| Structured Output | Need parseable JSON/XML | Include schema + format enforcement |

Common Commands

bash
# Prompt Analysis
python scripts/prompt_optimizer.py prompt.txt --analyze          # Full analysis
python scripts/prompt_optimizer.py prompt.txt --tokens           # Token count only
python scripts/prompt_optimizer.py prompt.txt --optimize         # Generate optimized version

# RAG Evaluation
python scripts/rag_evaluator.py --contexts ctx.json --questions q.json  # Evaluate
python scripts/rag_evaluator.py --contexts ctx.json --questions q.json --compare baseline.json  # Compare to baseline

# Agent Development
python scripts/agent_orchestrator.py agent.yaml --validate       # Validate config
python scripts/agent_orchestrator.py agent.yaml --visualize      # Show workflow
python scripts/agent_orchestrator.py agent.yaml --estimate-cost  # Token estimation

Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| Token count seems inaccurate | Character-based estimation varies by language and special characters | Use the --model flag matching your target model; the estimator assumes about 3.5 characters per token for Claude models vs 4.0 for GPT models |
| Clarity score is low despite a clear prompt | The vague-pattern detector flags common words like "analyze" or "some" even in valid contexts | Review flagged lines individually; not every match is a true issue, so focus on genuinely ambiguous instructions |
| Few-shot examples not detected | Examples do not follow the Input:/Output: or Example N: labeling convention | Format examples with explicit Input: and Output: prefixes so the extractor can parse them |
| RAG evaluator shows 0.0 for all metrics | Input JSON schema mismatch: missing question, content, or question_id keys | Verify the JSON uses the expected keys (question/query, content/text, question_id/query_id) |
| Agent YAML parsing fails | The built-in YAML parser is simplified and cannot handle advanced syntax (anchors, multi-line blocks) | Convert the config to JSON, or restructure the YAML to use only simple key-value pairs and dash-prefixed lists |
| Optimization produces minimal changes | --optimize only performs whitespace normalization, not semantic rewriting | Run --analyze first to get suggestions, apply structural improvements manually, then re-run --optimize |
| Mermaid diagram renders incorrectly | More than 6 tools overflow the generated subgraph | Reduce the tool count in the config, or manually split the Mermaid output into sub-diagrams |
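Because `--optimize` only normalizes whitespace, its effect is easy to predict. This document does not spell out the script's exact rules, so the sketch below assumes three common ones:

- trim trailing spaces;
- collapse runs of inner spaces or tabs;
- squeeze multiple blank lines down to one.

Leading indentation is preserved so nested lists and code in a prompt stay intact. `normalize_whitespace` is a hypothetical helper.

```python
import re


def normalize_whitespace(prompt: str) -> str:
    """Assumed rules: trim line ends, collapse inner space runs, keep at most
    one consecutive blank line, and preserve each line's leading indentation."""
    lines = []
    for line in prompt.splitlines():
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        body = re.sub(r"[ \t]{2,}", " ", line.strip(" \t"))
        lines.append(indent + body if body else "")
    text = "\n".join(lines)
    text = re.sub(r"\n{3,}", "\n\n", text)  # 2+ blank lines -> 1 blank line
    return text.strip("\n") + "\n"
```

Savings from this kind of cleanup are usually small, which matches the "minimal changes" symptom in the table. Larger token reductions come from removing redundant content flagged by `--analyze`.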

Success Criteria

  • Prompt clarity score above 70/100 on all production prompts, measured via prompt_optimizer.py --analyze
  • Token efficiency improved by 30%+ after applying optimization suggestions and removing redundant content
  • RAG context relevance at or above 0.80 across evaluation sets, verified by rag_evaluator.py
  • Answer faithfulness at or above 0.95 with zero unsupported claims in critical workflows
  • Agent validation passes with zero errors for all deployed agent configurations
  • Cost per agent run within budget, with estimated monthly spend confirmed via agent_orchestrator.py --estimate-cost
  • Few-shot example coverage includes edge cases: at least 1 simple, 1 complex, and 1 negative example per prompt template

Scope & Limitations

This skill covers:

  • Static prompt analysis: token counting, clarity scoring, structure detection, and optimization suggestions
  • RAG evaluation: context relevance, answer faithfulness, groundedness, and retrieval metrics (Precision@K, ROUGE-L, MRR, NDCG)
  • Agent workflow design: configuration validation, ASCII/Mermaid visualization, and token cost estimation
  • Few-shot example extraction and management from existing prompts

This skill does NOT cover:

  • Live LLM calls or runtime prompt testing: all analysis is static and deterministic (see senior-ml-engineer for LLM integration)
  • Vector database setup or embedding generation: the RAG evaluator scores pre-retrieved contexts only (see senior-data-engineer for pipeline orchestration)
  • Fine-tuning, RLHF, or model training workflows (see senior-ml-engineer for model deployment)
  • Production monitoring, A/B test execution, or real-time drift detection (see senior-data-scientist for experiment design)

Integration Points

| Skill | Integration | Data Flow |
|---|---|---|
| senior-ml-engineer | LLM integration and model deployment | Optimized prompts from this skill feed into llm_integration_builder.py prompt templates |
| senior-data-scientist | A/B test design for prompt experiments | experiment_designer.py defines test parameters; this skill provides the prompt variants to compare |
| senior-data-engineer | RAG pipeline orchestration | pipeline_orchestrator.py builds the retrieval pipeline; this skill evaluates its output quality |
| senior-fullstack | End-to-end application scaffolding | Fullstack apps consume agent configs validated by agent_orchestrator.py |
| senior-security | Prompt injection and adversarial input review | Security analysis covers the attack surface; this skill ensures prompts include defensive constraints |
| senior-qa | Quality assurance for AI-powered features | QA test suites validate that optimized prompts produce consistent outputs in production |

Tool Reference

prompt_optimizer.py

Purpose: Static analysis tool for prompt engineering. Estimates token counts, scores clarity and structure, detects ambiguous instructions and redundant content, extracts few-shot examples, and generates optimized prompt versions.

Usage:

bash
python scripts/prompt_optimizer.py <prompt_file> [options]

Parameters:

| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| prompt | (positional) | string | (required) | Path to the prompt text file to analyze |
| --analyze | -a | flag | off | Run full analysis (clarity, structure, issues, suggestions) |
| --tokens | -t | flag | off | Count tokens and estimate cost only |
| --optimize | -O | flag | off | Generate whitespace-optimized version of the prompt |
| --extract-examples | -e | flag | off | Extract few-shot examples (Input/Output pairs) as JSON |
| --model | -m | choice | gpt-4 | Model for token/cost estimation. Choices: gpt-4, gpt-4-turbo, gpt-3.5-turbo, claude-3-opus, claude-3-sonnet, claude-3-haiku |
| --output | -o | string | (none) | Write results to this file path |
| --json | -j | flag | off | Output analysis as JSON instead of a human-readable report |
| --compare | -c | string | (none) | Path to a baseline analysis JSON file for comparison |

Example:

bash
python scripts/prompt_optimizer.py prompt.txt --analyze --model claude-3-sonnet --json

Output Formats:

  • Default (text): Human-readable report with metrics, scores, detected sections, issues, and suggestions
  • JSON (--json): Structured PromptAnalysis object with keys: token_count, estimated_cost, model, clarity_score, structure_score, issues, suggestions, sections, has_examples, example_count, has_output_format, word_count, line_count
  • Token-only (--tokens): Single-line token count and cost estimate
  • Examples (--extract-examples): JSON array of {input_text, output_text, index} objects
  • Optimized (--optimize): Cleaned prompt text with normalized whitespace
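The examples output is a list of `{input_text, output_text, index}` objects. Per Troubleshooting, only examples labeled with `Input:`/`Output:` prefixes are detected. The sketch below is a regex extractor producing that shape, built on a few assumptions:

- Each `Input:` line is immediately followed by an `Output:` line, and each value fits on one line.
- `index` is 0-based here; the script's convention is not documented.
- `extract_examples` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Assumption: an "Input:" line directly followed by an "Output:" line, one line each.
PAIR_RE = re.compile(r"^Input:[ \t]*(.+)\n^Output:[ \t]*(.+)$", re.MULTILINE)


def extract_examples(prompt: str) -> List[Dict]:
    """Return [{input_text, output_text, index}, ...] for each labeled pair (0-based index)."""
    return [
        {"input_text": m.group(1).strip(), "output_text": m.group(2).strip(), "index": i}
        for i, m in enumerate(PAIR_RE.finditer(prompt))
    ]
```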

rag_evaluator.py

Purpose: Evaluates Retrieval-Augmented Generation quality by measuring context relevance (lexical overlap, term coverage), answer faithfulness (claim-level verification), groundedness (ROUGE-L), and retrieval metrics (Precision@K, MRR, NDCG).
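Groundedness is scored with ROUGE-L, an F-measure based on the longest common subsequence (LCS) between the answer tokens and the context tokens. The sketch below uses lowercase whitespace tokenization and the balanced F1 form. Both are assumptions; the script's tokenizer and weighting may differ.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Longest common subsequence length via a rolling-row DP, O(len(a) * len(b))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(answer: str, context: str) -> float:
    """ROUGE-L F1 between an answer and a context (lowercase whitespace tokens)."""
    a, c = answer.lower().split(), context.lower().split()
    lcs = lcs_length(a, c)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(a), lcs / len(c)
    return 2 * precision * recall / (precision + recall)
```

For example, "the cat sat on the mat" against "the cat is on the mat" shares the subsequence "the cat on the mat" (5 tokens). Precision and recall are both 5/6, so F1 is 5/6. Because matching is purely lexical, a faithful paraphrase can still score low.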

Usage:

bash
python scripts/rag_evaluator.py --contexts <contexts.json> --questions <questions.json> [options]

Parameters:

| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| --contexts | -c | string | (required) | Path to JSON file with retrieved contexts. Expected keys per object: question_id/query_id, content/text |
| --questions | -q | string | (required) | Path to JSON file with questions and answers. Expected keys per object: id, question/query, answer/response, expected/ground_truth |
| --k | | int | 5 | Number of top contexts to evaluate per question |
| --output | -o | string | (none) | Write detailed report to this JSON file |
| --json | -j | flag | off | Output as JSON instead of human-readable text |
| --verbose | -v | flag | off | Include per-question detail breakdowns in the report |
| --compare | | string | (none) | Path to a baseline report JSON for metric comparison |
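Both input files accept alias key names. Per Troubleshooting, a key mismatch silently produces 0.0 for every metric. The sketch below is a preprocessing step you could run before `rag_evaluator.py`: it maps each record onto one canonical key per field and raises an error on missing required fields.

- The alias lists come from the table above.
- The choice of canonical names and which fields count as required are assumptions.
- `normalize` and `load_normalized` are hypothetical helpers, not part of the script.

```python
import json
from typing import Any, Dict, List, Optional, Sequence

# Aliases from the parameter table; the dict key is the (assumed) canonical name.
CONTEXT_KEYS = {"question_id": ("question_id", "query_id"), "content": ("content", "text")}
QUESTION_KEYS = {
    "id": ("id",),
    "question": ("question", "query"),
    "answer": ("answer", "response"),
    "expected": ("expected", "ground_truth"),
}


def _pick(record: Dict[str, Any], aliases: Sequence[str]) -> Optional[Any]:
    for name in aliases:
        if name in record:
            return record[name]
    return None


def normalize(records: List[Dict], schema: Dict[str, Sequence[str]], required: Sequence[str]) -> List[Dict]:
    """Rename alias keys to canonical ones; raise instead of silently scoring 0.0."""
    out = []
    for i, rec in enumerate(records):
        row = {key: _pick(rec, aliases) for key, aliases in schema.items()}
        missing = [k for k in required if row[k] is None]
        if missing:
            raise ValueError(f"record {i} missing {missing}; accepted aliases: {[schema[k] for k in missing]}")
        out.append(row)
    return out


def load_normalized(path: str, schema: Dict[str, Sequence[str]], required: Sequence[str]) -> List[Dict]:
    with open(path, encoding="utf-8") as fh:
        return normalize(json.load(fh), schema, required)
```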

Example:

bash
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json --k 10 --verbose --output report.json

Output Formats:

  • Default (text): Human-readable report with summary, retrieval metrics (context relevance, Precision@K), generation metrics (faithfulness, groundedness), issues, and recommendations
  • JSON (--json): Structured RAGEvaluationReport object with keys: total_questions, avg_context_relevance, avg_faithfulness, avg_groundedness, retrieval_metrics, coverage, issues, recommendations, question_details
  • Verbose (--verbose): Adds per-question question_details array containing individual context scores and faithfulness breakdowns

agent_orchestrator.py

Purpose: Parses agent configurations (YAML or JSON), validates tool registrations and flow correctness, generates ASCII or Mermaid workflow diagrams, and estimates token costs per run and monthly spend.
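The validation pass checks tool registrations and flow correctness. The sample report earlier warns that `file_reader` lacks `allowed_paths` and checks for potential infinite loops. The sketch below runs checks of that kind on an already-parsed config dict.

- The config field names (`name`, `tools`, `allowed_paths`, `max_iterations`) are hypothetical, since this document does not show the real agent.yaml schema.
- The result dict only loosely mirrors the ValidationResult keys listed below.
- `validate_agent` is a hypothetical helper.

```python
from typing import Any, Dict, List


def validate_agent(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical checks on a parsed agent config (field names are assumptions)."""
    errors: List[str] = []
    warnings: List[str] = []
    tool_status: Dict[str, str] = {}

    if not config.get("name"):
        errors.append("agent name is required")
    tools = config.get("tools") or []
    if not tools:
        errors.append("at least one tool must be registered")
    for tool in tools:
        name = tool.get("name", "<unnamed>")
        # Mirrors the sample report: file access should be restricted by an allow-list.
        if name == "file_reader" and not tool.get("allowed_paths"):
            tool_status[name] = "WARN"
            warnings.append(f"{name}: missing allowed_paths")
        else:
            tool_status[name] = "OK"

    max_depth = config.get("max_iterations")
    potential_infinite_loop = not isinstance(max_depth, int) or max_depth <= 0
    if potential_infinite_loop:
        warnings.append("no positive max_iterations: the agent loop has no hard stop")
    return {
        "is_valid": not errors,
        "errors": errors,
        "warnings": warnings,
        "tool_status": tool_status,
        "potential_infinite_loop": potential_infinite_loop,
        "max_depth": max_depth,
    }
```

Warnings leave `is_valid` true and errors make it false. This matches the split in the sample report, where a WARN tool does not block validation.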

Usage:

bash
python scripts/agent_orchestrator.py <config_file> [options]

Parameters:

| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| config | (positional) | string | (required) | Path to agent configuration file (YAML or JSON) |
| --validate | -V | flag | off | Validate agent configuration (errors, warnings, tool status). Runs by default if no other action is specified |
| --visualize | -v | flag | off | Generate workflow diagram |
| --format | -f | choice | ascii | Visualization format. Choices: ascii, mermaid |
| --estimate-cost | -e | flag | off | Estimate token usage and costs |
| --runs | -r | int | 100 | Daily run count for monthly cost projection |
| --output | -o | string | (none) | Write output to this file path |
| --json | -j | flag | off | Output validation and cost results as JSON |

Example:

bash
python scripts/agent_orchestrator.py agent.yaml --validate --visualize --format mermaid --output workflow.md

Output Formats:

  • Validation (text): Agent info, tool status with OK/WARN indicators, flow analysis (max iterations, token estimate, loop detection), errors, and warnings
  • Validation (JSON, --json): Structured ValidationResult object with keys: is_valid, errors, warnings, tool_status, estimated_tokens_per_run, potential_infinite_loop, max_depth
  • Visualization (--visualize): ASCII box-drawing diagram (default) or Mermaid flowchart (--format mermaid) showing the agent pattern flow and registered tools
  • Cost estimation (--estimate-cost): Token range per run, cost range per run, and projected monthly cost at the specified daily run rate
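The monthly projection is simple arithmetic: per-run cost × daily runs × days per month. The sketch below assumes a 30-day month and a price per 1K tokens that you supply; neither value is taken from the script. `monthly_cost_range` is a hypothetical helper.

```python
from typing import Tuple


def monthly_cost_range(
    tokens_per_run: Tuple[int, int],
    price_per_1k_tokens: float,
    runs_per_day: int = 100,  # same default as --runs
    days_per_month: int = 30,  # assumption
) -> Tuple[float, float]:
    """Project (low, high) monthly spend from a per-run token range."""
    low_tokens, high_tokens = tokens_per_run
    runs = runs_per_day * days_per_month
    low = low_tokens / 1000 * price_per_1k_tokens * runs
    high = high_tokens / 1000 * price_per_1k_tokens * runs
    return round(low, 2), round(high, 2)
```

Take the sample report's 2,400-4,800 tokens per run, the default 100 runs per day, and a hypothetical $0.01 per 1K tokens. That gives 3,000 runs per month and a projection of roughly $72 to $144.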
