holistic-evaluation

STANDARD OPERATING PROCEDURE

Synthesize multi-domain quality signals into a single assessment that highlights strengths, gaps, and prioritized actions across the stack.

Use for: program-wide audits, pre-release hardening, or executive summaries that require broad coverage.
Do not use for: narrow bug hunts or single-lens reviews (route these to the appropriate specialized skill).

Confidence ceiling: Append "Confidence: X.XX (ceiling: TYPE Y.YY)" to the assessment, where X.XX never exceeds Y.YY. Ceilings by evidence type: inference/report 0.70, research 0.85, observation/definition 0.95.
Structured coverage: Ensure each lens (architecture, correctness, security, performance, UX, docs/tests) has at least one observation or explicit “not evaluated.”
Evidence-first: Back each finding with a file:line or metric reference, and name the standard it is judged against (OWASP, performance budgets, style guides, SLAs).
Adversarial validation: Stress-check conclusions against edge cases and potential blind spots; mark assumptions clearly.
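
The confidence-ceiling rule can be applied mechanically. The sketch below is one possible helper, not part of the SOP itself: the function name `confidence_line` and the decision to clamp silently (rather than reject an over-ceiling value) are assumptions. The ceiling values are the ones listed in the guardrail.

```python
# Ceilings taken from the "Confidence ceiling" guardrail.
CEILINGS = {
    "inference": 0.70,
    "report": 0.70,
    "research": 0.85,
    "observation": 0.95,
    "definition": 0.95,
}


def confidence_line(raw: float, evidence_type: str) -> str:
    """Format the trailing confidence line, capping the value at its ceiling.

    Hypothetical helper: clamping is an assumption; a stricter variant
    could raise an error when raw exceeds the ceiling.
    """
    ceiling = CEILINGS[evidence_type]  # unknown types raise KeyError on purpose
    value = min(max(raw, 0.0), ceiling)
    return f"Confidence: {value:.2f} (ceiling: {evidence_type} {ceiling:.2f})"


print(confidence_line(0.72, "inference"))
# prints: Confidence: 0.70 (ceiling: inference 0.70)
```

Clamping keeps the output line well-formed even when the raw estimate overshoots; teams that prefer to surface the overshoot can replace the clamp with an exception.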

Scoping & Goals
- Define audiences (engineering leadership, QA, security) and decision horizon.
- Identify critical components and recent changes.
Lens-by-Lens Evaluation
- Architecture: cohesion/coupling, boundaries, migrations.
- Correctness & Tests: functional behavior, coverage depth, flaky-test risk.
- Security & Privacy: input validation, authZ/authN, secrets, data flows.
- Performance & Reliability: latency budgets, resource usage, error budgets.
- UX & Documentation: usability, accessibility, onboarding materials.
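
The lens pass above, combined with the structured-coverage rule, can be checked with a small report builder. This is a minimal sketch under stated assumptions: the `Observation` type, the `coverage_report` function, and the choice of the five workflow lens names as keys are all illustrative and not defined by the SOP.

```python
from __future__ import annotations

from dataclasses import dataclass

# Lens names copied from the workflow list; using them as keys is an assumption.
LENSES = (
    "Architecture",
    "Correctness & Tests",
    "Security & Privacy",
    "Performance & Reliability",
    "UX & Documentation",
)


@dataclass
class Observation:
    """Hypothetical record for one finding."""
    lens: str
    summary: str
    evidence: str  # file:line or metric reference


def coverage_report(observations: list[Observation]) -> dict[str, list[str]]:
    """Map every lens to its observations, or to an explicit 'not evaluated'."""
    report: dict[str, list[str]] = {lens: [] for lens in LENSES}
    for obs in observations:
        if obs.lens not in report:
            raise ValueError(f"unknown lens: {obs.lens}")
        report[obs.lens].append(f"{obs.summary} ({obs.evidence})")
    # Empty lenses are never dropped; they are marked explicitly instead.
    return {lens: items or ["not evaluated"] for lens, items in report.items()}
```

A lens with no observations shows up as `["not evaluated"]` rather than disappearing, so a reader can tell a skipped lens apart from a clean one.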
Synthesis & Prioritization
- Group findings by severity and blast radius; highlight dependencies.
- Recommend remediation plans with owners and timelines.
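
One way to order findings by severity and blast radius is a two-key sort. The sketch below assumes a four-level severity scale and measures blast radius as a count of affected components; the SOP prescribes neither, and the `Finding` type and `prioritize` function are hypothetical.

```python
from dataclasses import dataclass

# Assumed severity scale; lower rank sorts first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    """Hypothetical record for one prioritized finding."""
    title: str
    severity: str      # one of the SEVERITY_RANK keys
    blast_radius: int  # assumed measure: number of affected components
    owner: str


def prioritize(findings):
    """Sort by severity first, then by larger blast radius within a severity."""
    return sorted(
        findings,
        key=lambda f: (SEVERITY_RANK[f.severity], -f.blast_radius),
    )
```

Because `sorted` is stable, findings that tie on both keys keep their original order, which preserves any manual ordering the reviewer already applied.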
Validation & Confidence
- Revisit high-risk areas with adversarial probes.
- State residual risks and a confidence value with its explicit ceiling.

Confidence: 0.70 (ceiling: inference 0.70) - SOP rewritten with Prompt Architect confidence discipline and Skill Forge structured coverage.