wow-assessing-criticality-confidence

Universal classification system for checker and fixer agents using orthogonal criticality (CRITICAL/HIGH/MEDIUM/LOW importance) and confidence (HIGH/MEDIUM/FALSE_POSITIVE certainty) dimensions. Covers priority matrix (P0-P4), execution order, dual-label pattern for verification status, standardized report format, and domain-specific examples. Essential for implementing checker/fixer agents and processing audit reports


Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/wow-assessing-criticality-confidence

SKILL.md

# Criticality-Confidence System Skill

## Purpose

This Skill provides comprehensive guidance on the criticality-confidence classification system used by all checker and fixer agents in the repository.

**When to use this Skill**:

- Implementing checker agents (categorizing findings)
- Implementing fixer agents (assessing confidence, determining priority)
- Processing audit reports
- Understanding priority execution order (P0-P4)
- Working with dual-label patterns (verification + criticality)
- Writing standardized audit reports

## Core Concepts

### Two Orthogonal Dimensions

The system uses TWO independent dimensions:

**Criticality** (CRITICAL/HIGH/MEDIUM/LOW):

- Measures importance and urgency
- Answers: "How soon must this be fixed?"
- Set by checker agents during validation
- Based on objective impact criteria

**Confidence** (HIGH/MEDIUM/FALSE_POSITIVE):

- Measures certainty and fixability
- Answers: "How certain are we this needs fixing?"
- Assessed by fixer agents during re-validation
- Based on re-validation results

**Key Insight**: These dimensions are ORTHOGONAL - they measure different things and combine to determine priority.
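As an illustration, the two dimensions can live as independent fields on a finding record. This is a hypothetical sketch — the field and class names are not prescribed by any convention:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """One checker finding; names here are illustrative, not prescribed."""
    file: str
    description: str
    criticality: str                   # set by the checker: CRITICAL/HIGH/MEDIUM/LOW
    confidence: Optional[str] = None   # filled in later by the fixer after re-validation


finding = Finding("content/_index.md", "missing draft field", criticality="CRITICAL")
finding.confidence = "HIGH"  # the fixer assigns this independently of criticality
```

The checker never sets `confidence`, and the fixer never changes `criticality` — each dimension has exactly one owner.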

### Four Criticality Levels

🔴 **CRITICAL** - Breaks functionality, blocks users, violates mandatory requirements

- Missing required fields (build breaks)
- Broken internal links (404 errors)
- Security vulnerabilities
- Syntax errors preventing execution
- MUST requirement violations

🟠 **HIGH** - Significant quality degradation, convention violations

- Wrong format (system functions but non-compliant)
- Accessibility violations (WCAG AA failures)
- SHOULD requirement violations
- Incorrect link format (works but violates convention)

🟡 **MEDIUM** - Minor quality issues, style inconsistencies

- Missing optional fields (minimal impact)
- Formatting inconsistencies
- Suboptimal structure (still functional)
- MAY/OPTIONAL requirement deviations

🟢 **LOW** - Suggestions, optimizations, enhancements

- Performance optimizations
- Alternative implementation suggestions
- Future-proofing recommendations
- Best practice suggestions (not requirements)

### Three Confidence Levels

**HIGH** - Objectively correct, safe to auto-fix

- Re-validation confirms issue exists
- Issue is objective and verifiable
- Fix is straightforward and safe
- No ambiguity, low risk

**MEDIUM** - Uncertain, requires manual review

- Re-validation is unclear or ambiguous
- Issue is subjective (human judgment needed)
- Multiple valid interpretations
- Context-dependent decision

**FALSE_POSITIVE** - Checker was wrong

- Re-validation clearly disproves issue
- Content is actually compliant
- Checker's detection logic is flawed
- Report to improve checker

## Criticality × Confidence Priority Matrix

### Decision Matrix

| Criticality | HIGH Confidence | MEDIUM Confidence | FALSE_POSITIVE |
| --- | --- | --- | --- |
| 🔴 CRITICAL | **P0** - Auto-fix immediately (block deployment) | **P1** - URGENT manual review | Report with CRITICAL context (fix urgently) |
| 🟠 HIGH | **P1** - Auto-fix after P0 | **P2** - Standard manual review | Report with HIGH context (fix soon) |
| 🟡 MEDIUM | **P2** - Auto-fix after P1 (user approval) | **P3** - Optional review | Report with MEDIUM context (note for improvement) |
| 🟢 LOW | **P3** - Batch fixes (user decides when) | **P4** - Suggestions only | Report with LOW context (informational) |

### Priority Levels Explained

- **P0 (Blocker)**: MUST fix before any publication/deployment
- **P1 (Urgent)**: SHOULD fix before publication, can proceed with approval
- **P2 (Normal)**: Fix in current cycle when convenient
- **P3 (Low)**: Fix in future cycle or batch operation
- **P4 (Optional)**: Suggestion only, no action required
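The decision matrix can be sketched as a plain lookup table. This is an illustrative implementation, not a prescribed one; the function name matches the `determine_priority` call shown in the fixer pattern later in this Skill:

```python
# Priority matrix as a lookup: (criticality, confidence) → priority level.
PRIORITY_MATRIX = {
    ("CRITICAL", "HIGH"): "P0",
    ("CRITICAL", "MEDIUM"): "P1",
    ("HIGH", "HIGH"): "P1",
    ("HIGH", "MEDIUM"): "P2",
    ("MEDIUM", "HIGH"): "P2",
    ("MEDIUM", "MEDIUM"): "P3",
    ("LOW", "HIGH"): "P3",
    ("LOW", "MEDIUM"): "P4",
}


def determine_priority(criticality: str, confidence: str) -> str:
    # FALSE_POSITIVE findings are never fixed; they are reported back
    # to the checker with their criticality level as context.
    if confidence == "FALSE_POSITIVE":
        return "REPORT"
    return PRIORITY_MATRIX[(criticality, confidence)]
```

Because the two dimensions are orthogonal, the entire policy fits in an eight-entry table plus one special case.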

### Execution Order for Fixers

Fixer agents MUST process findings in strict priority order:

1. **P0 fixes** (CRITICAL + HIGH) → Auto-fix, block if fails
2. **P1 fixes** (HIGH + HIGH or CRITICAL + MEDIUM) → Auto-fix HIGH + HIGH, flag CRITICAL + MEDIUM
3. **P2 fixes** (MEDIUM + HIGH or HIGH + MEDIUM) → Auto-fix MEDIUM + HIGH if approved, flag HIGH + MEDIUM
4. **P3-P4 fixes** (LOW priority) → Include in summary only

## Checker Agent Responsibilities

### Categorizing Findings by Criticality

Decision tree:

1. Does it BREAK functionality or BLOCK users?
   YES → CRITICAL
   NO → Continue

2. Does it cause SIGNIFICANT quality degradation or violate DOCUMENTED conventions?
   YES → HIGH
   NO → Continue

3. Is it a MINOR quality issue or style inconsistency?
   YES → MEDIUM
   NO → Continue

4. Is it a suggestion, optimization, or future consideration?
   YES → LOW
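The decision tree above maps directly to a cascade of checks. The boolean flags here are hypothetical inputs that a checker would derive from its own validation logic:

```python
def classify_criticality(
    breaks_or_blocks: bool,
    significant_degradation: bool,
    minor_quality_issue: bool,
) -> str:
    """Walk the decision tree top-down; the first YES wins."""
    if breaks_or_blocks:
        return "CRITICAL"   # breaks functionality or blocks users
    if significant_degradation:
        return "HIGH"       # significant degradation or documented-convention violation
    if minor_quality_issue:
        return "MEDIUM"     # minor quality issue or style inconsistency
    return "LOW"            # suggestion, optimization, or future consideration
```

Ordering matters: a finding that both breaks the build and violates a style convention is CRITICAL, because the tree stops at the first YES.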

### Standardized Report Format

Report header:

```markdown
# [Agent Name] Audit Report

**Audit ID**: {uuid-chain}__{timestamp}
**Scope**: {scope-description}
**Files Checked**: N files
**Audit Start**: YYYY-MM-DDTHH:MM:SS+07:00
**Audit End**: YYYY-MM-DDTHH:MM:SS+07:00

---

## Executive Summary

- 🔴 **CRITICAL Issues**: X (must fix before publication)
- 🟠 **HIGH Issues**: Y (should fix before publication)
- 🟡 **MEDIUM Issues**: Z (improve when time permits)
- 🟢 **LOW Issues**: W (optional enhancements)

**Total Issues**: X + Y + Z + W = TOTAL

**Overall Status**: [PASS | PASS WITH WARNINGS | FAIL]
```

Issue sections:

````markdown
## 🔴 CRITICAL Issues (Must Fix)

**Count**: X issues found

---

### 1. [Issue Title]

**File**: `path/to/file.md:line`
**Criticality**: CRITICAL - [Why critical]
**Category**: [Category name]

**Finding**: [What's wrong]
**Impact**: [What breaks if not fixed]
**Recommendation**: [How to fix]

**Example**:

```yaml
# Current (broken)
[show broken state]

# Expected (fixed)
[show fixed state]
```

**Confidence**: [Will be assessed by fixer]
````



### Dual-Label Pattern

**Five agents require BOTH verification/status AND criticality**:

- `docs__checker` - [Verified]/[Error]/[Outdated]/[Unverified] + criticality
- `docs__tutorial-checker` - Verification labels + criticality
- `apps__ayokoding-web__facts-checker` - Verification labels + criticality
- `docs-link-general-checker` - [OK]/[BROKEN]/[REDIRECT] + criticality
- `apps__ayokoding-web__link-checker` - Status labels + criticality

**Format**:

```markdown
### 1. [Verification] - Issue Title

**File**: `path/to/file.md:line`
**Verification**: [Error] - [Reason for verification status]
**Criticality**: CRITICAL - [Reason for criticality level]
**Category**: [Category name]

**Finding**: [Description]
**Impact**: [Consequences]
**Recommendation**: [Fix]
**Verification Source**: [URL]

**Confidence**: [Will be assessed by fixer]
```

**Why dual labels?**

- Verification describes FACTUAL STATE ([Verified], [Error], etc.)
- Criticality describes URGENCY/IMPORTANCE (CRITICAL, HIGH, etc.)
- Both provide complementary information
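A fixer consuming dual-label reports can pull both labels out of the field lines. The regexes below are a sketch that assumes the exact template shown above:

```python
import re

verification_line = "**Verification**: [Error] - Command syntax verified incorrect"
criticality_line = "**Criticality**: CRITICAL - Breaks user quick start experience"

# Verification label sits inside square brackets; criticality is a bare keyword.
verification = re.search(r"\*\*Verification\*\*: \[(\w+)\]", verification_line).group(1)
criticality = re.search(r"\*\*Criticality\*\*: (\w+)", criticality_line).group(1)
# verification == "Error", criticality == "CRITICAL"
```

Parsing the two labels separately keeps the factual-state and urgency dimensions independent downstream.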

## Fixer Agent Responsibilities

### Re-Validation Process

**CRITICAL**: Fixer agents MUST re-validate all findings before applying fixes. Never trust checker findings blindly.

Process:

Checker Report → Read Finding → Re-execute Validation → Assess Confidence → Apply/Skip/Report

Re-validation methods:

- Extract frontmatter using the same AWK pattern
- Check file existence for broken links
- Count objective metrics (lines, headings)
- Verify patterns (date format, naming)
- Analyze context (content type, directory)
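For instance, a "missing frontmatter field" finding can be re-validated by re-extracting the frontmatter and checking the field directly. This is a minimal sketch — real fixers use the repository's shared AWK extraction, and this toy parser handles only the simplest `key: value` case:

```python
import re


def extract_frontmatter(text: str) -> dict:
    """Minimal parser for a '---'-delimited frontmatter block (illustrative only)."""
    match = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    fields = {}
    if match:
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
    return fields


def finding_confirmed(text: str, missing_field: str) -> bool:
    """Re-validate a 'missing required field' finding against the actual content."""
    return missing_field not in extract_frontmatter(text)
```

If `finding_confirmed` returns False, the checker's finding is a FALSE_POSITIVE and should be reported back rather than "fixed".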

### Confidence Assessment

**Step 1: Classify issue type**

- Objective (missing field, wrong format) → Potentially HIGH confidence
- Subjective (narrative flow, tone) → MEDIUM confidence

**Step 2: Re-validate finding**

- Confirms issue → Continue to Step 3
- Disproves issue → FALSE_POSITIVE

**Step 3: Assess fix safety**

- Safe and unambiguous → HIGH confidence (auto-fix)
- Unsafe or ambiguous → MEDIUM confidence (manual review)
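The three steps compose into a single assessment. In this sketch, `revalidation` uses `None` for an unclear result — an assumption of the example, not part of the convention:

```python
from typing import Optional


def assess_confidence(
    objective: bool,               # Step 1: is the issue objectively verifiable?
    revalidation: Optional[bool],  # Step 2: True=confirmed, False=disproved, None=unclear
    fix_is_safe: bool,             # Step 3: is the fix unambiguous and low-risk?
) -> str:
    if revalidation is False:
        return "FALSE_POSITIVE"    # re-validation disproved the finding
    if objective and revalidation is True and fix_is_safe:
        return "HIGH"              # objective, confirmed, safe → auto-fix
    return "MEDIUM"                # subjective, unclear, or risky → manual review
```

Only the conjunction of all three favorable answers yields HIGH; any doubt at any step degrades to MEDIUM.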

### Priority-Based Execution

P0 fixes first (CRITICAL + HIGH confidence):

```python
for finding in critical_high_confidence:
    success = apply_fix(finding)  # Auto-fix immediately
    if not success:
        block_deployment()  # Stop if a P0 fix fails
```

P1 fixes second:

```python
# AUTO: HIGH criticality + HIGH confidence
for finding in high_high_confidence:
    apply_fix(finding)

# FLAG: CRITICAL + MEDIUM confidence (urgent review)
for finding in critical_medium_confidence:
    flag_for_urgent_review(finding)
```

P2 fixes third:

```python
# AUTO if approved: MEDIUM + HIGH
if user_approved_batch_mode:
    for finding in medium_high_confidence:
        apply_fix(finding)

# FLAG: HIGH + MEDIUM (standard review)
for finding in high_medium_confidence:
    flag_for_standard_review(finding)
```

P3-P4 last:

```python
# Include in summary only
for finding in low_priority:
    include_in_summary(finding)
```

### Fix Report Format

````markdown
# [Agent Name] Fix Report

**Source Audit**: {agent-family}__{uuid}__{timestamp}__audit.md
**Fix Date**: YYYY-MM-DDTHH:MM:SS+07:00

---

## Execution Summary

- **P0 Fixes Applied**: X (CRITICAL + HIGH confidence)
- **P1 Fixes Applied**: Y (HIGH + HIGH confidence)
- **P1 Flagged for Urgent Review**: Z (CRITICAL + MEDIUM confidence)
- **P2 Fixes Applied**: W (MEDIUM + HIGH confidence)
- **P2 Flagged for Standard Review**: V (HIGH + MEDIUM confidence)
- **P3-P4 Suggestions**: U (LOW priority)
- **False Positives Detected**: T

---

## P0 Fixes Applied (CRITICAL + HIGH Confidence)

### 1. [Issue Title]

**File**: `path/to/file.md`
**Criticality**: CRITICAL - [Why critical]
**Confidence**: HIGH - [Why confident]
**Fix Applied**: [What was changed]

**Before**:

```yaml
[broken state]
```

**After**:

```yaml
[fixed state]
```

[... P1, P2, P3-P4 sections ...]

## False Positives Detected

### 1. [Issue Title]

**File**: `path/to/file.md`
**Checker Finding**: [What checker reported]
**Re-validation**: [What fixer found]
**Conclusion**: FALSE_POSITIVE
**Reason**: [Why checker was wrong]

**Recommendation for Checker**: [How to improve checker logic]
````



## Domain-Specific Examples

### Repository Rules (wow__rules-checker)

**CRITICAL**:
- Missing `subcategory` field in convention (breaks organization)
- Agent `name` doesn't match filename (discovery fails)
- YAML comment in agent frontmatter (parsing error)

**HIGH**:
- Missing "Principles Respected" section (traceability violation)
- Wrong file naming prefix (convention violation)

**MEDIUM**:
- Missing optional cross-reference
- Suboptimal section ordering

**LOW**:
- Suggest adding related links
- Consider alternative organization

### Hugo Content (ayokoding-web)

**CRITICAL**:
- Missing required `title` field (build fails)
- Invalid YAML syntax (parsing error)
- Broken internal link without language prefix (404)

**HIGH**:
- Missing `weight` field (navigation undefined)
- Wrong internal link format (relative vs absolute)
- Incorrect heading hierarchy (H3 before H2)

**MEDIUM**:
- Missing optional `description` field
- Suboptimal weight spacing

**LOW**:
- Suggest adding optional tags
- Consider alternative structure

### Documentation (docs__checker)

**CRITICAL**:
- [Error] Command syntax incorrect (verified via WebSearch)
- [BROKEN] Internal link to non-existent file
- Security vulnerability in code example

**HIGH**:
- [Outdated] Major version with breaking changes
- Passive voice in step-by-step instructions
- Wrong heading nesting (H1 → H3)

**MEDIUM**:
- [Unverified] External claim needs verification
- Missing optional code fence language tag

**LOW**:
- Suggest additional examples
- Consider adding diagram

## Common Patterns

### Pattern 1: Checker categorizing finding

```markdown
### 1. Missing Required Frontmatter Field

**File**: `apps/ayokoding-web/content/en/programming/python/_index.md:3`
**Criticality**: CRITICAL - Breaks Hugo build process
**Category**: Missing Required Field

**Finding**: Required `draft` field missing from frontmatter
**Impact**: Hugo build fails with "required field missing" error
**Recommendation**: Add `draft: false` to frontmatter
```

### Pattern 2: Fixer assessing confidence

```python
# Read checker finding
finding = "Missing `draft` field"

# Re-validate
frontmatter = extract_frontmatter(file)
draft_exists = "draft" in frontmatter  # Result: False (confirmed)

# Assess confidence
issue_type = "objective"  # Field either exists or doesn't
re_validation = "confirmed"  # Field is indeed missing
fix_safety = "safe"  # Adding missing field is straightforward

confidence = "HIGH"  # Objective, confirmed, safe → HIGH
priority = determine_priority("CRITICAL", "HIGH")  # → P0

# Apply fix
apply_fix(finding)
```

### Pattern 3: Dual-label finding

```markdown
### 1. [Error] - Command Syntax Incorrect

**File**: `docs/tutorials/tu__quick-start.md:42`
**Verification**: [Error] - Command syntax verified incorrect via WebSearch
**Criticality**: CRITICAL - Breaks user quick start experience
**Category**: Factual Error - Command Syntax

**Finding**: Installation command uses incorrect npm flag `--save-deps`
**Impact**: Users get command error, cannot complete setup
**Recommendation**: Change to `--save-dev`
**Verification Source**: https://docs.npmjs.com/cli/v9/commands/npm-install

**Confidence**: HIGH (verified via official docs)
```

## Best Practices

### For Checker Agents

**DO**:

- Use the decision tree for consistent criticality assessment
- Document specific impact for each finding
- Provide clear, actionable recommendations
- Include examples showing broken vs fixed state
- Write findings progressively during execution

**DON'T**:

- Mix criticality levels in the same report section
- Skip the impact description
- Provide vague recommendations
- Forget to document the verification source (for dual-label)

### For Fixer Agents

**DO**:

- ALWAYS re-validate before applying fixes
- Process findings in strict priority order (P0 → P1 → P2 → P3)
- Document confidence assessment reasoning
- Report false positives with improvement suggestions
- Group fixes by priority in the report

**DON'T**:

- Trust checker findings without re-validation
- Apply fixes in discovery order (ignoring priority)
- Skip flagging MEDIUM confidence findings for manual review
- Apply P2 fixes without user approval

## Common Mistakes

### ❌ Mistake 1: Conflating verification with criticality

**Wrong**: [Error] is CRITICAL, [Verified] is LOW

**Right**: Verification describes WHAT (factual state); criticality describes HOW URGENT

### ❌ Mistake 2: File-level confidence instead of per-finding

**Wrong**: Overall file confidence HIGH

**Right**: Each finding assessed independently

### ❌ Mistake 3: Skipping re-validation

**Wrong**: Trust checker, apply fix directly

**Right**: Re-validate finding first, then assess confidence

### ❌ Mistake 4: Ignoring priority order

**Wrong**: Fix findings in discovery order

**Right**: Fix P0 first, then P1, then P2, then P3-P4
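A simple way to enforce this order is to sort findings by priority label before processing them. This sketch exploits the fact that the labels P0-P4 sort lexicographically; the finding dicts are illustrative:

```python
findings = [
    {"title": "style nit", "priority": "P3"},
    {"title": "broken build", "priority": "P0"},
    {"title": "wrong link format", "priority": "P1"},
]

# "P0" < "P1" < ... < "P4" as strings, so the label itself is a valid sort key.
for finding in sorted(findings, key=lambda f: f["priority"]):
    print(finding["priority"], finding["title"])
# Processes the P0 finding first, then P1, then P3.
```

Sorting once up front removes any dependence on the order in which the checker happened to discover issues.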

## Creating Domain-Specific Confidence Examples

Fixer agents should include domain-specific examples of HIGH/MEDIUM/FALSE_POSITIVE confidence assessments to guide re-validation decisions.

### Purpose of Domain Examples

Why include domain-specific examples?

- Provide concrete guidance for re-validation decisions
- Clarify what constitutes HIGH vs MEDIUM confidence in a specific domain
- Help fixer agents make consistent confidence assessments
- Reduce ambiguity in edge cases
- Document domain conventions and patterns

**Where to include**: In fixer agent files (not in this Skill - keep examples domain-specific)

### Example Structure Template

```markdown
### Domain-Specific Confidence Examples

**HIGH Confidence** (Apply automatically):

- [Objective error type 1] verified by [verification method]
- [Objective error type 2] verified by [verification method]
- [Pattern-based error] verified by [pattern check]
- [File-based error] verified by [file check]

**MEDIUM Confidence** (Manual review):

- [Subjective issue 1] that may be [context-dependent reason]
- [Ambiguous issue] where [ambiguity explanation]
- [Quality judgment] requiring [human judgment reason]
- [Context-dependent issue] that could be [valid reason]

**FALSE_POSITIVE** (Report to checker):

- Checker flagged [correct thing] as incorrect ([reason for false positive])
- Checker reported [missing thing] that actually exists ([reason])
- Checker [misunderstood] [context explanation]
```

### Domain Examples by Agent Family

**docs-fixer** (Factual accuracy domain):

```markdown
**HIGH Confidence**:

- Broken command syntax verified by checker's cited sources
- Incorrect version number verified by checker's registry findings
- Wrong API method verified by checker's documentation review
- Broken internal link verified by file existence check

**MEDIUM Confidence**:

- Contradiction that may be context-dependent
- Outdated information where "outdated" is subjective
- Content duplication where duplication may be intentional

**FALSE_POSITIVE**:

- Checker flagged correct LaTeX as incorrect
- Checker flagged valid command as broken
```

**readme-fixer** (README quality domain):

```markdown
**HIGH Confidence**:

- Paragraph exceeding 5 lines (count is objective)
- Specific jargon patterns without context
- Acronym without expansion
- Passive voice patterns (pattern match)

**MEDIUM Confidence**:

- Overall tone assessment (subjective)
- Engagement quality (context-dependent)
- Sentence length appropriateness (judgment call)

**FALSE_POSITIVE**:

- Checker flagged technical term as jargon (domain-appropriate)
- Checker flagged intentional passive voice (style choice)
```

**docs-tutorial-fixer** (Tutorial quality domain):

```markdown
**HIGH Confidence**:

- Missing hands-on element verified by structure check
- Wrong tutorial type verified by content analysis
- Missing visual aid in complex section (objective criteria met)

**MEDIUM Confidence**:

- Narrative flow issues (subjective assessment)
- Pedagogical effectiveness (requires teaching expertise)
- Example clarity (reader-dependent)

**FALSE_POSITIVE**:

- Checker flagged advanced tutorial as beginner (correct level)
- Checker reported missing visual where text is sufficient
```

### Guidelines for Creating Examples

**Be Specific**:

- Use concrete error types from your domain
- Reference actual verification methods
- Include pattern examples
- Show real scenarios

**Cover Common Cases**:

- Include the 3-5 most common HIGH confidence scenarios
- Include 2-3 common MEDIUM confidence scenarios
- Include 2-3 common FALSE_POSITIVE scenarios
- Don't try to be exhaustive (guidelines, not rules)

**Keep It Actionable**:

- Focus on verification methods: "verified by [method]"
- Explain ambiguity: "where [reason for uncertainty]"
- Show reasoning: "that may be [valid reason]"

**Domain-Appropriate**:

- docs-fixer: Command syntax, versions, APIs, links
- readme-fixer: Jargon, paragraphs, tone, engagement
- tutorial-fixer: Hands-on elements, flow, visuals
- link-fixer: Path format, target existence, redirects
- structure-fixer: Folder patterns, weights, organization

### Placement in Agent Files

Add a "Domain-Specific Confidence Examples" section:

- After confidence level definitions
- Before re-validation guidelines
- In fixer agent files (not checker files)

### Benefits

- ✅ Reduces ambiguity in confidence assessment
- ✅ Provides concrete guidance for edge cases
- ✅ Improves consistency across similar fixers
- ✅ Documents domain conventions
- ✅ Helps new fixer implementations

### Anti-Patterns

- ❌ Too Generic: Examples that could apply to any domain
- ❌ Too Exhaustive: Trying to cover every possible scenario
- ❌ No Verification Method: Not explaining how to verify
- ❌ Missing Context: Not explaining why something is MEDIUM
- ❌ In Skill: Domain examples belong in agents, not Skills

## Key Takeaways

- Include domain-specific examples in fixer agents
- Cover HIGH/MEDIUM/FALSE_POSITIVE confidence cases
- Use concrete scenarios from your domain
- Explain verification methods
- Keep examples actionable and specific
- Place examples in fixer agents, not in this Skill

## References

**Primary Conventions**:

**Related Conventions**:

**Related Skills**:

- `wow__applying-maker-checker-fixer` - Understanding the three-stage workflow
- `docs__validating-factual-accuracy` - Verification label system

**Related Agents**:

All 14 checker agents and 11 fixer agents use this system.

---

This Skill packages the criticality-confidence classification system for maintaining consistent quality validation and automated fixing. For comprehensive details, consult the primary convention documents.
