Agent skills
confidence_scorer

Agent skill

confidence_scorer

Stars 129

Forks 20

Install this agent skill to your Project

npx add-skill https://github.com/transilienceai/communitytools/tree/main/projects/pentest/.claude/skills/techstack-identification/confidence_scorer

SKILL.md

Confidence Scorer Skill

Overview

Calculates confidence levels for identified technologies based on signal quantity, quality, source diversity, and corroboration patterns.

Metadata

Skill ID: confidence_scorer
Version: 1.0.0
Category: Correlation
Phase: 4 (Correlation)
Agent: correlation_agent
Execution Mode: Parallel with other correlation skills

Purpose

Assigns High, Medium, or Low confidence levels to each identified technology using quantitative scoring algorithms that consider evidence strength, source diversity, and cross-validation patterns.

Input Requirements

Required Inputs

json

{
  "correlated_technologies": [
    {
      "technology": "string",
      "category": "frontend|backend|infrastructure|security|devops|third_party",
      "sources": ["array of source skill names"],
      "source_count": "integer",
      "agreement_score": "float (0-1)",
      "corroborating_signals": [
        {
          "source": "string",
          "signal": "string",
          "strength": "strong|medium|weak"
        }
      ]
    }
  ],
  "conflicts": [
    {
      "technology": "string",
      "conflict_type": "string",
      "severity": "high|medium|low"
    }
  ]
}

Optional Inputs

scoring_weights: Custom weights for scoring criteria (defaults to standard weights)
confidence_thresholds: Custom thresholds for High/Medium/Low (defaults to 0.7/0.4)

Operations

Operation: calculate_confidence

Assigns confidence level (High/Medium/Low) to each technology based on multi-factor scoring.

Input Parameters:

correlated_technologies: Technologies with corroboration data
conflicts: Detected conflicts affecting confidence

Scoring Criteria (weighted):

Source Diversity (30%): Number of independent evidence sources
Signal Strength (30%): Quality of evidence (explicit vs indirect)
Agreement Score (20%): Cross-source consensus level
Evidence Type (15%): Technical signals vs job postings
Conflict Impact (5%): Presence of contradictory signals

Confidence Thresholds:

High Confidence: Score ≥ 0.70
Medium Confidence: Score ≥ 0.40 and < 0.70
Low Confidence: Score < 0.40

Process:

Calculate source diversity score (0-1 based on source count)
Assess signal strength score (0-1 based on evidence quality)
Factor in agreement score from correlation analysis
Weight evidence types (technical > job posting)
Apply conflict penalties if contradictions exist
Compute weighted total (0-1 scale)
Map to confidence level (High/Medium/Low)
Generate explanation for assigned confidence

Output:

json

{
  "technologies_with_confidence": [
    {
      "technology": "React",
      "category": "frontend",
      "version": "18.x",
      "confidence": "High",
      "confidence_score": 0.92,
      "confidence_reasoning": {
        "source_diversity_score": 1.0,
        "source_count": 4,
        "signal_strength_score": 0.95,
        "agreement_score": 0.95,
        "evidence_type_score": 1.0,
        "conflict_penalty": 0.0,
        "weighted_total": 0.92
      },
      "evidence_summary": {
        "technical_signals": 3,
        "job_posting_mentions": 5,
        "strong_evidence_count": 3,
        "medium_evidence_count": 2,
        "weak_evidence_count": 0
      }
    },
    {
      "technology": "PostgreSQL",
      "category": "backend",
      "confidence": "Medium",
      "confidence_score": 0.55,
      "confidence_reasoning": {
        "source_diversity_score": 0.33,
        "source_count": 1,
        "signal_strength_score": 0.60,
        "agreement_score": 0.50,
        "evidence_type_score": 0.60,
        "conflict_penalty": 0.0,
        "weighted_total": 0.55
      },
      "evidence_summary": {
        "technical_signals": 0,
        "job_posting_mentions": 8,
        "strong_evidence_count": 0,
        "medium_evidence_count": 1,
        "weak_evidence_count": 0
      },
      "confidence_limitations": [
        "Single source only (job_posting_analysis)",
        "No technical signal corroboration",
        "Recommend verification via database connection fingerprinting"
      ]
    }
  ]
}

Operation: generate_confidence_summary

Creates aggregate confidence statistics for the entire report.

Input Parameters:

technologies_with_confidence: All scored technologies

Process:

Count by confidence level (High/Medium/Low)
Calculate overall score (weighted average)
Identify confidence gaps (technologies needing more evidence)
Generate quality metrics (% high confidence, etc.)
Provide recommendations for improving confidence

Output:

json

{
  "confidence_summary": {
    "high_confidence_count": 45,
    "medium_confidence_count": 23,
    "low_confidence_count": 8,
    "total_technologies": 76,
    "overall_confidence_score": 0.78,
    "high_confidence_percentage": 59.2,
    "quality_rating": "Good",
    "by_category": {
      "frontend": {
        "high": 12,
        "medium": 5,
        "low": 1,
        "avg_score": 0.82
      },
      "backend": {
        "high": 8,
        "medium": 9,
        "low": 3,
        "avg_score": 0.65
      },
      "infrastructure": {
        "high": 15,
        "medium": 3,
        "low": 2,
        "avg_score": 0.85
      },
      "security": {
        "high": 5,
        "medium": 2,
        "low": 1,
        "avg_score": 0.73
      },
      "devops": {
        "high": 3,
        "medium": 2,
        "low": 1,
        "avg_score": 0.68
      },
      "third_party": {
        "high": 2,
        "medium": 2,
        "low": 0,
        "avg_score": 0.75
      }
    },
    "recommendations": [
      "23 technologies at medium confidence could be upgraded with additional technical signals",
      "8 low confidence findings should be manually verified or removed"
    ]
  }
}

Operation: identify_confidence_gaps

Highlights technologies that need additional evidence to improve confidence.

Input Parameters:

technologies_with_confidence: Scored technologies

Process:

Identify medium/low confidence technologies
Analyze missing evidence types (what would help)
Suggest additional skills to run for validation
Prioritize by importance (critical tech vs nice-to-have)
Generate actionable recommendations

Output:

json

{
  "confidence_gaps": [
    {
      "technology": "Redis",
      "current_confidence": "Medium",
      "current_score": 0.58,
      "missing_evidence_types": [
        "technical_signal",
        "dns_records",
        "http_headers"
      ],
      "recommendations": [
        "Check for redis-cli banner via network probing (if authorized)",
        "Search for Redis-specific error messages in HTTP responses",
        "Look for redis:// connection strings in public repositories"
      ],
      "potential_score_improvement": 0.25,
      "priority": "medium"
    }
  ]
}

Scoring Algorithm Details

Source Diversity Score

score = min(source_count / 4, 1.0)

Rationale:
- 1 source = 0.25
- 2 sources = 0.50
- 3 sources = 0.75
- 4+ sources = 1.00

Signal Strength Score

strong_signals_weight = 1.0
medium_signals_weight = 0.6
weak_signals_weight = 0.3

score = (strong_count * 1.0 + medium_count * 0.6 + weak_count * 0.3) / total_signals

Evidence Type Score

technical_signal_weight = 1.0
job_posting_weight = 0.5
archive_data_weight = 0.4

score = (technical_count * 1.0 + job_count * 0.5 + archive_count * 0.4) / total_evidence

Conflict Penalty

high_severity_conflict = -0.30
medium_severity_conflict = -0.15
low_severity_conflict = -0.05

Applied to final score if conflicts exist for this technology

Overall Confidence Score

confidence_score = (
  source_diversity_score * 0.30 +
  signal_strength_score * 0.30 +
  agreement_score * 0.20 +
  evidence_type_score * 0.15 +
  (1.0 - conflict_penalty) * 0.05
)

Confidence Level Mapping

High Confidence (≥ 0.70)

Criteria (any of):

3+ independent evidence sources with strong signals
Explicit identifier found (meta tag, header, global variable)
Job posting mentions + 2+ technical signals
Direct API/service detection with verification

Interpretation: Very likely to be accurate

Medium Confidence (0.40 - 0.69)

Criteria (typical):

1-2 evidence sources with medium/strong signals
Indirect indicators (URL patterns, cookies, DOM attributes)
Job posting mentions without technical corroboration
Single strong technical signal without cross-validation

Interpretation: Plausible but requires additional verification

Low Confidence (< 0.40)

Criteria (typical):

Single weak signal only
Speculation based on generic behavior
Conflicting evidence without clear resolution
Outdated information without current confirmation

Interpretation: Speculative hypothesis requiring manual validation

Output Format

Success Output

json

{
  "status": "success",
  "technologies_with_confidence": [...],
  "confidence_summary": {...},
  "confidence_gaps": [...],
  "statistics": {
    "total_technologies_scored": 76,
    "high_confidence": 45,
    "medium_confidence": 23,
    "low_confidence": 8,
    "average_confidence_score": 0.78
  },
  "execution_time_ms": 850
}

Error Output

json

{
  "status": "error",
  "error_code": "NO_TECHNOLOGIES",
  "error_message": "No technologies provided for confidence scoring",
  "partial_results": null
}

Error Handling

No Technologies

IF technologies array empty → Return error, cannot score
IF all technologies lack evidence → Assign all Low confidence

Missing Correlation Data

IF corroboration data missing → Use evidence count only
IF agreement scores unavailable → Default to 0.5
IF conflicts data missing → Skip conflict penalties

Invalid Input

IF technology missing required fields → Skip that technology, continue
IF confidence threshold invalid → Use defaults (0.7/0.4)
IF weights don't sum to 1.0 → Normalize weights

Dependencies

Required Skills

signal_correlator (provides corroboration data)

Required Libraries

Standard mathematical functions (weighted averages)
JSON processing

External APIs

None (pure computation)

Configuration

Settings (from settings.json)

json

{
  "confidence_scoring": {
    "high_confidence_threshold": 0.70,
    "medium_confidence_threshold": 0.40,
    "source_diversity_weight": 0.30,
    "signal_strength_weight": 0.30,
    "agreement_weight": 0.20,
    "evidence_type_weight": 0.15,
    "conflict_weight": 0.05,
    "min_sources_for_high": 3
  }
}

Usage Example

json

{
  "operation": "calculate_confidence",
  "inputs": {
    "correlated_technologies": [ /* from signal_correlator */ ],
    "conflicts": [ /* from signal_correlator */ ]
  }
}

Best Practices

Be conservative - When uncertain, assign lower confidence
Document reasoning - Always explain confidence score
Highlight limitations - Call out single-source dependencies
Provide recommendations - Suggest how to improve confidence
Recalculate after edits - Update scores when evidence added

Calibration Guidelines

Calibrate Against Known Examples

Test on known tech stacks to validate scoring accuracy
Adjust weights if consistently over/under-confident
Review false positives and adjust signal strength weights
Validate thresholds (0.7/0.4) against real-world accuracy

Quality Assurance

Target distribution: 50-70% High, 20-35% Medium, <15% Low
IF too many Low: Loosen thresholds or improve collection
IF too many High: Tighten thresholds or strengthen criteria
Monitor accuracy: Track user feedback on confidence levels

Limitations

Confidence ≠ certainty (probabilistic estimates only)
Context-dependent (job posting weight varies by company size)
Temporal sensitivity (recent evidence > old evidence)
Cannot detect all false positives (some will have high confidence)

Security Considerations

No external requests (pure computation)
Sanitize output (remove sensitive paths/URLs)
Log scoring decisions for audit trail
Preserve evidence for forensic review

Version History

1.0.0 (2024-01-20): Initial implementation with multi-factor scoring

Maintainer

transilienceai Core maintainer

Source details

Full Name: transilienceai/communitytools
Branch: main
Path in repo: projects/pentest/.claude/skills/techstack-identification/confidence_scorer
License: MIT License
Topics: security security-tools bounty-hunters hackerone pentest pentest-tool security-research

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

transilienceai/communitytools

techstack-identification

OSINT-based technology stack identification. Discovers company tech stacks using passive reconnaissance across 17 intelligence domains. Given a company name (and optional domain hint), infers frontend, backend, infrastructure, and security technologies using publicly available signals.

129 20

Explore

transilienceai/communitytools