performing-ai-driven-osint-correlation
Use AI and LLM-based reasoning to correlate findings across multiple OSINT sources—username enumeration, email lookups, social media profiles, domain records, breach databases, and dark-web mentions—into unified intelligence profiles with confidence scoring and link analysis.
Install this skill into your project:
```bash
npx add-skill https://github.com/mukul975/Anthropic-Cybersecurity-Skills/tree/main/skills/performing-ai-driven-osint-correlation
```
Performing AI-Driven OSINT Correlation
When to Use
- You have collected raw OSINT data from multiple tools and sources but need to identify connections, contradictions, and patterns across them.
- You need to build a unified intelligence profile for a target entity (person, organization, or infrastructure) from fragmented data.
- Traditional manual correlation is too slow or error-prone for the volume of data collected.
- You want confidence-scored assessments of identity linkage across platforms rather than simple keyword matching.
Prerequisites
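- Python 3.10+ with the `requests` and `openai` packages (`json` and `csv` ship with the standard library; use your provider's SDK instead of `openai` for Anthropic or Ollama)
- Sherlock installed (`pip install sherlock-project`)
- theHarvester installed (`pip install theHarvester`)
- SpiderFoot 4.0+ running on localhost:5001
- `jq` for parsing JSON responses in the shell steps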
- Access to an LLM API (OpenAI, Anthropic, or local model via Ollama)
- Optional: Maltego CE for graph visualization of correlation results
- Optional: API keys for Shodan, VirusTotal, HaveIBeenPwned, Hunter.io
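Before starting, a quick preflight check can confirm that the workflow's command-line tools and API keys are available. The sketch below only looks for executables on `PATH` and for environment variables. It does not check that the SpiderFoot service is reachable. The function name `preflight` and the two lists are hypothetical and should be adjusted to the tools and providers you actually use.

```python
import os
import shutil
from collections.abc import Iterable

# Hypothetical preflight lists: executables and environment variables that the
# workflow's commands reference. Adjust them to the tools and providers you use.
REQUIRED_TOOLS = ("sherlock", "theHarvester", "jq", "curl")
REQUIRED_ENV = ("OPENAI_API_KEY", "HIBP_KEY")


def preflight(tools: Iterable[str] = REQUIRED_TOOLS,
              env_vars: Iterable[str] = REQUIRED_ENV) -> list[str]:
    """Return a list of problems; an empty list means everything was found."""
    problems = []
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            problems.append(f"missing executable: {tool}")
    for var in env_vars:
        # Treat unset and empty variables the same way
        if not os.environ.get(var):
            problems.append(f"missing environment variable: {var}")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print(issue)
    print("preflight OK" if not issues else f"{len(issues)} problem(s) found")
```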
Workflow
Legal & Ethical Requirements
- Obtain documented written authorization before any investigation
- Establish lawful basis for data processing (law enforcement, corporate policy, etc.)
- Define PII retention limits and data handling procedures
- Comply with local privacy regulations (GDPR, CCPA, etc.)
Phase 1 — Multi-Source OSINT Collection
- Create the working directory for all OSINT outputs:
```bash
mkdir -p /tmp/osint
```
- Enumerate usernames across platforms with Sherlock. With `--csv` and `--folderoutput`, Sherlock saves the CSV results as `/tmp/osint/targetusername.csv`:
```bash
sherlock "targetusername" --folderoutput /tmp/osint --csv
```
- Harvest emails, subdomains, and hosts with theHarvester:
```bash
theHarvester -d targetdomain.com -b all -f /tmp/osint/harvester-results.json
```
- Run a SpiderFoot passive scan via the REST API and note the returned scan ID:
```bash
curl -s http://localhost:5001/api/scan/start \
  -d "scanname=target-recon&scantarget=targetdomain.com&usecase=passive" \
  | jq '.scanid'
```
- Export the SpiderFoot results when the scan completes:
```bash
SCAN_ID="<scanid_from_previous_step>"
curl -s "http://localhost:5001/api/scan/${SCAN_ID}/results?type=all" \
  -o /tmp/osint/spiderfoot-results.json
```
- Query breach databases for email exposure (example with the HIBP API):
  - `truncateResponse=false` returns full breach records, including `BreachDate`.
  - HIBP answers HTTP 404 when the address appears in no breach, so in that case the output file will not contain a breach list.
```bash
curl -s -H "hibp-api-key: ${HIBP_KEY}" \
  -H "User-Agent: OSINT-Correlation-Skill" \
  "https://haveibeenpwned.com/api/v3/breachedaccount/target@example.com?truncateResponse=false" \
  -o /tmp/osint/breach-results.json
```
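The curl request above writes whatever HIBP returns straight to disk. That makes two responses easy to miss: a 404 (the address is in no breach) and a 429 (rate limited). The Python sketch below performs the same lookup and handles those cases explicitly. It assumes the v3 `breachedaccount` endpoint, the `hibp-api-key` header and the `truncateResponse` query parameter used in this step. The function names `breach_url`, `interpret` and `lookup` are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

HIBP_BASE = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def breach_url(email: str, truncate: bool = False) -> str:
    # URL-encode the account; truncateResponse=false asks for full breach records
    query = urllib.parse.urlencode({"truncateResponse": str(truncate).lower()})
    return f"{HIBP_BASE}{urllib.parse.quote(email, safe='')}?{query}"


def interpret(status: int, body: str) -> list[dict]:
    """Map an HIBP HTTP status and body to a list of breach records."""
    if status == 404:
        return []  # account not found in any breach
    if status == 429:
        raise RuntimeError("rate limited; retry after the Retry-After interval")
    if status != 200:
        raise RuntimeError(f"unexpected HIBP status {status}")
    return json.loads(body)


def lookup(email: str, api_key: str) -> list[dict]:
    request = urllib.request.Request(
        breach_url(email),
        headers={"hibp-api-key": api_key, "User-Agent": "OSINT-Correlation-Skill"},
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return interpret(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx, including HIBP's "not pwned" 404
        return interpret(err.code, err.read().decode("utf-8", errors="replace"))
```

Dumping the list returned by `lookup("target@example.com", os.environ["HIBP_KEY"])` to `/tmp/osint/breach-results.json` always produces a JSON array, even when the address has no breaches.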
Phase 2 — Data Normalization
- Normalize all collected data into a common schema. Create a unified JSON structure that tags each finding with its source, timestamp, and data type:
```bash
cat > /tmp/osint/normalize.py << 'EOF'
import csv
import json
import os
from datetime import datetime, timezone

OSINT_DIR = "/tmp/osint"
findings = []


def now() -> str:
    # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
    return datetime.now(timezone.utc).isoformat()


# Normalize Sherlock CSV results (--csv --folderoutput writes <username>.csv)
sherlock_path = os.path.join(OSINT_DIR, "targetusername.csv")
if os.path.exists(sherlock_path):
    with open(sherlock_path, newline="") as f:
        for row in csv.DictReader(f):
            findings.append({
                "source": "sherlock",
                "type": "social_profile",
                "platform": row.get("name", ""),
                "url": row.get("url_user", ""),
                "username": row.get("username", ""),
                "status": row.get("exists", ""),
                "collected_at": now(),
            })

# Normalize theHarvester JSON results
harvester_path = os.path.join(OSINT_DIR, "harvester-results.json")
if os.path.exists(harvester_path):
    with open(harvester_path) as f:
        data = json.load(f)
    for email in data.get("emails", []):
        findings.append({"source": "theHarvester", "type": "email",
                         "value": email, "collected_at": now()})
    for host in data.get("hosts", []):
        findings.append({"source": "theHarvester", "type": "hostname",
                         "value": host, "collected_at": now()})

# Normalize SpiderFoot results (the event-type key name varies between exports)
sf_path = os.path.join(OSINT_DIR, "spiderfoot-results.json")
if os.path.exists(sf_path):
    with open(sf_path) as f:
        for item in json.load(f):
            findings.append({
                "source": "spiderfoot",
                "type": item.get("type") or item.get("event_type", "unknown"),
                "value": item.get("data", ""),
                "module": item.get("module", ""),
                "collected_at": now(),
            })

# Normalize HIBP results; a 404 (no breaches) leaves no JSON list in the file
breach_path = os.path.join(OSINT_DIR, "breach-results.json")
if os.path.exists(breach_path):
    with open(breach_path) as f:
        text = f.read().strip()
    breaches = json.loads(text) if text.startswith("[") else []
    for breach in breaches:
        findings.append({
            "source": "hibp",
            "type": "breach",
            "account": "target@example.com",
            "value": breach.get("Name", ""),
            "breach_date": breach.get("BreachDate", ""),
            "collected_at": now(),
        })

with open(os.path.join(OSINT_DIR, "normalized-findings.json"), "w") as f:
    json.dump(findings, f, indent=2)

sources = {item["source"] for item in findings}
print(f"Normalized {len(findings)} findings from {len(sources)} sources")
EOF
python3 /tmp/osint/normalize.py
```
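Normalization can leave the same value reported by several tools, for example an email found by both theHarvester and SpiderFoot. Collapsing those duplicates before the LLM step does two things:

- it shrinks the prompt;
- it records corroboration explicitly, as a list of reporting sources.

The minimal sketch below assumes the field names produced by `normalize.py` (`type`, `value`, `url`, `source`). The helper `dedupe_findings` and the added `sources` field are hypothetical.

```python
def dedupe_findings(findings: list[dict]) -> list[dict]:
    """Merge findings that share a type and value, recording every reporting source."""
    merged: dict[tuple[str, str], dict] = {}  # dicts keep insertion order
    for item in findings:
        # Social profiles from Sherlock carry a "url" instead of a "value"
        raw = item.get("value") or item.get("url") or ""
        # Case-insensitive match suits emails and hostnames; drop .lower() if case matters
        key = (item.get("type", "unknown"), str(raw).strip().lower())
        if key not in merged:
            merged[key] = {**item, "sources": [item["source"]]}
        elif item["source"] not in merged[key]["sources"]:
            merged[key]["sources"].append(item["source"])
    return list(merged.values())
```

To use it, apply the helper to the list loaded from `normalized-findings.json` and write the result back. This keeps the file shape the correlation step expects, with one extra `sources` list per finding.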
Phase 3 — AI-Driven Correlation
- Send the normalized findings to an LLM for cross-source correlation analysis. The prompt spells out the JSON schema that the later scripts read:
```bash
cat > /tmp/osint/correlate.py << 'PYEOF'
import json
import os

from openai import OpenAI  # or the Anthropic / Ollama client of your choice

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

with open("/tmp/osint/normalized-findings.json") as f:
    findings = json.load(f)

# Cap the prompt size; larger collections need batching across several calls
batch = findings[:500]

# resolve.py, report.py, and maltego_export.py read the keys below,
# so the prompt pins the output schema explicitly.
correlation_prompt = f"""You are an OSINT analyst. Analyze these findings collected from multiple sources and produce a correlation report.

For each identity or entity you detect:
1. List all linked accounts/profiles with the evidence connecting them.
2. Assign a confidence score (0.0-1.0) for each linkage based on:
   - Exact username match across platforms (high)
   - Similar usernames with shared metadata (medium)
   - Same email in breach data and registration (high)
   - Co-occurring infrastructure (IP, domain) (medium)
   - Temporal correlation of account creation dates (low-medium)
3. Identify contradictions or potential false positives.
4. Flag high-risk exposures (breached credentials, PII leaks, infrastructure overlaps).
5. Return a single JSON object with exactly this structure:
{{
  "meta": {{"target": "...", "sources_used": ["..."], "total_findings": 0, "generated_at": "ISO-8601 timestamp"}},
  "entities": [
    {{
      "identifier": "...",
      "confidence": 0.0,
      "linked_accounts": [
        {{"source": "...", "platform": "...", "value": "...", "evidence": "...", "confidence": 0.0}}
      ],
      "risk_level": "low | medium | high",
      "flags": ["..."]
    }}
  ],
  "contradictions": ["..."],
  "recommendations": ["..."]
}}

Raw findings ({len(batch)} of {len(findings)}):
{json.dumps(batch, indent=2)}
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an expert OSINT analyst specializing in identity correlation and link analysis."},
        {"role": "user", "content": correlation_prompt},
    ],
    temperature=0.1,
    response_format={"type": "json_object"},
)

report = json.loads(response.choices[0].message.content)
with open("/tmp/osint/correlation-report.json", "w") as f:
    json.dump(report, f, indent=2)
print(json.dumps(report, indent=2))
PYEOF
python3 /tmp/osint/correlate.py
```
- Review the entity resolution. The LLM's report already groups related identities into entities, and this script summarizes each one:
```bash
cat > /tmp/osint/resolve.py << 'PYEOF'
import json

with open("/tmp/osint/correlation-report.json") as f:
    report = json.load(f)

# List the entities the LLM resolved, with link counts and risk levels
entities = report.get("entities", [])
print(f"Identified {len(entities)} distinct entities")
for entity in entities:
    name = entity.get("identifier", "unknown")
    # LLM output may encode confidence as a string; coerce before formatting
    confidence = float(entity.get("confidence", 0) or 0)
    links = entity.get("linked_accounts", [])
    risk = entity.get("risk_level", "unknown")
    print(f"  [{confidence:.0%}] {name} | {len(links)} linked accounts | risk: {risk}")
PYEOF
python3 /tmp/osint/resolve.py
```
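The summary script only lists what the LLM returned, so the entity list can still contain two entries for the same person. For example, one entry may be keyed on a username and another on an email, with both sharing a linked account. A deterministic pass can merge any entities that share a linked-account value. The sketch below uses a union-find (disjoint-set) structure and assumes the `entities` / `linked_accounts` / `value` keys from this skill's report schema. The helper `merge_entities` and the added `aliases` field are hypothetical, not part of the skill.

```python
def merge_entities(entities: list[dict]) -> list[dict]:
    """Union entities that share a linked-account value and return the merged list."""
    parent = list(range(len(entities)))

    def find(i: int) -> int:
        # Path halving keeps the trees shallow
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        parent[find(a)] = find(b)

    # Link every entity to the first entity that claimed the same account value
    first_owner: dict[str, int] = {}
    for idx, entity in enumerate(entities):
        for link in entity.get("linked_accounts", []):
            value = str(link.get("value", "")).strip().lower()
            if not value:
                continue
            if value in first_owner:
                union(idx, first_owner[value])
            else:
                first_owner[value] = idx

    groups: dict[int, list[int]] = {}
    for idx in range(len(entities)):
        groups.setdefault(find(idx), []).append(idx)

    merged = []
    for members in groups.values():
        base = dict(entities[members[0]])
        seen, links, flags = set(), [], []
        for m in members:
            for link in entities[m].get("linked_accounts", []):
                key = (link.get("platform"), str(link.get("value", "")).strip().lower())
                if key not in seen:
                    seen.add(key)
                    links.append(link)
            for flag in entities[m].get("flags", []):
                if flag not in flags:
                    flags.append(flag)
        base["linked_accounts"] = links
        base["flags"] = flags
        base["aliases"] = [entities[m].get("identifier") for m in members]
        # Conservative choice: keep the lowest confidence among merged entities
        base["confidence"] = min(float(entities[m].get("confidence", 0) or 0) for m in members)
        merged.append(base)
    return merged
```

Replacing `report["entities"]` with `merge_entities(report["entities"])` before the reporting phase keeps the later scripts unchanged. Taking the minimum confidence on merge is a deliberate trade-off: a merged profile is never reported as more certain than its weakest component.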
Phase 4 — Reporting and Visualization
- Generate a final intelligence profile in Markdown:
```bash
cat > /tmp/osint/report.py << 'PYEOF'
import json
from datetime import datetime, timezone


def cell(value) -> str:
    # Escape pipes and newlines so free-text evidence cannot break the Markdown table
    return str(value).replace("|", "\\|").replace("\n", " ")


with open("/tmp/osint/correlation-report.json") as f:
    report = json.load(f)

md = "# OSINT Correlation Report\n\n"
md += f"**Generated:** {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}\n\n"
md += "## Entity Profiles\n\n"
for entity in report.get("entities", []):
    eid = entity.get("identifier", "Unknown")
    conf = float(entity.get("confidence", 0) or 0)
    md += f"### {eid} (Confidence: {conf:.0%})\n\n"
    md += "| Source | Platform | Evidence |\n|--------|----------|----------|\n"
    for link in entity.get("linked_accounts", []):
        md += f"| {cell(link.get('source', ''))} | {cell(link.get('platform', ''))} | {cell(link.get('evidence', ''))} |\n"
    md += f"\n**Risk Level:** {entity.get('risk_level', 'N/A')}\n\n"
    for flag in entity.get("flags", []):
        md += f"- ⚠️ {flag}\n"
    md += "\n"

with open("/tmp/osint/intelligence-profile.md", "w") as f:
    f.write(md)
print("Report written to /tmp/osint/intelligence-profile.md")
PYEOF
python3 /tmp/osint/report.py
```
- Optional: import the correlation graph into Maltego for visualization:
```bash
# Export entities as a Maltego-compatible CSV for manual import
cat > /tmp/osint/maltego_export.py << 'PYEOF'
import csv
import json

with open("/tmp/osint/correlation-report.json") as f:
    report = json.load(f)

with open("/tmp/osint/maltego-import.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Entity Type", "Value", "Linked To", "Link Label", "Confidence"])
    for entity in report.get("entities", []):
        for link in entity.get("linked_accounts", []):
            writer.writerow([
                link.get("type", "Alias"),
                link.get("value", ""),
                entity.get("identifier", ""),
                link.get("evidence", ""),
                link.get("confidence", ""),
            ])
print("Maltego CSV exported to /tmp/osint/maltego-import.csv")
PYEOF
python3 /tmp/osint/maltego_export.py
```
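If Maltego is not available, the same entity-to-account links can be written as GraphML. GraphML is an XML graph format that tools such as Gephi and yEd can open. This is an alternative sketch, not part of the skill's workflow:

- It assumes the report schema used above.
- The output path `/tmp/osint/correlation-graph.graphml` and the key names `kind`, `label` and `confidence` are illustrative choices.
- Accounts that several entities share become a single node, so pivot identifiers stand out visually.

```python
import json
import os
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def _tag(name: str) -> str:
    return f"{{{GRAPHML_NS}}}{name}"


def report_to_graphml(report: dict) -> ET.ElementTree:
    """Build an undirected GraphML graph linking entity nodes to account nodes."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(_tag("graphml"))
    # Attribute declarations: node kind/label and edge confidence
    ET.SubElement(root, _tag("key"), {"id": "kind", "for": "node", "attr.name": "kind", "attr.type": "string"})
    ET.SubElement(root, _tag("key"), {"id": "label", "for": "node", "attr.name": "label", "attr.type": "string"})
    ET.SubElement(root, _tag("key"), {"id": "confidence", "for": "edge", "attr.name": "confidence", "attr.type": "double"})
    graph = ET.SubElement(root, _tag("graph"), {"id": "osint", "edgedefault": "undirected"})

    node_ids: dict[tuple[str, str], str] = {}

    def node(kind: str, label: str) -> str:
        # Reuse one node per (kind, label) so shared accounts become shared nodes
        key = (kind, label)
        if key not in node_ids:
            node_ids[key] = f"n{len(node_ids)}"
            el = ET.SubElement(graph, _tag("node"), {"id": node_ids[key]})
            ET.SubElement(el, _tag("data"), {"key": "kind"}).text = kind
            ET.SubElement(el, _tag("data"), {"key": "label"}).text = label
        return node_ids[key]

    for entity in report.get("entities", []):
        src = node("entity", str(entity.get("identifier", "unknown")))
        for link in entity.get("linked_accounts", []):
            dst = node("account", f"{link.get('platform', '')}:{link.get('value', '')}")
            edge = ET.SubElement(graph, _tag("edge"), {"source": src, "target": dst})
            conf = float(link.get("confidence", 0) or 0)
            ET.SubElement(edge, _tag("data"), {"key": "confidence"}).text = str(conf)
    return ET.ElementTree(root)


if __name__ == "__main__":
    report_path = "/tmp/osint/correlation-report.json"
    if os.path.exists(report_path):
        with open(report_path) as f:
            tree = report_to_graphml(json.load(f))
        tree.write("/tmp/osint/correlation-graph.graphml", encoding="utf-8", xml_declaration=True)
        print("GraphML written to /tmp/osint/correlation-graph.graphml")
    else:
        print(f"{report_path} not found; run the correlation step first")
```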
Key Concepts
| Concept | Description |
|---|---|
| Cross-Source Correlation | Matching identifiers (usernames, emails, IPs) across independent OSINT sources to establish entity linkage |
| Confidence Scoring | Assigning probabilistic confidence (0.0–1.0) to each linkage based on evidence strength and corroboration |
| Entity Resolution | Deduplicating and merging records that refer to the same real-world entity across fragmented datasets |
| False Positive Detection | Using AI reasoning to identify coincidental matches versus genuine identity links |
| Multi-Vector Intelligence | Combining findings from social media, DNS, breach data, and infrastructure into a single threat picture |
| Link Analysis | Graph-based examination of relationships between entities, accounts, and infrastructure |
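Confidence Scoring and Cross-Source Correlation both depend on how several pieces of evidence combine into one score. One common rule for combining independent signals is the noisy-OR. It is shown here as an illustration, not as the method the LLM applies. If each source supports a link with confidence p_i and the sources are assumed independent, the combined confidence is 1 - prod(1 - p_i): one minus the probability that every piece of evidence is wrong. The function name `noisy_or` is hypothetical.

```python
from math import prod


def noisy_or(confidences: list[float]) -> float:
    """Combine per-source confidences, assuming the sources are independent."""
    for p in confidences:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"confidence out of range: {p}")
    # Probability that every piece of evidence is wrong, then its complement;
    # an empty list yields 0.0 (no evidence, no confidence)
    return 1.0 - prod(1.0 - p for p in confidences)


print(noisy_or([0.6, 0.5]))  # two moderate, independent signals
```

In practice the independence assumption rarely holds fully: two OSINT tools may draw on the same upstream data. Treat the combined value as an upper bound, or discount correlated sources before combining them.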
Tools & Systems
| Tool | Role in Workflow |
|---|---|
| Sherlock | Username enumeration across 400+ social platforms |
| theHarvester | Email, subdomain, and host discovery from public sources |
| SpiderFoot | Automated OSINT collection across 200+ modules |
| Maltego | Graph-based visualization of entity relationships |
| LLM API (GPT-4o, Claude, or local models via Ollama) | Cross-source reasoning, pattern detection, and confidence scoring |
| HaveIBeenPwned | Breach exposure and credential leak detection |
Common Scenarios
- Threat Actor Attribution: Correlate a suspicious username found in a phishing campaign with social media profiles, domain registrations, and breach data to build an attribution profile.
- Attack Surface Mapping: Link discovered subdomains, emails, and employee social accounts to understand an organization's full external exposure.
- Insider Threat Investigation: Cross-reference an employee's known accounts with dark web marketplace activity and breach databases.
- Brand Impersonation Detection: Identify accounts across platforms mimicking a target brand by correlating registration patterns, naming conventions, and temporal signals.
Output Format
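The workflow produces two outputs: a structured JSON correlation report (`/tmp/osint/correlation-report.json`) and a Markdown intelligence profile (`/tmp/osint/intelligence-profile.md`). The JSON report has the following structure: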
```json
{
  "meta": {
    "target": "targetdomain.com",
    "sources_used": ["sherlock", "theHarvester", "spiderfoot", "hibp"],
    "total_findings": 247,
    "generated_at": "2025-01-15T14:30:00Z"
  },
  "entities": [
    {
      "identifier": "john.target",
      "confidence": 0.92,
      "linked_accounts": [
        {
          "source": "sherlock",
          "platform": "GitHub",
          "value": "john.target",
          "evidence": "Exact username match, bio references targetdomain.com",
          "confidence": 0.95
        }
      ],
      "risk_level": "high",
      "flags": [
        "Credentials exposed in 2 breaches (2022, 2023)",
        "Admin email for targetdomain.com found in public WHOIS"
      ]
    }
  ],
  "contradictions": [],
  "recommendations": []
}
```
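The entity summary, the Markdown report and the Maltego export all read the report's `entities`, `identifier`, `confidence` and `linked_accounts` keys. It can therefore help to check the LLM's JSON against the structure above before those steps run. The sketch below is minimal and dependency-free: it covers only the fields shown in this example and checks that confidences fall within 0.0–1.0. The helper names `validate_report` and `_check_confidence` are hypothetical.

```python
def _check_confidence(value, where: str) -> list[str]:
    """Validate one confidence value; missing values are reported elsewhere."""
    if value is None:
        return []
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return [f"{where}: confidence is not a number"]
    if not 0.0 <= value <= 1.0:
        return [f"{where}: confidence {value} out of range [0, 1]"]
    return []


def validate_report(report: dict) -> list[str]:
    """Return schema problems in a correlation report; an empty list means it looks usable."""
    errors = []
    for key in ("meta", "entities", "contradictions", "recommendations"):
        if key not in report:
            errors.append(f"missing top-level key: {key}")
    for i, entity in enumerate(report.get("entities", [])):
        where = f"entities[{i}]"
        for key in ("identifier", "confidence", "linked_accounts", "risk_level", "flags"):
            if key not in entity:
                errors.append(f"{where}: missing {key}")
        errors.extend(_check_confidence(entity.get("confidence"), where))
        for j, link in enumerate(entity.get("linked_accounts", [])):
            errors.extend(_check_confidence(link.get("confidence"), f"{where}.linked_accounts[{j}]"))
    return errors
```

Loading `correlation-report.json` and stopping the pipeline whenever `validate_report` returns anything keeps malformed model output from silently producing an empty or misleading profile.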
Verification
- Confirm that each linked account has been independently verified against at least two sources before assigning confidence > 0.8.
- Cross-check AI-generated correlations manually for a random sample (10–20%) to validate accuracy.
- Verify that no false positives from common usernames (e.g., "admin", "test") inflated entity profiles.
- Ensure breach data timestamps are current and from reputable aggregators.
- Validate that the final report does not include stale or retracted OSINT data.