pwc-audit-intelligence
Expert in audit ground truth extraction, Critical Audit Matters (CAMs), ICFR validation, PCAOB independence compliance, and SEC filing analysis for PWC audit staff
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/pwc-audit-intelligence-nirvanchitnis-cmyk-ground-truth
SKILL.md
PWC Audit Intelligence Skill
Purpose
This skill packages audit domain expertise for the ground-truth project so that Claude understands audit concepts, compliance requirements, and quality validation without needing them re-explained in each session.
When to Use This Skill
- Extracting ground truth data from SEC filings (10-K, 10-Q, DEF 14A)
- Analyzing Critical Audit Matters (CAMs)
- Validating Internal Control over Financial Reporting (ICFR)
- Ensuring PCAOB independence compliance (Rule 3520)
- Building audit intelligence systems (RAG, ML models, partner dashboards)
Core Audit Concepts
1. Critical Audit Matters (CAMs)
Definition: Matters arising from the current period audit of financial statements that were communicated (or required to be communicated) to the audit committee and that:
- Relate to accounts/disclosures material to the financial statements
- Involved especially challenging, subjective, or complex auditor judgment
Why They Matter:
- High audit risk: Areas requiring significant professional judgment
- Complexity indicator: More CAMs = more complex client
- Partner resource allocation: CAM-heavy clients require senior staff
- Churn risk: Clients with increasing CAM count may be at risk
Examples (from ground-truth extractions):
- PFSI (PennyMac): Mortgage Servicing Rights (MSRs) valuation
- Why critical: "Involved especially challenging, subjective, or complex judgments"
- Sector-specific: Arises in mortgage servicing businesses
- ILMN (Illumina): Revenue recognition for complex multi-element arrangements
- Banking sector: Loan loss reserves (CECL calculations)
Typical CAM Counts:
- 0-1 CAMs: Low complexity (straightforward audits)
- 2-3 CAMs: Moderate complexity (industry-standard)
- 4+ CAMs: High complexity (partner-intensive)
Red Flags:
- Increasing CAM count year-over-year
- Same CAM repeated for 3+ years (unresolved issues)
- CAMs related to management estimates (subjectivity risk)
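The count bands and red flags above can be expressed as a small check. This is a sketch that rests on several assumptions. CAM history arrives as a hypothetical `{fiscal_year: [CAM titles]}` dict. Repeated CAMs are matched by case-insensitive title. The `ESTIMATE_KEYWORDS` list used to spot estimate-driven CAMs is illustrative and is not a standard taxonomy.

```python
from collections import Counter

# Hypothetical keywords used to spot estimate-driven CAMs; tune per sector.
ESTIMATE_KEYWORDS = ("estimate", "valuation", "fair value", "allowance", "impairment")


def cam_complexity_band(cam_count: int) -> str:
    """Map a CAM count to the bands above: 0-1 low, 2-3 moderate, 4+ high."""
    if cam_count < 0:
        raise ValueError("cam_count must be non-negative")
    if cam_count <= 1:
        return "low"
    if cam_count <= 3:
        return "moderate"
    return "high"


def cam_red_flags(cams_by_year: dict[int, list[str]]) -> list[str]:
    """Return red-flag messages for a {fiscal_year: [CAM titles]} history."""
    flags: list[str] = []
    years = sorted(cams_by_year)

    # Red flag 1: CAM count increased year-over-year.
    for prev, curr in zip(years, years[1:]):
        if len(cams_by_year[curr]) > len(cams_by_year[prev]):
            flags.append(f"CAM count increased {prev}->{curr}")

    # Red flag 2: the same CAM (case-insensitive title) appears in 3+ years.
    year_counts = Counter(
        title for y in years for title in {t.strip().lower() for t in cams_by_year[y]}
    )
    for title, n in sorted(year_counts.items()):
        if n >= 3:
            flags.append(f"CAM repeated for {n} years: {title}")

    # Red flag 3: latest-year CAMs that look like management estimates.
    if years:
        for title in cams_by_year[years[-1]]:
            if any(k in title.lower() for k in ESTIMATE_KEYWORDS):
                flags.append(f"Estimate-related CAM: {title}")
    return flags
```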
2. Internal Control over Financial Reporting (ICFR)
Definition (SOX 404; SEC Rule 13a-15(f)): A process designed to provide reasonable assurance regarding the reliability of financial reporting and the preparation of financial statements for external purposes in accordance with GAAP.
Effectiveness Conclusion: Binary (Effective / Ineffective)
- Effective: No material weaknesses identified
- Ineffective: One or more material weaknesses exist
Why It Matters:
- Audit quality: Effective ICFR = lower control risk (the auditor can rely more on controls and reduce substantive testing)
- Client health: ICFR failures signal governance issues
- Churn risk: Material weaknesses increase partner workload, may lead to resignation
- Regulatory risk: ICFR failures attract SEC scrutiny
Material Weakness Examples:
- Inadequate segregation of duties
- Ineffective IT general controls (ITGC)
- Lack of evidence for key control execution
- Management override of controls
Ground Truth Extraction (sec_10k.controls):
{
"icfr_effective": true,
"auditor": "Deloitte & Touche LLP",
"opinion_date": "2025-02-19",
"opinion": "In our opinion, the Company maintained, in all material respects, effective internal control over financial reporting..."
}
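One way to populate `icfr_effective` from the `opinion` text is a phrase check, sketched below. The adverse-opinion phrasing it looks for is an assumption based on typical PCAOB AS 2201 report wording, so it should be validated against real adverse opinions. Text that matches neither pattern returns `None` for manual review instead of a guess.

```python
import re
from typing import Optional

# Assumed adverse-opinion phrasing; verify against real adverse ICFR reports.
# "material weakness" alone is deliberately NOT used: clean reports also say
# "assessing the risk that a material weakness exists" in the basis paragraph.
ADVERSE_RE = re.compile(
    r"\b(has|have)\s+not\s+maintained\b|\bdid\s+not\s+maintain\b", re.IGNORECASE
)
EFFECTIVE_RE = re.compile(
    r"maintained,\s+in\s+all\s+material\s+respects,\s+effective\s+internal\s+control",
    re.IGNORECASE,
)


def classify_icfr_opinion(opinion_text: str) -> Optional[bool]:
    """Return True (effective), False (ineffective), or None (needs manual review)."""
    text = " ".join(opinion_text.split())  # collapse line breaks and repeated spaces
    if ADVERSE_RE.search(text):
        return False
    if EFFECTIVE_RE.search(text):
        return True
    return None
```

Checking for adverse wording first keeps a report that contains both phrasings from being marked effective.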
ICFR Status Distribution (across 13 companies):
- Effective: 13/13 (100%): all 13 extracted companies have clean controls
- Ineffective: 0/13 — no material weaknesses found (expected, these are mature public companies)
3. PCAOB Independence Compliance (Rule 3520)
Rule: Auditors must be independent in fact and appearance
Key Prohibitions:
- Cannot audit its own work (self-review threat)
- Cannot perform certain non-audit services (e.g., bookkeeping)
- Cannot have financial interest in audit client
- Rotation requirements (lead partner: 5 years, reviewing partner: 5 years)
Why It Matters for ground-truth:
- Testing restriction: Can ONLY use non-PWC clients for development
- Deployment restriction: Must get Independence Office approval before using PWC client data
- Provenance requirement: Must document that all test data is independence-compliant
Permitted Test Companies (127 total):
- Audited by: EY, Deloitte, KPMG (NOT PWC)
- Primary test company: PFSI (PennyMac Financial Services) — Deloitte client
- Full list: config/test_companies_permitted.csv
Verification Process:
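- Check PCAOB Form AP (Auditor Reporting of Certain Audit Participants), which identifies the audit firm and engagement partner for each issuer audit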
- Cross-reference company CIK with Form AP client list
- If PWC is auditor → FAIL (cannot use)
- If EY/Deloitte/KPMG → PASS (permitted)
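The PASS/FAIL rule above can be sketched as a lookup. The sketch assumes a hypothetical `auditor_by_cik` dict already built from Form AP data; loading and parsing Form AP is not shown. Two behaviors go beyond the list and are assumptions. An auditor outside EY, Deloitte, KPMG, and PwC returns `REVIEW` rather than PASS. A company with no auditor on record is never assumed compliant.

```python
from enum import Enum


class IndependenceResult(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    REVIEW = "REVIEW"  # assumption: anything outside the listed firms needs a human


# Illustrative substring markers for firm names as they appear in filings.
PWC_MARKERS = ("pricewaterhousecoopers", "pwc")
PERMITTED_MARKERS = ("ernst & young", "deloitte", "kpmg")


def check_independence(cik: str, auditor_by_cik: dict[str, str],
                       blocked_ciks: frozenset[str] = frozenset()) -> IndependenceResult:
    """Classify a company per the verification process above.

    auditor_by_cik is keyed by 10-digit, zero-padded CIK (hypothetical input).
    """
    cik = cik.zfill(10)
    if cik in blocked_ciks:
        return IndependenceResult.FAIL
    auditor = auditor_by_cik.get(cik)
    if auditor is None:
        return IndependenceResult.REVIEW  # no auditor on record: never assume PASS
    name = auditor.lower()
    if any(m in name for m in PWC_MARKERS):
        return IndependenceResult.FAIL
    if any(m in name for m in PERMITTED_MARKERS):
        return IndependenceResult.PASS
    return IndependenceResult.REVIEW
```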
Deployment Checklist (before using PWC clients):
- Independence Office consultation
- Legal/Compliance sign-off
- Client consent (where required)
- Bias audit completed
- PCAOB consultation (recommended)
4. SEC Filing Types
10-K (Annual Report):
- Item 1A: Risk Factors
- Item 7: MD&A (Management's Discussion & Analysis)
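- Item 9A: Controls and Procedures (management's ICFR report and, where required, the auditor's ICFR attestation)
- Critical Audit Matters: In the auditor's report on the financial statements, Item 8 (often in the F-pages near the end of the 10-K)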
10-Q (Quarterly Report):
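- Condensed, unaudited financial statements (reviewed, not audited, so no CAMs)
- ICFR disclosure limited to changes that materially affected, or are reasonably likely to materially affect, ICFR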
DEF 14A (Proxy Statement):
- Auditor information (fees, tenure)
- Executive compensation
- Board composition
- Related party transactions
8-K (Current Report):
- Material events (M&A, executive changes, restatements)
Ground Truth Extraction Workflow
Phase 1: Company Resolution
Input: Ticker (e.g., "PFSI") or CIK (e.g., "0001745916")
Process:
- Resolve ticker → CIK via SEC API
- Fetch company profile (SIC code, fiscal year end) and identify the auditor from the latest 10-K auditor's report or PCAOB Form AP
- Check independence: Is company a PWC client? (FAIL if yes)
Output: Company metadata
{
"ticker": "PFSI",
"cik": "0001745916",
"company_name": "PENNYMAC FINANCIAL SERVICES INC",
"sic_code": "6162",
"sector": "mortgage",
"auditor": "Deloitte & Touche LLP"
}
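Here is a minimal sketch of the ticker-to-CIK step. It assumes the SEC's `company_tickers.json` file has already been downloaded and parsed, with entries shaped like `{"cik_str": ..., "ticker": ..., "title": ...}`. Fetching is left out because the SEC asks automated clients to send a descriptive User-Agent header. The function names are hypothetical, and the sample entry reuses the PFSI values shown above.

```python
import json


def build_ticker_index(company_tickers: dict) -> dict:
    """Index entries from SEC company_tickers.json by upper-case ticker."""
    index = {}
    for entry in company_tickers.values():
        ticker = entry["ticker"].upper()
        index[ticker] = {
            "ticker": ticker,
            "cik": str(entry["cik_str"]).zfill(10),  # EDGAR CIKs are 10 digits, zero-padded
            "company_name": entry["title"],
        }
    return index


def resolve_company(identifier: str, index: dict) -> dict:
    """Resolve a ticker ("PFSI") or CIK ("0001745916" or "1745916") to a record."""
    ident = identifier.strip()
    if ident.isdigit():
        cik = ident.zfill(10)
        for record in index.values():
            if record["cik"] == cik:
                return record
        raise KeyError(f"CIK not found: {cik}")
    try:
        return index[ident.upper()]
    except KeyError:
        raise KeyError(f"Ticker not found: {ident}") from None


# One entry in the shape of company_tickers.json, reusing the PFSI values above.
SAMPLE = json.loads(
    '{"0": {"cik_str": 1745916, "ticker": "PFSI", "title": "PENNYMAC FINANCIAL SERVICES INC"}}'
)
```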
Phase 2: Data Extraction
Sources:
- SEC XBRL: Financial statements (Assets, Liabilities, Equity, Net Income, EPS)
- SEC 10-K: CAMs, ICFR, Risk Factors
- Market data: Current price, market cap
- Sector-specific: HMDA (mortgage), FDIC (banking), EIA (utilities)
Provenance Requirements:
- source_url: Direct link to SEC filing
- file_sha256: Cryptographic hash of source document
- extraction_method: How fact was extracted (XBRL API, regex, table parser)
- confidence: Score 0.0-1.0 (1.0 = deterministic, <1.0 = heuristic)
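The four provenance fields can be captured in one validated record. This sketch assumes facts are extracted from a locally archived copy of the filing, so the hash is computed over that file. The `Provenance` dataclass and the helper names are hypothetical, and `extraction_method` is left as free text.

```python
import hashlib
import string
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Provenance:
    source_url: str
    file_sha256: str
    extraction_method: str  # e.g. "section_locator"; free text in this sketch
    confidence: float       # 1.0 = deterministic, <1.0 = heuristic

    def __post_init__(self):
        if not self.source_url.startswith("https://"):
            raise ValueError("source_url must be a direct https link to the filing")
        if len(self.file_sha256) != 64 or any(c not in string.hexdigits for c in self.file_sha256):
            raise ValueError("file_sha256 must be a 64-character hex digest")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the archived evidence file in 1 MiB blocks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def make_provenance(path: str, source_url: str, method: str, confidence: float) -> dict:
    """Build the provenance dict for a fact extracted from the file at `path`."""
    return asdict(Provenance(source_url, sha256_of_file(path), method, confidence))
```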
Phase 3: Validation
Balance Sheet Reconciliation:
Assets = Liabilities + Stockholders' Equity
- PASS: Difference < $1M or < 0.1% of assets
- FAIL: Significant imbalance (check for noncontrolling interests or temporary equity omitted from the equity figure, mixed reporting periods, or extraction errors)
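The reconciliation rule translates directly into code. This sketch assumes the inputs are whole-dollar amounts for the same balance-sheet date. When the company reports noncontrolling interests, `equity` should include them (for example, the us-gaap tag `StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest`); otherwise the check can fail spuriously.

```python
def check_balance_sheet(assets: float, liabilities: float, equity: float,
                        abs_tol: float = 1_000_000, rel_tol: float = 0.001) -> tuple[bool, float]:
    """Return (passed, difference) for Assets = Liabilities + Stockholders' Equity.

    PASS if |difference| < $1M OR < 0.1% of total assets, as stated above.
    """
    diff = assets - (liabilities + equity)
    passed = abs(diff) < abs_tol or abs(diff) < rel_tol * abs(assets)
    return passed, diff
```

Because of the "or", large balance sheets get the relative tolerance. For example, a $5M gap on $10B of assets still passes.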
EPS Consistency:
Calculated diluted EPS = (Net Income - Preferred Dividends) / Weighted-Average Diluted Shares Outstanding
- PASS: Within $0.01 of reported diluted EPS
- FAIL: Material difference (check for share count errors, preferred dividends)
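Here is a sketch of the EPS check. It assumes net income, preferred dividends, and weighted-average diluted shares (for example, the us-gaap tag `WeightedAverageNumberOfDilutedSharesOutstanding`) have already been extracted for the same annual period. It deliberately omits numerator adjustments for dilutive convertible securities. That omission is one reason a FAIL should trigger investigation rather than rejection.

```python
def check_diluted_eps(net_income: float, preferred_dividends: float,
                      weighted_avg_diluted_shares: float, reported_diluted_eps: float,
                      tolerance: float = 0.01) -> tuple[bool, float]:
    """Compare a recomputed diluted EPS with the reported figure.

    Simplification: ignores numerator add-backs for dilutive convertible
    securities, so a FAIL means "investigate", not "wrong".
    """
    if weighted_avg_diluted_shares <= 0:
        raise ValueError("weighted_avg_diluted_shares must be positive")
    calculated = (net_income - preferred_dividends) / weighted_avg_diluted_shares
    # Small epsilon so a difference of exactly one cent is not failed by float error.
    passed = abs(calculated - reported_diluted_eps) <= tolerance + 1e-9
    return passed, calculated
```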
Data Quality:
- Required facts present (Assets, Liabilities, Equity, Net Income)
- Numeric plausibility (flag negative equity for review; it can occur legitimately, e.g., after large share buybacks)
- Date validity (ISO 8601 format)
Provenance Completeness:
- All facts have source_url
- All facts have SHA-256 checksums
- Evidence files archived locally
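The data-quality and provenance gates can run together over the extracted facts. This sketch assumes a hypothetical per-fact schema (`value`, `period_end`, `provenance`). It uses the us-gaap tags `Assets`, `Liabilities`, `StockholdersEquity`, and `NetIncomeLoss` for the required facts. As a design choice, negative equity goes to manual review instead of failing, because buyback-heavy companies can report it legitimately. The sketch does not check that evidence files are archived.

```python
import re
from datetime import date

REQUIRED_FACTS = ("Assets", "Liabilities", "StockholdersEquity", "NetIncomeLoss")
SHA256_RE = re.compile(r"[0-9a-f]{64}")


def validate_facts(facts: dict) -> dict:
    """Return {"fail": [...], "review": [...]} for extracted facts.

    Hypothetical schema per fact:
    {"value": 1.0, "period_end": "2024-12-31",
     "provenance": {"source_url": "https://...", "file_sha256": "<64 hex chars>"}}
    """
    fail, review = [], []
    # Required facts present
    for name in REQUIRED_FACTS:
        if name not in facts:
            fail.append(f"missing required fact: {name}")
    for name, fact in facts.items():
        # Date validity: ISO 8601 calendar date
        try:
            date.fromisoformat(str(fact.get("period_end") or ""))
        except ValueError:
            fail.append(f"{name}: period_end is not an ISO 8601 date")
        # Provenance completeness: source_url and SHA-256 on every fact
        prov = fact.get("provenance") or {}
        if not prov.get("source_url"):
            fail.append(f"{name}: missing source_url")
        if not SHA256_RE.fullmatch(str(prov.get("file_sha256") or "")):
            fail.append(f"{name}: missing or malformed file_sha256")
    # Numeric plausibility: negative equity goes to review, not FAIL
    equity = (facts.get("StockholdersEquity") or {}).get("value")
    if equity is not None and equity < 0:
        review.append("negative stockholders' equity: confirm before failing")
    return {"fail": fail, "review": review}
```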
Phase 4: Sector Routing
SIC Code Classification:
- Banking (6000-6099): FDIC call reports, summary of deposits
- Mortgage (6100-6199): HMDA originations, PMMS rates, FHFA HPI
- Utilities (4900-4999): EIA data, EPA CAMD emissions
- Airlines (4500-4599): BTS Form 41, on-time performance
- Tech/Other: Base extractors only (XBRL, 10-K, market data)
Routing Command:
python -m ground_truth.cli classify PFSI
# Output: Sector: mortgage, Extractors: sec_xbrl, sec_10k, market_data, hmda, pmms, fhfa_hpi
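The SIC routing table can be sketched as an ordered list of ranges. The ranges and the mortgage extractor names come from the classification and sample output above. The extractor identifiers for banking, utilities, and airlines are hypothetical placeholders. This is not the `ground_truth.cli` implementation.

```python
from typing import Union

BASE_EXTRACTORS = ("sec_xbrl", "sec_10k", "market_data")

# (inclusive SIC range, sector, extra extractors); non-mortgage names are hypothetical.
SECTOR_ROUTES = [
    ((6000, 6099), "banking", ("fdic_call_reports", "fdic_sod")),
    ((6100, 6199), "mortgage", ("hmda", "pmms", "fhfa_hpi")),
    ((4900, 4999), "utilities", ("eia", "epa_camd")),
    ((4500, 4599), "airlines", ("bts_form41", "bts_ontime")),
]


def route_by_sic(sic_code: Union[str, int]) -> tuple[str, list]:
    """Return (sector, extractors); unmatched SIC codes get the base extractors only."""
    sic = int(sic_code)
    for (lo, hi), sector, extras in SECTOR_ROUTES:
        if lo <= sic <= hi:
            return sector, [*BASE_EXTRACTORS, *extras]
    return "other", list(BASE_EXTRACTORS)
```

For PFSI (SIC 6162), `route_by_sic("6162")` yields the same sector and extractor list as the sample CLI output above.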
RAG Integration (Phase 2)
Chunking Strategy
10-K Sections to Embed:
- Critical Audit Matters (keep each CAM as separate chunk)
- ICFR controls (single chunk)
- Risk Factors (split if >1000 tokens)
- MD&A (split by subsection)
Chunk Metadata (required):
{
"chunk_id": "PFSI_2024_CAM_01",
"ticker": "PFSI",
"cik": "0001745916",
"section": "Critical Audit Matters",
"filing_date": "2024-12-31",
"source_sha256": "f52e532ba...",
"chunk_index": 0,
"total_chunks": 2
}
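Here is a sketch of the chunking rules. It emits records with the metadata fields above plus the chunk text. It rests on three assumptions:
- Sections arrive already split into units: one per CAM, one per MD&A subsection, and one string each for ICFR and Risk Factors.
- The section short codes are hypothetical.
- Whitespace-separated words stand in for model tokens. A production pipeline would count tokens with the embedding model's own tokenizer.

```python
MAX_TOKENS = 1000  # Risk Factors are split above this size

# Hypothetical short codes used to build chunk_id values like "PFSI_2024_CAM_01".
SECTION_CODES = {
    "Critical Audit Matters": "CAM",
    "ICFR": "ICFR",
    "Risk Factors": "RISK",
    "MD&A": "MDA",
}


def split_by_tokens(text: str, max_tokens: int = MAX_TOKENS) -> list:
    """Split text into pieces of at most max_tokens words (words approximate tokens)."""
    words = text.split()
    pieces = [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]
    return pieces or [text]


def build_chunks(ticker: str, cik: str, filing_date: str, source_sha256: str,
                 sections: dict) -> list:
    """sections maps a section name to its pre-split units (one per CAM or MD&A
    subsection; a single string for ICFR and Risk Factors)."""
    chunks = []
    year = filing_date[:4]
    for section, units in sections.items():
        pieces = []
        for unit in units:
            # Only Risk Factors are size-split; CAMs, ICFR, and MD&A units stay whole.
            pieces.extend(split_by_tokens(unit) if section == "Risk Factors" else [unit])
        code = SECTION_CODES[section]
        for i, text in enumerate(pieces):
            chunks.append({
                "chunk_id": f"{ticker}_{year}_{code}_{i + 1:02d}",
                "ticker": ticker,
                "cik": cik,
                "section": section,
                "filing_date": filing_date,
                "source_sha256": source_sha256,
                "chunk_index": i,            # 0-based, as in the metadata example
                "total_chunks": len(pieces),
                "text": text,
            })
    return chunks
```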
Query Patterns
Factual Retrieval:
- "What are PFSI's Critical Audit Matters?"
- "Is PFSI's ICFR effective?"
- "What is EWBC's total assets?"
Comparative:
- "Compare PFSI and RKT risk factors"
- "Which has more CAMs: PFSI or RKT?"
Analytical:
- "Which mortgage companies have ICFR failures?"
- "Which companies have 3+ CAMs?"
- "Show all companies audited by Deloitte"
Temporal (requires multi-year data):
- "Did PFSI's net income increase in 2024?"
- "What CAMs did PFSI have in 2023 vs 2024?"
LTV Prediction Model (Phase 3)
Features (Churn Risk Indicators)
Complexity Indicators:
- CAM count (0-1 = low, 2-3 = medium, 4+ = high)
- ICFR effectiveness (effective = -20 pts, ineffective = +50 pts)
- Restatement history (each restatement = +30 pts)
Financial Health:
- Profitability (net income < 0 = +20 pts)
- Leverage (debt/equity > 3.0 = +15 pts)
- Liquidity (current ratio < 1.0 = +10 pts)
Sector Risk:
- Mortgage (cyclical, sensitive to rates) = +10 pts
- Banking (regulatory-heavy) = +5 pts
- Tech (fast-changing, valuation risk) = +5 pts
Heuristic Score (0-100):
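def calculate_client_risk_score(ground_truth):
    """Heuristic churn-risk score clamped to 0-100 (higher = riskier).

    Expects 'critical_audit_matters' (list), 'controls' -> 'icfr_effective'
    (bool), and 'xbrl' -> 'NetIncomeLoss' (number) in ground_truth.
    """
    score = 50  # Base score
    # CAM complexity: 3+ CAMs add risk, zero CAMs reduce it
    cam_count = len(ground_truth['critical_audit_matters'])
    if cam_count >= 3:
        score += 30
    elif cam_count == 0:
        score -= 30
    # ICFR status: a material weakness is the largest single penalty
    if not ground_truth['controls']['icfr_effective']:
        score += 50
    else:
        score -= 20
    # Profitability: a net loss adds risk
    if ground_truth['xbrl']['NetIncomeLoss'] < 0:
        score += 20
    return max(0, min(100, score))  # Clamp to 0-100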
Risk Bands:
- 0-30: Low risk (retain, minimal partner time)
- 31-60: Medium risk (monitor, standard engagement)
- 61-100: High risk (churn candidate, consider resignation)
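The feature list above includes restatements, leverage, liquidity, and sector risk, which `calculate_client_risk_score` does not use. The sketch below adds them with the stated point values and maps the result to the risk bands. It keeps that function's CAM thresholds (3+ CAMs add 30, zero CAMs subtract 30). The flat keyword arguments are a hypothetical interface. The ratio inputs are optional because measures like the current ratio may not be meaningful for banks.

```python
from typing import Optional

SECTOR_POINTS = {"mortgage": 10, "banking": 5, "tech": 5}  # per "Sector Risk" above


def extended_risk_score(cam_count: int, icfr_effective: bool, net_income: float,
                        restatements: int = 0,
                        debt_to_equity: Optional[float] = None,
                        current_ratio: Optional[float] = None,
                        sector: str = "other") -> int:
    score = 50  # base score
    if cam_count >= 3:          # CAM complexity, same thresholds as the heuristic
        score += 30
    elif cam_count == 0:
        score -= 30
    score += -20 if icfr_effective else 50
    score += 30 * restatements  # +30 per restatement
    if net_income < 0:
        score += 20
    if debt_to_equity is not None and debt_to_equity > 3.0:
        score += 15
    if current_ratio is not None and current_ratio < 1.0:
        score += 10
    score += SECTOR_POINTS.get(sector, 0)
    return max(0, min(100, score))  # clamp to 0-100


def risk_band(score: int) -> str:
    """0-30 low, 31-60 medium, 61-100 high."""
    if score <= 30:
        return "low"
    if score <= 60:
        return "medium"
    return "high"
```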
Quality Standards
Audit-Grade Provenance
Every fact must include:
- Source URL: Direct link to SEC filing
- SHA-256: Cryptographic hash for verification
- Extraction method: How it was obtained
- Confidence score: Reliability estimate
Example:
{
"matter": "Mortgage Servicing Rights (MSRs)",
"why_critical": "Involved especially challenging, subjective, or complex judgments",
"provenance": {
"source_url": "https://www.sec.gov/Archives/edgar/data/1745916/000155837025001148/pfsi-20241231x10k.htm",
"file_sha256": "f52e532ba113920525cf682c979c52e107904c847d6532ef0d922b71ba684e6a",
"extraction_method": "section_locator",
"confidence": 0.7
}
}
Verification Process
Partners must be able to:
- Download original 10-K from SEC
- Compute SHA-256 hash
- Verify it matches provenance metadata
- Manually locate cited text in filing
If mismatch:
- Flag for investigation (corruption or fabrication)
- Re-run extraction
- Update evidence files
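The hash-comparison step might look like this. It is equivalent to running `sha256sum` (Linux) or `shasum -a 256` (macOS) on the downloaded file and comparing the result by eye. The function names are hypothetical, and the returned action list simply mirrors the mismatch steps above.

```python
import hashlib


def verify_evidence(path: str, provenance: dict) -> bool:
    """Recompute SHA-256 for a downloaded 10-K and compare with stored provenance."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    expected = (provenance.get("file_sha256") or "").strip().lower()
    return digest.hexdigest() == expected


def verification_actions(path: str, provenance: dict) -> list:
    """Return follow-up actions; an empty list means the hash matched."""
    if verify_evidence(path, provenance):
        return []
    return ["flag for investigation", "re-run extraction", "update evidence files"]
```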
PWC Pitch Framework
Problem
Partners manually review 5,000+ client 10-Ks annually:
- 200 pages per 10-K, 2 hours of review each: 5,000 10-Ks × 2 hours = 10,000 hours/year
- No institutional memory (knowledge locked in partner minds)
- Reactive rather than proactive (issues are found after they occur)
Solution
Automated ground truth extraction + RAG query interface:
- Extract CAMs, ICFR, financials in 3 minutes
- Natural language queries: "Which clients have material weaknesses?"
- Full provenance (SHA-256 verification)
- Proactive alerts (CAM count increasing, ICFR failures)
Proof
- 13 companies extracted (100% success rate)
- 43.8 MB evidence archived with SHA-256 checksums
- All 127 test companies are independence-compliant
- RAG POC: Partners query in 2 seconds vs 2 hours
Business Impact
Efficiency:
- 200 hours → 2 hours per partner per year (100x gain)
Risk Detection:
- Automated alerts for high-risk clients
- Early warning for churn candidates
- Proactive partner-client matching
Compliance:
- Audit trail maintained (SHA-256 provenance)
- Independence safeguards built-in
- PCAOB-compliant validation
Ask
6 weeks + AWS/Azure credits for 50-company pilot
Common Failure Modes
Extraction Errors
CAM extraction returns 0 CAMs (but company likely has them):
- Cause: Section locator regex doesn't match 10-K format
- Fix: Debug section headers, test on 3-5 known CAM companies
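When debugging, a locator that tolerates common HTML-to-text artifacts helps. This sketch assumes the 10-K has been converted to plain text with line breaks preserved. It treats a line reading only "Critical Audit Matter(s)" as the section start and the firm's `/s/` signature as the end. It counts CAMs by per-CAM subheadings. Those marker phrases ("Critical Audit Matter Description", "Description of the Matter") are assumptions that match some firms' report layouts. Other layouts need their own markers, which testing on 3-5 known CAM companies should reveal.

```python
import re
from typing import Optional

# Heading must be alone on its line, so the intro sentence
# "The critical audit matters communicated below..." does not match.
HEADING_RE = re.compile(r"^[ \t]*critical[ \t]+audit[ \t]+matters?[ \t]*$",
                        re.IGNORECASE | re.MULTILINE)
END_RE = re.compile(r"^[ \t]*/s/", re.MULTILINE)  # auditor signature line
# Assumed per-CAM subheadings; extend for other firms' layouts.
CAM_MARKERS = re.compile(r"critical audit matter description|description of the matter",
                         re.IGNORECASE)


def locate_cam_section(text: str) -> Optional[str]:
    """Return the text between the CAM heading and the signature, or None."""
    text = text.replace("\xa0", " ")  # non-breaking spaces left by HTML conversion
    start = HEADING_RE.search(text)
    if start is None:
        return None
    end = END_RE.search(text, start.end())
    return text[start.end():end.start() if end else len(text)].strip()


def count_cams(section: str) -> int:
    return len(CAM_MARKERS.findall(section))
```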
Balance sheet doesn't balance:
- Cause: XBRL period inconsistency (mixing quarterly/annual)
- Fix: Consistent period selection logic (see PFSI bug fix)
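Here is a sketch of consistent period selection against the SEC companyfacts JSON. The path is `facts → us-gaap → concept → units → USD`, a list of entries with `end`, `val`, `form`, `filed`, `accn`, plus `start` for duration facts. The sketch makes three assumptions:
- Only `10-K` entries are used.
- Duration facts must span about one fiscal year (350-380 days, allowing 52/53-week years).
- When several filings report the same period, the earliest-filed value (as originally reported) wins, unless a specific accession number is requested.

```python
from datetime import date
from typing import Optional


def select_annual_value(companyfacts: dict, concept: str, period_end: str,
                        accn: Optional[str] = None, unit: str = "USD",
                        taxonomy: str = "us-gaap") -> Optional[float]:
    """Return one annual value for `concept` ending on `period_end`, or None."""
    entries = (companyfacts["facts"].get(taxonomy, {})
               .get(concept, {}).get("units", {}).get(unit, []))
    candidates = []
    for e in entries:
        # Annual-report values only, for the requested period end date
        if e.get("form") != "10-K" or e.get("end") != period_end:
            continue
        # Duration facts (they carry "start") must cover ~one fiscal year,
        # which excludes quarterly values that some 10-Ks also report
        if "start" in e:
            days = (date.fromisoformat(e["end"]) - date.fromisoformat(e["start"])).days
            if not 350 <= days <= 380:
                continue
        if accn is not None and e.get("accn") != accn:
            continue
        candidates.append(e)
    if not candidates:
        return None
    # Later 10-Ks repeat prior periods as comparatives; keep the earliest filing.
    return min(candidates, key=lambda e: e.get("filed", ""))["val"]
```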
"Revenues" fact missing for banks:
- Cause: Banks use different XBRL tags (InterestAndDividendIncomeOperating)
- Fix: Add sector-specific tag mappings
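Sector-specific fallbacks can be an ordered tag list per sector, tried until one returns a value. In this sketch the bank entry uses the tag named above. The default list and the hypothetical `lookup` callable (any function returning a value or `None` for a tag) are assumptions to adapt.

```python
from typing import Callable, Optional, Tuple

# Ordered fallbacks per sector for a "revenue" fact. The banking entry uses the
# tag named above; the default list is an illustrative assumption.
REVENUE_TAGS = {
    "default": ["Revenues", "RevenueFromContractWithCustomerExcludingAssessedTax"],
    "banking": ["Revenues", "InterestAndDividendIncomeOperating"],
}


def resolve_revenue(sector: str, lookup: Callable[[str], Optional[float]]
                    ) -> Tuple[Optional[str], Optional[float]]:
    """Try each tag for the sector in order; return (tag_used, value) or (None, None)."""
    for tag in REVENUE_TAGS.get(sector, REVENUE_TAGS["default"]):
        value = lookup(tag)
        if value is not None:
            return tag, value
    return None, None
```

Returning the tag that supplied the value lets the provenance record state which XBRL concept was actually used.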
Validation Failures
EPS consistency check fails:
- Possible causes: Preferred dividends, share count timing, rounding
- Action: Check 10-K footnotes, verify share count source
Data quality FAIL:
- Cause: Required fact missing (Assets, Liabilities, etc.)
- Action: Check XBRL API response, add fallback tag mappings
Independence Violations
Extracted JPM data (PWC client):
- Risk: PCAOB Rule 3520 violation
- Action: Delete immediately, add to blocked list, re-check permitted companies
Reference Files
In ground-truth repo:
- config/test_companies_permitted.csv: 127 independence-compliant companies
- INDEPENDENCE.md: Full PCAOB compliance framework
- RAG_ARCHITECTURE.md: Technical deep dive on RAG system
- MILESTONE_01_PFSI_SUCCESS.md: PFSI case study (first successful extraction)
Vocabulary
Terms to use consistently:
- Ground truth: Authoritative data from primary sources (SEC filings, regulatory databases)
- Provenance: Metadata chain (source URL → SHA-256 → extraction method)
- CAM: Critical Audit Matter (NOT "significant audit matter")
- ICFR: Internal Control over Financial Reporting (NOT "internal controls")
- Extraction: Automated data retrieval (NOT "scraping")
- Validation gates: Quality checks (balance sheet, EPS, provenance)
Avoid:
- "Scraping" (implies unstructured/aggressive data collection)
- "AI-generated" (use "LLM-assisted" or "RAG-powered")
- "Black box" (emphasize explainability, provenance, human-in-the-loop)
Notes
Baseline date: October 25, 2025
Current status:
- Phase 1 (Extraction): ✅ COMPLETE (13 companies)
- Phase 2 (RAG): 🔴 IN PROGRESS (3-week timeline)
- Phase 3 (ML): ⚪ NOT STARTED (4-6 weeks after RAG)
Key decisions:
- ChromaDB selected over Pinecone (cost, simplicity)
- OpenAI embeddings selected over open-source (quality)
- RAG-first strategy (not scale-first)
- Supervised ML (not RL)
Audit context:
- Owner: Nirvan Chitnis (PWC Audit Associate, started Oct 3, 2025)
- Public repo: https://github.com/nirvanchitnis-cmyk/ground-truth
- Professional reputation protection: No inappropriate content, audit-grade standards