PWC Audit Intelligence Skill

Purpose

This skill packages audit domain expertise for the ground-truth project, enabling Claude to understand audit concepts, compliance requirements, and quality validation without re-explanation in each session.

When to Use This Skill

Extracting ground truth data from SEC filings (10-K, 10-Q, DEF 14A)
Analyzing Critical Audit Matters (CAMs)
Validating Internal Control over Financial Reporting (ICFR)
Ensuring PCAOB independence compliance (Rule 3520)
Building audit intelligence systems (RAG, ML models, partner dashboards)

Core Audit Concepts

1. Critical Audit Matters (CAMs)

Definition: Matters arising from the current-period audit of the financial statements that were communicated (or required to be communicated) to the audit committee and that meet both of the following:

Relate to accounts or disclosures that are material to the financial statements
Involved especially challenging, subjective, or complex auditor judgment

Why They Matter:

High audit risk: Areas requiring significant professional judgment
Complexity indicator: More CAMs = more complex client
Partner resource allocation: CAM-heavy clients require senior staff
Churn risk: Clients with increasing CAM count may be at risk

Examples (from ground-truth extractions):

PFSI (PennyMac): Mortgage Servicing Rights (MSRs) valuation
- Why critical: "Involved especially challenging, subjective, or complex judgments"
- Sector-specific: Unique to mortgage banking
ILMN (Illumina): Revenue recognition for complex multi-element arrangements
Banking sector: Loan loss reserves (CECL calculations)

Typical CAM Counts:

0-1 CAMs: Low complexity (straightforward audits)
2-3 CAMs: Moderate complexity (industry-standard)
4+ CAMs: High complexity (partner-intensive)

Red Flags:

Increasing CAM count year-over-year
Same CAM repeated for 3+ years (unresolved issues)
CAMs related to management estimates (subjectivity risk)
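The first two red flags can be checked mechanically once CAMs are available per fiscal year. The following is a minimal sketch, not the project's implementation: the function name `cam_red_flags` and the `{year: [topic, ...]}` input shape are assumptions, topics are matched by exact string, and "repeated for 3+ years" is counted as any three years rather than three consecutive years. The third flag (management estimates) needs text classification and is left out.

```python
from collections import Counter


def cam_red_flags(cams_by_year):
    """Return red-flag labels for a {fiscal_year: [CAM topic, ...]} mapping."""
    flags = []
    years = sorted(cams_by_year)
    counts = [len(cams_by_year[y]) for y in years]

    # Red flag 1: CAM count rose between the two most recent years.
    if len(counts) >= 2 and counts[-1] > counts[-2]:
        flags.append("increasing_cam_count")

    # Red flag 2: the same topic appears in three or more years.
    topic_years = Counter(
        topic for y in years for topic in set(cams_by_year[y])
    )
    repeated = sorted(t for t, n in topic_years.items() if n >= 3)
    if repeated:
        flags.append("repeated_cam:" + ",".join(repeated))

    return flags
```

In practice, CAM titles vary slightly from year to year, so exact string matching would likely need to be replaced by a normalized topic label.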

2. Internal Control over Financial Reporting (ICFR)

Definition (SOX 404): A process designed to provide reasonable assurance regarding the reliability of financial reporting and the preparation of financial statements.

Effectiveness Conclusion: Binary (Effective / Ineffective)

Effective: No material weaknesses identified
Ineffective: One or more material weaknesses exist

Why It Matters:

Audit quality: Clean ICFR = lower control risk
Client health: ICFR failures signal governance issues
Churn risk: Material weaknesses increase partner workload and may lead to auditor resignation
Regulatory risk: ICFR failures attract SEC scrutiny

Material Weakness Examples:

Inadequate segregation of duties
Ineffective IT general controls (ITGC)
Lack of evidence for key control execution
Management override of controls

Ground Truth Extraction (sec_10k.controls):

json

{
  "icfr_effective": true,
  "auditor": "Deloitte & Touche LLP",
  "opinion_date": "2025-02-19",
  "opinion": "In our opinion, the Company maintained, in all material respects, effective internal control over financial reporting..."
}

ICFR Status Distribution (across 13 companies):

Effective: 13/13 (100%) — all test companies have clean controls
Ineffective: 0/13 — no material weaknesses found (expected, these are mature public companies)

3. PCAOB Independence Compliance (Rule 3520)

Rule: Auditors must be independent in fact and appearance

Key Prohibitions:

Cannot audit own work (self-review conflict)
Cannot perform certain non-audit services (e.g., bookkeeping)
Cannot have financial interest in audit client
Rotation requirements (lead partner: 5 years, reviewing partner: 5 years)

Why It Matters for ground-truth:

Testing restriction: Can ONLY use non-PWC clients for development
Deployment restriction: Must get Independence Office approval before using PWC client data
Provenance requirement: Must document that all test data is independence-compliant

Permitted Test Companies (127 total):

Audited by: EY, Deloitte, KPMG (NOT PWC)
Primary test company: PFSI (PennyMac Financial Services) — Deloitte client
Full list: config/test_companies_permitted.csv

Verification Process:

Check PCAOB Form AP (discloses the audit firm and engagement partner for each issuer audit)
Cross-reference the company CIK with the issuer named on the Form AP filing
If PWC is auditor → FAIL (cannot use)
If EY/Deloitte/KPMG → PASS (permitted)
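The PASS/FAIL rule above can be sketched as a small name check. This is an illustration under stated assumptions: the function name `independence_check` is hypothetical, matching is done on the auditor name string only, and the extra `REVIEW` result for any firm outside the four named here is an assumption that is not part of the documented process.

```python
import re


def independence_check(auditor_name):
    """Classify an auditor name as FAIL (PWC), PASS (EY/Deloitte/KPMG), or REVIEW."""
    name = auditor_name.lower()
    # Whole-word tokens, so short names like "ey" do not match inside other words.
    tokens = set(re.findall(r"[a-z]+", name))

    if "pricewaterhousecoopers" in name or "pwc" in tokens:
        return "FAIL"
    if "ey" in tokens or any(k in name for k in ("ernst & young", "deloitte", "kpmg")):
        return "PASS"
    # Assumption: anything else is sent to a human rather than auto-approved.
    return "REVIEW"
```

A string check like this is only a first filter; the Form AP cross-reference remains the authoritative step.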

Deployment Checklist (before using PWC clients):

Independence Office consultation
Legal/Compliance sign-off
Client consent (where required)
Bias audit completed
PCAOB consultation (recommended)

4. SEC Filing Types

10-K (Annual Report):

Item 1A: Risk Factors
Item 7: MD&A (Management's Discussion & Analysis)
Item 9A: Controls and Procedures (ICFR opinion)
Critical Audit Matters: In the auditor's report (Item 8, with the financial statements, near the end of the 10-K)

10-Q (Quarterly Report):

Condensed, unaudited financials (reviewed rather than audited, so no CAM disclosure)
ICFR disclosure only if a material change occurred during the quarter

DEF 14A (Proxy Statement):

Auditor information (fees, tenure)
Executive compensation
Board composition
Related party transactions

8-K (Current Report):

Material events (M&A, executive changes, restatements)

Ground Truth Extraction Workflow

Phase 1: Company Resolution

Input: Ticker (e.g., "PFSI") or CIK (e.g., "0001745916")

Process:

Resolve ticker → CIK via SEC API
Fetch company profile (includes SIC code, auditor, fiscal year end)
Check independence: Is company a PWC client? (FAIL if yes)

Output: Company metadata

json

{
  "ticker": "PFSI",
  "cik": "0001745916",
  "company_name": "PENNYMAC FINANCIAL SERVICES INC",
  "sic_code": "6162",
  "sector": "mortgage",
  "auditor": "Deloitte & Touche LLP"
}
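The resolution step accepts either a ticker or a CIK and always returns the 10-digit, zero-padded CIK shown in the metadata above. A minimal offline sketch follows; `normalize_cik`, `resolve_company`, and the `ticker_to_cik` lookup table are hypothetical names, and the table stands in for whatever the SEC API call returns.

```python
def normalize_cik(cik):
    """Zero-pad a CIK (int or digit string) to the 10-digit string form."""
    return str(int(cik)).zfill(10)


def resolve_company(identifier, ticker_to_cik):
    """Resolve a ticker or CIK to a normalized CIK using a lookup table."""
    ident = identifier.strip().upper()
    if ident.isdigit():
        # Already a CIK; only the padding needs fixing.
        return normalize_cik(ident)
    if ident not in ticker_to_cik:
        raise KeyError(f"unknown ticker: {ident}")
    return normalize_cik(ticker_to_cik[ident])
```

For example, `resolve_company("pfsi", {"PFSI": 1745916})` returns `"0001745916"`, the same value that appears in the metadata example.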

Phase 2: Data Extraction

Sources:

SEC XBRL: Financial statements (Assets, Liabilities, Equity, Net Income, EPS)
SEC 10-K: CAMs, ICFR, Risk Factors
Market data: Current price, market cap
Sector-specific: HMDA (mortgage), FDIC (banking), EIA (utilities)

Provenance Requirements:

source_url: Direct link to SEC filing
file_sha256: Cryptographic hash of source document
extraction_method: How fact was extracted (XBRL API, regex, table parser)
confidence: Score 0.0-1.0 (1.0 = deterministic, <1.0 = heuristic)
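A provenance record with these four fields can be assembled at the moment a source document is downloaded. The sketch below uses Python's standard `hashlib`; the function name `build_provenance` and its argument names are assumptions, and only the field names come from the list above.

```python
import hashlib


def build_provenance(source_url, document_bytes, extraction_method, confidence):
    """Build a provenance record for one extracted fact."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return {
        "source_url": source_url,
        # Hash the raw bytes exactly as downloaded, before any parsing.
        "file_sha256": hashlib.sha256(document_bytes).hexdigest(),
        "extraction_method": extraction_method,
        "confidence": confidence,
    }
```

Hashing the raw bytes rather than parsed text keeps the checksum reproducible by anyone who downloads the same file.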

Phase 3: Validation

Balance Sheet Reconciliation:

Assets = Liabilities + Stockholders' Equity

PASS: Difference < $1M or < 0.1% of assets
FAIL: Significant imbalance (check for off-balance-sheet items, extraction errors)
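The reconciliation rule can be written as a single comparison. This is a minimal sketch with a hypothetical function name, `balance_sheet_check`; the two tolerances are the thresholds stated above ($1M absolute or 0.1% of assets), and all amounts are assumed to be in whole dollars.

```python
def balance_sheet_check(assets, liabilities, equity,
                        abs_tol=1_000_000, rel_tol=0.001):
    """Return (status, difference) for Assets = Liabilities + Equity."""
    diff = abs(assets - (liabilities + equity))
    # PASS if the gap is under $1M or under 0.1% of total assets.
    within_abs = diff < abs_tol
    within_rel = assets != 0 and diff < rel_tol * abs(assets)
    return ("PASS" if within_abs or within_rel else "FAIL", diff)
```

The relative tolerance matters for large filers: a $5M gap passes on a $10B balance sheet but fails on a $100M one.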

EPS Consistency:

Calculated EPS = Net Income / Weighted-Average Diluted Shares Outstanding

PASS: Within $0.01 of reported diluted EPS
FAIL: Material difference (check for share count errors, preferred dividends)
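A sketch of the EPS gate, assuming the share count passed in is on the same basis as the reported figure (weighted-average diluted shares when comparing against diluted EPS). The function name `eps_check` is hypothetical, and the numbers in the usage note are illustrative, not taken from any filing.

```python
def eps_check(net_income, shares, reported_eps, tol=0.01):
    """Return (status, calculated_eps) for the EPS consistency gate."""
    if shares <= 0:
        raise ValueError("shares must be positive")
    calculated = net_income / shares
    # PASS when the calculated figure is within $0.01 of the reported one.
    status = "PASS" if abs(calculated - reported_eps) <= tol else "FAIL"
    return status, round(calculated, 4)
```

For instance, `eps_check(500, 100, 5.00)` passes, while `eps_check(500, 100, 5.25)` fails and should prompt a look at preferred dividends or the share count source.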

Data Quality:

Required facts present (Assets, Liabilities, Equity, Net Income)
Numeric plausibility (no negative equity for going concerns)
Date validity (ISO 8601 format)
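The three data quality checks can be collected into one function that returns a list of issues, where an empty list means PASS. This is a sketch under assumptions: `data_quality_issues` is a hypothetical name, and the dictionary keys `StockholdersEquity` and `NetIncomeLoss` are assumed tag names for the "Equity" and "Net Income" facts.

```python
from datetime import date

# Assumed fact keys for the four required items.
REQUIRED_FACTS = ("Assets", "Liabilities", "StockholdersEquity", "NetIncomeLoss")


def data_quality_issues(facts, opinion_date):
    """Return a list of data quality issues; an empty list means PASS."""
    issues = [f"missing:{name}" for name in REQUIRED_FACTS if name not in facts]

    # Plausibility: negative equity is flagged for review.
    if facts.get("StockholdersEquity", 0) < 0:
        issues.append("negative_equity")

    # Date validity: must parse as an ISO 8601 calendar date.
    try:
        date.fromisoformat(opinion_date)
    except (TypeError, ValueError):
        issues.append("invalid_date")

    return issues
```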

Provenance Completeness:

All facts have source_url
All facts have SHA-256 checksums
Evidence files archived locally

Phase 4: Sector Routing

SIC Code Classification:

Banking (6000-6099): FDIC call reports, summary of deposits
Mortgage (6100-6199): HMDA originations, PMMS rates, FHFA HPI
Utilities (4900-4999): EIA data, EPA CAMD emissions
Airlines (4500-4599): BTS Form 41, on-time performance
Tech/Other: Base extractors only (XBRL, 10-K, market data)

Routing Command:

bash

python -m ground_truth.cli classify PFSI
# Output: Sector: mortgage, Extractors: sec_xbrl, sec_10k, market_data, hmda, pmms, fhfa_hpi
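The SIC ranges listed above map directly to a lookup. The following is a minimal sketch of that classification, not the code behind the `classify` command; the names `SECTOR_RANGES` and `classify_sector` are assumptions.

```python
# (low, high, sector) ranges taken from the SIC classification list.
SECTOR_RANGES = [
    (6000, 6099, "banking"),
    (6100, 6199, "mortgage"),
    (4900, 4999, "utilities"),
    (4500, 4599, "airlines"),
]


def classify_sector(sic_code):
    """Map a SIC code (int or string) to a sector name, defaulting to 'other'."""
    code = int(sic_code)
    for low, high, sector in SECTOR_RANGES:
        if low <= code <= high:
            return sector
    # Tech and everything else use the base extractors only.
    return "other"
```

With PFSI's SIC code, `classify_sector("6162")` returns `"mortgage"`, matching the CLI output shown above.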

RAG Integration (Phase 2)

Chunking Strategy

10-K Sections to Embed:

Critical Audit Matters (keep each CAM as separate chunk)
ICFR controls (single chunk)
Risk Factors (split if >1000 tokens)
MD&A (split by subsection)

Chunk Metadata (required):

json

{
  "chunk_id": "PFSI_2024_CAM_01",
  "ticker": "PFSI",
  "cik": "0001745916",
  "section": "Critical Audit Matters",
  "filing_date": "2024-12-31",
  "source_sha256": "f52e532ba...",
  "chunk_index": 0,
  "total_chunks": 2
}
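The splitting rule for long sections and the index fields in the metadata can be sketched together. This is illustrative only: `chunk_section` is a hypothetical name, whitespace-separated words stand in for tokens (a real tokenizer would give different counts), and the `chunk_id` numbering scheme is an assumption modeled on the example above.

```python
def chunk_section(text, ticker, year, section_code, max_tokens=1000):
    """Split a section into chunks of at most max_tokens words, with metadata."""
    words = text.split()
    pieces = [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ] or [""]  # an empty section still yields one (empty) chunk

    return [
        {
            "chunk_id": f"{ticker}_{year}_{section_code}_{i + 1:02d}",
            "ticker": ticker,
            "chunk_index": i,
            "total_chunks": len(pieces),
            "text": piece,
        }
        for i, piece in enumerate(pieces)
    ]
```

A 2,500-word Risk Factors section would produce three chunks under this rule, the last one holding the remaining 500 words.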

Query Patterns

Factual Retrieval:

"What are PFSI's Critical Audit Matters?"
"Is PFSI's ICFR effective?"
"What is EWBC's total assets?"

Comparative:

"Compare PFSI and RKT risk factors"
"Which has more CAMs: PFSI or RKT?"

Analytical:

"Which mortgage companies have ICFR failures?"
"Which companies have 3+ CAMs?"
"Show all companies audited by Deloitte"

Temporal (requires multi-year data):

"Did PFSI's net income increase in 2024?"
"What CAMs did PFSI have in 2023 vs 2024?"

LTV Prediction Model (Phase 3)

Features (Churn Risk Indicators)

Complexity Indicators:

CAM count (0-1 = low, 2-3 = medium, 4+ = high)
ICFR effectiveness (effective = -20 pts, ineffective = +50 pts)
Restatement history (each restatement = +30 pts)

Financial Health:

Profitability (net income < 0 = +20 pts)
Leverage (debt/equity > 3.0 = +15 pts)
Liquidity (current ratio < 1.0 = +10 pts)

Sector Risk:

Mortgage (cyclical, sensitive to rates) = +10 pts
Banking (regulatory-heavy) = +5 pts
Tech (fast-changing, valuation risk) = +5 pts

Heuristic Score (0-100):

python

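def calculate_client_risk_score(ground_truth):
    """Heuristic client risk score, clamped to 0-100 (higher = riskier).

    Scores only CAM count, ICFR status, and profitability; restatements,
    leverage, liquidity, and sector risk are not included here.
    """
    score = 50  # Base score

    # CAM complexity: 3+ CAMs adds risk, zero CAMs reduces it
    cam_count = len(ground_truth['critical_audit_matters'])
    if cam_count >= 3:
        score += 30
    elif cam_count == 0:
        score -= 30

    # ICFR status: an ineffective conclusion is the largest single penalty
    if not ground_truth['controls']['icfr_effective']:
        score += 50
    else:
        score -= 20

    # Profitability: a net loss adds risk
    if ground_truth['xbrl']['NetIncomeLoss'] < 0:
        score += 20

    return max(0, min(100, score))  # Clamp to 0-100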

Risk Bands:

0-30: Low risk (retain, minimal partner time)
31-60: Medium risk (monitor, standard engagement)
61-100: High risk (churn candidate, consider resignation)
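Mapping a score to its band is a simple threshold lookup. A minimal sketch, with the hypothetical name `risk_band`, assuming integer scores already clamped to 0-100:

```python
def risk_band(score):
    """Map a 0-100 risk score to 'low', 'medium', or 'high'."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 30:
        return "low"
    if score <= 60:
        return "medium"
    return "high"
```

The band edges are inclusive on the upper side, so 30 is low, 31 is medium, and 61 is high.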

Quality Standards

Audit-Grade Provenance

Every fact must include:

Source URL: Direct link to SEC filing
SHA-256: Cryptographic hash for verification
Extraction method: How it was obtained
Confidence score: Reliability estimate

Example:

json

{
  "matter": "Mortgage Servicing Rights (MSRs)",
  "why_critical": "Involved especially challenging, subjective, or complex judgments",
  "provenance": {
    "source_url": "https://www.sec.gov/Archives/edgar/data/1745916/000155837025001148/pfsi-20241231x10k.htm",
    "file_sha256": "f52e532ba113920525cf682c979c52e107904c847d6532ef0d922b71ba684e6a",
    "extraction_method": "section_locator",
    "confidence": 0.7
  }
}

Verification Process

Partners must be able to:

Download original 10-K from SEC
Compute SHA-256 hash
Verify it matches provenance metadata
Manually locate cited text in filing

If mismatch:

Flag for investigation (corruption or fabrication)
Re-run extraction
Update evidence files
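The hash comparison in the verification steps can be done against an archived evidence file on disk. The sketch below uses only the standard library; `verify_file` is a hypothetical name and the block size is an arbitrary choice.

```python
import hashlib


def verify_file(path, expected_sha256, block_size=65536):
    """Return True when the file's SHA-256 matches the recorded hash."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in blocks so large filings do not need to fit in memory.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected_sha256.strip().lower()
```

A `False` result is the mismatch case described above and should trigger the investigation and re-extraction steps.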

PWC Pitch Framework

Problem

Partners manually review 5,000+ client 10-Ks annually:

About 200 pages per 10-K at roughly 2 hours each: 5,000 filings × 2 hours = 10,000 hours/year
No institutional memory (knowledge locked in partner minds)
Reactive (find issues after they occur) vs proactive

Solution

Automated ground truth extraction + RAG query interface:

Extract CAMs, ICFR, financials in 3 minutes
Natural language queries: "Which clients have material weaknesses?"
Full provenance (SHA-256 verification)
Proactive alerts (CAM count increasing, ICFR failures)

Proof

13 companies extracted (100% success rate)
43.8 MB evidence archived with SHA-256 checksums
All 127 test companies are independence-compliant
RAG POC: Partners query in 2 seconds vs 2 hours

Business Impact

Efficiency:

200 hours → 2 hours per partner per year (100x gain)

Risk Detection:

Automated alerts for high-risk clients
Early warning for churn candidates
Proactive partner-client matching

Compliance:

Audit trail maintained (SHA-256 provenance)
Independence safeguards built-in
PCAOB-compliant validation

Ask

6 weeks + AWS/Azure credits for 50-company pilot

Common Failure Modes

Extraction Errors

CAM extraction returns 0 CAMs (but company likely has them):

Cause: Section locator regex doesn't match 10-K format
Fix: Debug section headers, test on 3-5 known CAM companies

Balance sheet doesn't balance:

Cause: XBRL period inconsistency (mixing quarterly/annual)
Fix: Consistent period selection logic (see PFSI bug fix)

"Revenues" fact missing for banks:

Cause: Banks use different XBRL tags (InterestAndDividendIncomeOperating)
Fix: Add sector-specific tag mappings
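A sector-specific tag mapping for the missing "Revenues" case might look like the following. This is a sketch under assumptions: `REVENUE_TAG_FALLBACKS` and `find_revenue` are hypothetical names, `facts` is assumed to be a dictionary keyed by XBRL tag, and only the two tags named above are included.

```python
# Ordered fallback lists: the first tag present in the facts wins.
REVENUE_TAG_FALLBACKS = {
    "banking": ["Revenues", "InterestAndDividendIncomeOperating"],
    "default": ["Revenues"],
}


def find_revenue(facts, sector):
    """Return (tag, value) for the first matching revenue tag, or None."""
    candidates = REVENUE_TAG_FALLBACKS.get(sector, REVENUE_TAG_FALLBACKS["default"])
    for tag in candidates:
        if tag in facts:
            return tag, facts[tag]
    return None
```

Returning the tag along with the value lets the provenance record state which tag actually supplied the figure.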

Validation Failures

EPS consistency check fails:

Possible causes: Preferred dividends, share count timing, rounding
Action: Check 10-K footnotes, verify share count source

Data quality FAIL:

Cause: Required fact missing (Assets, Liabilities, etc.)
Action: Check XBRL API response, add fallback tag mappings

Independence Violations

Extracted JPM data (PWC client):

Risk: PCAOB Rule 3520 violation
Action: Delete immediately, add to blocked list, re-check permitted companies

Reference Files

In ground-truth repo:

config/test_companies_permitted.csv — 127 independence-compliant companies
INDEPENDENCE.md — Full PCAOB compliance framework
RAG_ARCHITECTURE.md — Technical deep dive on RAG system
MILESTONE_01_PFSI_SUCCESS.md — PFSI case study (first successful extraction)

Vocabulary

Terms to use consistently:

Ground truth: Authoritative data from primary sources (SEC filings, regulatory databases)
Provenance: Metadata chain (source URL → SHA-256 → extraction method)
CAM: Critical Audit Matter (NOT "significant audit matter")
ICFR: Internal Control over Financial Reporting (NOT "internal controls")
Extraction: Automated data retrieval (NOT "scraping")
Validation gates: Quality checks (balance sheet, EPS, provenance)

Avoid:

"Scraping" (implies unstructured/aggressive data collection)
"AI-generated" (use "LLM-assisted" or "RAG-powered")
"Black box" (emphasize explainability, provenance, human-in-the-loop)

Notes

Baseline date: October 25, 2025

Current status:

Phase 1 (Extraction): ✅ COMPLETE (13 companies)
Phase 2 (RAG): 🔴 IN PROGRESS (3-week timeline)
Phase 3 (ML): ⚪ NOT STARTED (4-6 weeks after RAG)

Key decisions:

ChromaDB selected over Pinecone (cost, simplicity)
OpenAI embeddings selected over open-source (quality)
RAG-first strategy (not scale-first)
Supervised ML (not RL)

Audit context:

Owner: Nirvan Chitnis (PWC Audit Associate, started Oct 3, 2025)
Public repo: https://github.com/nirvanchitnis-cmyk/ground-truth
Professional reputation protection: No inappropriate content, audit-grade standards
