Agent skill
file-summarization
Summarize files by reading content, extracting key passages, and applying type-specific strategies. Activates on "summarize this file", "what's in this file", "describe this codebase", "file summary", "analyze this file", "tl;dr this file", "what does this code do", "explain this config", "break down this script". Routes to strategies for code, config, data, documentation, markup, and binary files based on extension and word count.
Install this agent skill to your Project
npx add-skill https://github.com/Jamie-BitFlight/claude_skills/tree/main/plugins/summarizer/skills/file-summarization
SKILL.md
File Summarization
Apply this methodology when summarizing files of any type. This skill provides the routing logic and type-specific strategies for faithful file summarization.
Pre-Summarization Assessment
Before summarizing any file, the model MUST:
- Read the file - Use the Read tool to access the actual content. Never guess from the filename.
- Assess size - Run `$CLAUDE_PLUGIN_ROOT/scripts/file_metrics.py` to determine word count and file type. If the script is unavailable, use the Read tool and manually estimate word count from line count.
- Select strategy - Choose based on the size thresholds in the table below.
- Verify file type - Use file extension and content inspection to determine which type-specific strategy to apply.
Size-Based Strategy Selection
| File Size | Strategy | Approach |
|---|---|---|
| Small (< 2,000 words) | Full read with extractive summarization | Read entire file, extract key passages, summarize from extracts |
| Medium (2,000-10,000 words) | Section-based extraction | Read full file, identify sections/modules, extract from each section, synthesize |
| Large (> 10,000 words) | Chunk and map-reduce | Split into chunks, summarize each chunk, synthesize chunk summaries |
SOURCE: Size thresholds adapted from Anthropic knowledge-synthesis skill (knowledge-work-plugins repository, accessed 2026-02-06). Strategy patterns informed by Map-Reduce Summarization methodology.
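The thresholds in the table can be expressed as a small selector. This is a minimal sketch, not part of the skill itself: the function names, the strategy labels, and the default chunk size of 2,000 words are assumptions chosen for illustration. Chunks are split on line boundaries and carry their starting line number so that line references survive the map-reduce step.

```python
def select_strategy(word_count: int) -> str:
    """Map a word count to one of the three strategies in the size table."""
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    if word_count < 2_000:
        return "full-read-extractive"        # Small
    if word_count <= 10_000:
        return "section-based-extraction"    # Medium
    return "chunk-map-reduce"                # Large


def split_into_chunks(text: str, max_words: int = 2_000) -> list[tuple[int, str]]:
    """Split text on line boundaries into chunks of roughly max_words words.

    Returns (1-based start line, chunk text) pairs. The max_words default
    is an illustrative assumption, not a value taken from the skill.
    """
    chunks = []
    current: list[str] = []
    current_words = 0
    start_line = 1
    for line_no, line in enumerate(text.splitlines(), start=1):
        n = len(line.split())
        # Flush the current chunk before it would exceed the word budget.
        if current and current_words + n > max_words:
            chunks.append((start_line, "\n".join(current)))
            current, current_words, start_line = [], 0, line_no
        current.append(line)
        current_words += n
    if current:
        chunks.append((start_line, "\n".join(current)))
    return chunks
```

A single line longer than `max_words` still becomes its own oversized chunk; the sketch does not split inside a line.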
File Type Strategies
Code Files
File extensions: .py, .js, .ts, .jsx, .tsx, .rs, .go, .java, .c, .cpp, .h, .rb, .php, .swift, .kt, .scala, .sh, .bash, .zsh
The model MUST extract:
- Imports/dependencies - List external modules and standard library imports
- Structure - Classes, functions, methods with signatures
- Purpose - Inferred from docstrings, comments, function names
- Key logic - Core algorithms, state machines, data transformations
- Entry points - `main()`, CLI argument parsing, exported functions
- Configuration - Environment variables, config file references
Extraction method: Read sequentially. Capture top-level definitions with their line numbers. Extract docstrings verbatim. Quote complex logic rather than paraphrasing.
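For Python sources specifically, the extraction step above can be approximated with the standard `ast` module. This is a hedged sketch: `outline_python_source` and the shape of the returned dictionaries are hypothetical, and other languages in the extension list would need their own parsers. Docstrings are requested with `clean=False` so they are returned without re-indentation.

```python
import ast


def outline_python_source(source: str) -> list[dict]:
    """List top-level imports, functions, and classes with line ranges."""
    items = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            name = ", ".join(alias.name for alias in node.names)
            kind, doc = "import", None
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no module name; show the dots instead.
            name = node.module or "." * node.level
            kind, doc = "import", None
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            name, kind = node.name, "function"
            doc = ast.get_docstring(node, clean=False)
        elif isinstance(node, ast.ClassDef):
            name, kind = node.name, "class"
            doc = ast.get_docstring(node, clean=False)
        else:
            continue  # Skip assignments, expressions, and other statements.
        items.append({
            "kind": kind,
            "name": name,
            "lines": (node.lineno, node.end_lineno),
            "docstring": doc,
        })
    return items
```

The `lines` tuples map directly onto the "(lines 15-87)" style of reference used in the example summary.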
Example summary structure:
## Summary
Python module for HTTP client authentication. Implements JWT token refresh flow with retry logic. Exports `AuthClient` class and `refresh_token()` function.
## What Was Found
- Class `AuthClient` (lines 15-87): JWT-based HTTP client with automatic token refresh
- Function `refresh_token()` (lines 92-105): Retries up to 3 times on 401 errors
- Dependencies: `httpx`, `jwt`, `tenacity` (lines 1-3)
- Environment variables: `AUTH_BASE_URL`, `AUTH_CLIENT_ID` (lines 10-11)
## What Was NOT Found
- No test coverage information in this file
- No error handling for network failures
- Configuration schema not documented
Configuration Files
File extensions: .json, .yaml, .yml, .toml, .ini, .env, .conf, .cfg, .properties
The model MUST extract:
- Top-level keys - All root keys with their value types
- Nested structure - Hierarchy depth and organization
- Settings categories - Group keys by purpose if clear
- Notable values - Endpoints, file paths, feature flags, credentials (note presence, do not expose values)
- Validation constraints - Type requirements, enums, ranges if documented
Extraction method: Parse structure. For small files, include all keys. For large files, sample representative sections and note structure patterns.
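One way to list keys with their value types while noting credentials without exposing them is sketched below for an already-parsed mapping (for example, the result of `json.loads`). The function name and the `SENSITIVE_MARKERS` list are illustrative assumptions; a real deployment would tune the markers to its own naming conventions.

```python
import json

# Hypothetical substrings that suggest a key holds a credential.
SENSITIVE_MARKERS = ("password", "secret", "token", "api_key", "credential")


def describe_config(data: dict, prefix: str = "") -> list[tuple[str, str, str]]:
    """Return (dotted key, type name, displayed value) for every leaf key."""
    rows = []
    for key, value in data.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse so nested settings appear as dotted paths.
            rows.extend(describe_config(value, prefix=path + "."))
        else:
            sensitive = any(marker in str(key).lower() for marker in SENSITIVE_MARKERS)
            shown = "<redacted>" if sensitive else repr(value)
            rows.append((path, type(value).__name__, shown))
    return rows


config = json.loads(
    '{"database": {"host": "db.local", "port": 5432, "password": "hunter2"},'
    ' "features": {"experimental_mode": false}}'
)
for row in describe_config(config):
    print(row)
```

The presence of `database.password` is reported, but its value never reaches the summary.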
Example summary structure:
## Summary
Application configuration in YAML format. Defines database connection, API endpoints, feature flags, and logging settings. 47 configuration keys across 5 top-level sections.
## What Was Found
- `database.host`, `database.port`, `database.name` (lines 2-4): PostgreSQL connection settings
- `api.base_url`, `api.timeout` (lines 7-8): External API configuration
- `features.experimental_mode: false` (line 12): Feature flag for beta features
- `logging.level: INFO`, `logging.format` (lines 15-16): Logging configuration
## What Was NOT Found
- No schema validation rules present
- No environment-specific overrides documented
- API authentication credentials not in this file
Data Files
File extensions: .csv, .tsv, .parquet, .json (when data-structured), .jsonl, .ndjson
The model MUST extract:
- Row count - Exact number of records
- Column names - All column headers
- Data types - Inferred from first N rows
- Sample values - Representative examples from each column
- Missing data - Columns with null/empty values
- Unique identifiers - Primary key columns if evident
Extraction method: For CSV/TSV, read the header row and the first 10 data rows to infer types and sample values, and count all data rows to report the exact row count. For Parquet, note that binary inspection is limited. For JSON, inspect array structure.
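A minimal CSV profile along these lines is sketched below using the standard `csv` module. The function names, the three-way type inference (integer, float, string), and the returned dictionary are assumptions for illustration; types are inferred from the sample rows only, while the row count and missing-value check cover every row.

```python
import csv
import io


def infer_type(values: list[str]) -> str:
    """Infer a coarse type from sample strings, ignoring empty cells."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"


def profile_csv(text: str, sample_size: int = 10) -> dict:
    """Report columns, sampled types, exact row count, and columns with gaps."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return {"row_count": 0, "columns": [], "types": {}, "missing": []}
    sample, missing, row_count = [], set(), 0
    for row in reader:
        row_count += 1
        if len(sample) < sample_size:
            sample.append(row)
        for name, value in zip(header, row):
            if value == "":
                missing.add(name)
    types = {
        name: infer_type([r[i] for r in sample if i < len(r)])
        for i, name in enumerate(header)
    }
    return {
        "row_count": row_count,
        "columns": header,
        "types": types,
        "missing": sorted(missing),
    }
```

Because types come from a sample, a summary built on this should say so rather than present them as verified for the whole file.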
Example summary structure:
## Summary
CSV file containing user activity logs. 1,247 rows with 8 columns. Timestamps range from 2025-01-01 to 2026-02-06. No missing values detected.
## What Was Found
- Column `user_id` (integer): User identifiers, range 1001-5432
- Column `timestamp` (ISO 8601): Activity timestamps
- Column `action` (string): Values include "login", "logout", "view_page", "click_button"
- Column `duration_ms` (integer): Range 0-45000
- 1,247 total records (line count: 1,248 including header)
## What Was NOT Found
- No schema documentation in file
- Column `referrer` is present but not documented
- No indication of data collection methodology
Documentation Files
File extensions: .md, .rst, .txt, .adoc, .org
The model MUST extract:
- Topic hierarchy - Top-level headings and structure
- Key sections - Main topics covered
- Commands/examples - Code blocks, shell commands, API calls
- Links - External references and internal cross-references
- Definitions - Technical terms defined in the text
Extraction method: Read sequentially. Extract headings to build table of contents. Quote key passages that define core concepts. Note code examples.
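For Markdown input, the heading pass can be sketched as follows. This is an illustrative approximation that handles only ATX headings (`#` through `######`) and skips fenced code blocks; the function name is hypothetical, and reStructuredText, AsciiDoc, and Org files would need different patterns.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE_MARKERS = ("`" * 3, "~" * 3)  # Opening/closing markers of fenced code.


def extract_headings(markdown_text: str) -> list[tuple[int, int, str]]:
    """Return (line number, heading level, title) for each ATX heading."""
    headings = []
    in_fence = False
    for line_no, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # A '#' inside a code block is a comment, not a heading.
        match = HEADING.match(line)
        if match:
            headings.append((line_no, len(match.group(1)), match.group(2)))
    return headings
```

The line numbers in the result can be paired to produce section ranges such as "(lines 10-45)" in the summary.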
Example summary structure:
## Summary
User guide for deploying containerized applications. Covers Docker setup, image building, registry configuration, and troubleshooting. 5 main sections with 23 subsections. Includes 12 shell command examples.
## What Was Found
- Section "Getting Started" (lines 10-45): Docker installation on Linux and macOS
- Section "Building Images" (lines 47-89): Dockerfile syntax and multi-stage builds
- Section "Troubleshooting" (lines 200-245): Common errors with solutions
- 12 shell command examples throughout document
## What Was NOT Found
- No Windows deployment instructions
- Security best practices not covered
- Performance tuning section mentioned but not written (line 15: "TODO")
Binary and Unknown Files
File extensions: .pdf, .zip, .tar, .gz, .bin, .exe, .so, .dylib, .dll, or unrecognized extensions
The model MUST:
- Attempt to read - Use the Read tool. If the tool returns binary content or an error, note this.
- State limitation - Do NOT guess contents. State: "Binary file, cannot extract text content."
- Provide file metadata - File size, extension, location.
- For PDFs: Use the Read tool with the `pages` parameter to extract text from specific page ranges. Summarize text content if extraction succeeds.
Example for unreadable binary:
## Summary
Binary file, cannot extract text content.
## What Was Found
- File path: ./build/output.bin
- File size: 2.3 MB
- Extension: .bin
## What Was NOT Found
Unable to determine contents without binary inspection tools.
## Uncertain
File may be compiled binary, compressed archive, or proprietary format.
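The extension lists in the sections above, combined with a content check, can be sketched as a small router. The names `STRATEGY_BY_EXTENSION`, `looks_binary`, and `classify` are hypothetical, as are the category labels. Two simplifications are assumed: `.json` is routed to the config strategy by default (the data strategy applies only when the content is data-structured), and any sample containing a NUL byte or invalid UTF-8 is treated as binary, which would misclassify UTF-16 text.

```python
import codecs
from pathlib import Path

STRATEGY_BY_EXTENSION = {
    **dict.fromkeys(
        (".py", ".js", ".ts", ".jsx", ".tsx", ".rs", ".go", ".java", ".c",
         ".cpp", ".h", ".rb", ".php", ".swift", ".kt", ".scala", ".sh",
         ".bash", ".zsh"),
        "code",
    ),
    **dict.fromkeys(
        (".json", ".yaml", ".yml", ".toml", ".ini", ".env", ".conf", ".cfg",
         ".properties"),
        "config",
    ),
    **dict.fromkeys((".csv", ".tsv", ".parquet", ".jsonl", ".ndjson"), "data"),
    **dict.fromkeys((".md", ".rst", ".txt", ".adoc", ".org"), "documentation"),
}


def looks_binary(sample: bytes) -> bool:
    """Heuristic: NUL bytes or invalid UTF-8 in the sample suggest binary."""
    if b"\x00" in sample:
        return True
    try:
        # final=False tolerates a multibyte character cut off by sampling.
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False


def classify(path: str, sample: bytes) -> str:
    """Pick a strategy from the extension, after verifying the content."""
    if looks_binary(sample):
        return "binary_or_unknown"
    p = Path(path)
    # Dotfiles such as ".env" have an empty suffix, so fall back to the name.
    key = p.suffix.lower() or p.name.lower()
    return STRATEGY_BY_EXTENSION.get(key, "binary_or_unknown")
```

Checking content before trusting the extension keeps a mislabeled binary (for example, a compiled artifact named `notes.txt`) from being summarized as text.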
Quote-Grounding Technique
For all text-based files, the model MUST apply the quote-grounding technique:
- First pass - Read file, identify key passages
- Extract - Copy exact quotes with line numbers
- Organize extracts - Group by theme or importance
- Summarize from extracts - Write summary grounded in the extracted quotes
- Verify - Ensure every claim in summary traces to an extract
SOURCE: Technique adapted from Fidelity Rules Rule 2 (lines 27-41).
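The verify step can be made mechanical by checking each extract against the lines it cites. The sketch below is one possible approach, not the skill's own implementation; the `Extract` record and `verify_extracts` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extract:
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    quote: str


def verify_extracts(file_text: str, extracts: list[Extract]) -> list[Extract]:
    """Return the extracts whose quote does not appear in the cited lines."""
    lines = file_text.splitlines()
    failures = []
    for extract in extracts:
        in_range = 1 <= extract.start_line <= extract.end_line <= len(lines)
        if not in_range:
            failures.append(extract)
            continue
        span = "\n".join(lines[extract.start_line - 1:extract.end_line])
        if extract.quote not in span:
            failures.append(extract)
    return failures
```

An empty result means every quote is exact and correctly located; any returned extract points to a claim that should be re-checked or dropped before the summary is written.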
Output Format
All file summaries MUST use the structured output format defined in Structured Summary.
Required sections:
- YAML frontmatter - Include `source_type: file`, `source_path`, `method`, `confidence`, word counts
- Summary - Condensed content (BLUF style)
- What Was Found - Items discovered with line number references
- What Was NOT Found - Expected items that were absent
- Uncertain - Ambiguous items requiring interpretation
- Sources - Full file path, access date
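A draft summary can be checked against this list before it is returned. The sketch below assumes that sections appear as `## ` headings, as in the examples earlier in this document, and that frontmatter opens with a `---` line; the authoritative schema is the Structured Summary template, which this check does not replace.

```python
REQUIRED_SECTIONS = (
    "Summary",
    "What Was Found",
    "What Was NOT Found",
    "Uncertain",
    "Sources",
)


def missing_sections(summary_text: str) -> list[str]:
    """Return the required parts that a draft summary does not contain."""
    present = {
        line[3:].strip()
        for line in summary_text.splitlines()
        if line.startswith("## ")
    }
    # Frontmatter is assumed to open the document with a '---' delimiter.
    missing = [] if summary_text.lstrip().startswith("---") else ["YAML frontmatter"]
    missing += [name for name in REQUIRED_SECTIONS if name not in present]
    return missing
```

The check covers presence only; it does not confirm that findings carry line number references.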
Fidelity Rules
The model MUST follow all fidelity rules defined in Fidelity Rules.
Critical rules for file summarization:
- Rule 1: Read the file before summarizing. Never guess from filename.
- Rule 2: Extract before abstracting. Identify key passages first.
- Rule 3: Preserve counts and specifics. "7 functions" not "several functions."
- Rule 4: Distinguish absence from nonexistence. "Not in file" not "doesn't exist."
- Rule 6: State confidence explicitly. Full read of small file = high confidence. Truncated large file = medium/low confidence.
Multi-File Summarization
When the user requests summarization of multiple files:
- Summarize each file individually using this methodology
- Write each summary to a separate output file or section
- Do NOT merge file summaries into a single combined summary without explicit user request
- If synthesis across files is requested, load the multi-source-synthesis skill after completing individual summaries
SOURCE: Multi-source synthesis approach from Summarizer lines 33-37.
Error Handling
If a file cannot be read:
- Attempt to read with the Read tool
- If read fails, report the error: "Unable to read [file path]: [error message]"
- Do NOT speculate about file contents
- Do NOT proceed with summarization
- Ask user if they want to try alternative access methods
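The report-and-stop behavior can be sketched as a wrapper that returns either the content or the error message, never a guess. Here the built-in `open()` stands in for the Read tool, and `read_or_report` is a hypothetical name.

```python
from typing import Optional


def read_or_report(path: str) -> tuple[Optional[str], Optional[str]]:
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read(), None
    except (OSError, UnicodeDecodeError) as exc:
        # Report the failure verbatim; do not speculate about the contents.
        return None, f"Unable to read {path}: {exc}"


text, error = read_or_report("/nonexistent/definitely-missing.txt")
if error is not None:
    print(error)  # Summarization stops here; ask the user how to proceed.
```

Returning the error as data makes it hard for a caller to continue summarizing by accident, since there is no text to work from.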
Output Rendering
- Read template - Load the template file at `../summarizer/templates/{format_id}.md` (default: `structured`). The template defines the schema, required sections, and fidelity constraints for the selected format.
- Render - Produce output following the template's Schema section. Use the template's Example as a reference for structure and style.
- Verify fidelity - Confirm the output satisfies the template's Fidelity Constraints and all applicable Fidelity Rules.
Anti-Patterns
The model MUST NOT:
- Summarize a file based on its name without reading it
- Guess file contents from directory structure or naming conventions
- Assume file type from extension without verifying contents
- Summarize from partial reads (head/tail/grep) without disclosing the limitation
- Upgrade "not found in file" to "doesn't exist"; absence from this file is not evidence of absence elsewhere
- Present interpretation as observation
- Skip the "What Was NOT Found" section
- Omit line number references for key findings