Agent skill
read-bin-docs
Straightforward text extraction from document files (text-based PDF only for now, no OCR or docx). Use when you just need to read/extract text from binary documents.
Install this agent skill to your Project
npx add-skill https://github.com/YPares/agent-skills/tree/main/read-bin-docs
SKILL.md
Doc Formats
Quick Start: Extract Text from PDF
Need to extract text from a PDF? Use this Python snippet:
from pypdf import PdfReader
reader = PdfReader("document.pdf")
text = "\n".join(page.extract_text() for page in reader.pages)
print(text)
Or from the command line:
uvx --with pypdf python /path/to/extract_pdf_text.py document.pdf
PDF Text Extraction
Basic Usage
from pypdf import PdfReader
# Read all pages
reader = PdfReader("file.pdf")
for page in reader.pages:
    text = page.extract_text()
    print(text)
Extract Specific Pages
from pypdf import PdfReader
reader = PdfReader("file.pdf")
# Get the first five pages (pages 1-5, i.e. indices 0-4)
for page in reader.pages[0:5]:
    print(page.extract_text())
Using the Script
This skill includes scripts/extract_pdf_text.py for command-line extraction:
# Extract all pages to stdout
python extract_pdf_text.py document.pdf
# Extract to file
python extract_pdf_text.py document.pdf --output text.txt
# Extract specific pages
python extract_pdf_text.py document.pdf --pages 1-5
python extract_pdf_text.py document.pdf --pages 1,3,5
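The --pages values above are 1-based, while pypdf's reader.pages is 0-indexed. As a rough sketch of how such a spec could be mapped to indices (the function name parse_page_spec and its clamping behavior are assumptions for illustration, not the code of the bundled script):

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Turn a 1-based spec such as "1-5" or "1,3,5" into 0-based page indices.

    Hypothetical helper: the bundled script may parse --pages differently.
    """
    indices: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Clamp to the document length, then shift to 0-based indices.
        for page_number in range(start, min(end, page_count) + 1):
            if page_number - 1 not in indices:
                indices.append(page_number - 1)
    return indices
```

The resulting indices can then be used directly, for example reader.pages[i].extract_text() for each i in parse_page_spec("1,3,5", len(reader.pages)).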
Requirements
- pypdf: uvx --with pypdf python <script>
- Works with most text-based PDFs
- Scanned PDFs without OCR won't extract text
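Because a scanned PDF yields no text, it can help to check the extraction result before relying on it. The sketch below works on a plain list of per-page strings so that it stays independent of pypdf; the helper names and the 0.9 threshold are assumptions chosen for illustration, not part of this skill.

```python
def find_textless_pages(page_texts: list[str], min_chars: int = 1) -> list[int]:
    """Return 1-based numbers of pages whose extracted text is empty or near-empty."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


def looks_scanned(page_texts: list[str], threshold: float = 0.9) -> bool:
    """Guess that a document is image-based when most pages yield no text.

    The threshold is an arbitrary illustrative value, not a pypdf setting.
    """
    if not page_texts:
        return False
    textless = len(find_textless_pages(page_texts))
    return textless / len(page_texts) >= threshold
```

With pypdf, the input list could be built as [page.extract_text() for page in reader.pages]; a True result suggests the file needs OCR, which this skill does not provide.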
Common Issues
"No text extracted": The PDF is probably scanned (image-based) and has no text layer. OCR requires additional tools that this skill does not include.
"Encoding errors": pypdf handles most encodings, but some PDFs may still produce garbled text. If your pypdf version supports it, try layout-aware extraction with page.extract_text(extraction_mode="layout").
Future: Support for DOCX, XLSX, and other formats coming soon.