Agent skill

PDF Processing

Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.

Install this agent skill to your Project

npx add-skill https://github.com/dy9759/Text2KnowledgeCards/tree/main/skills/aitemplates-skills/pdf-processing

SKILL.md

Quick start

Use pdfplumber to extract text from PDFs:

python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()
    print(text)

Extracting tables

Extract tables from PDFs with automatic detection:

python
import pdfplumber

with pdfplumber.open("report.pdf") as pdf:
    page = pdf.pages[0]
    tables = page.extract_tables()

    for table in tables:
        for row in table:
            print(row)
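
The rows printed above are plain lists of cell strings, with None where a cell is empty. When the first row is a header, it is often handier to key each row by column name. A minimal sketch, assuming the first row really is a header (pdfplumber does not guarantee this); table_to_records and the sample data are hypothetical and not part of pdfplumber:

```python
from typing import Optional

Row = list[Optional[str]]


def table_to_records(table: list[Row]) -> list[dict[str, str]]:
    """Turn a table (list of rows) into dicts keyed by the first row."""
    if not table:
        return []
    # Empty cells arrive as None; normalise them to empty strings.
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


# Hypothetical table shaped like one entry of page.extract_tables()
sample = [["Name", "Qty"], ["Widget", "3"], ["Gadget", None]]
records = table_to_records(sample)
print(records)
# [{'Name': 'Widget', 'Qty': '3'}, {'Name': 'Gadget', 'Qty': ''}]
```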

Extracting all pages

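Join the text of every page in a multi-page document. Pages without a text layer (for example, scanned pages) may yield no text, so fall back to an empty string:

python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    # extract_text() may return None or "" for pages with no text layer
    page_texts = [page.extract_text() or "" for page in pdf.pages]

full_text = "\n\n".join(page_texts)
print(full_text)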

Form filling

For PDF form filling, see FORMS.md for the complete guide including field analysis and validation.

Merging PDFs

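Combine multiple PDF files. Recent pypdf releases removed PdfMerger, so use PdfWriter.append instead:

python
from pypdf import PdfWriter

writer = PdfWriter()

# Append every page of each input file, in order
for pdf in ["file1.pdf", "file2.pdf", "file3.pdf"]:
    writer.append(pdf)

writer.write("merged.pdf")
writer.close()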

Splitting PDFs

Extract specific pages or ranges:

python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

# Extract pages 2-5 (1-based); reader.pages is zero-indexed, so use indices 1-4
for page_num in range(1, 5):
    writer.add_page(reader.pages[page_num])

with open("output.pdf", "wb") as output:
    writer.write(output)
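
Users usually state page ranges in 1-based form ("2-5,8"), while reader.pages is zero-indexed, which makes off-by-one mistakes easy. A small sketch of a converter, assuming a comma-separated spec of single pages and inclusive ranges; parse_page_ranges is a hypothetical helper, not a pypdf function:

```python
def parse_page_ranges(spec: str, page_count: int) -> list[int]:
    """Convert a 1-based spec such as "2-5,8" into zero-based page indices."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        # A bare number means a single page; "a-b" is an inclusive range.
        end = int(end_text) if sep else start
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range {part!r} for a {page_count}-page PDF")
        indices.extend(range(start - 1, end))
    return indices


print(parse_page_ranges("2-5,8", page_count=10))  # [1, 2, 3, 4, 7]
```

The returned indices can be passed straight to the loop above, for example writer.add_page(reader.pages[i]) for each i, with page_count taken from len(reader.pages).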

Available packages

  • pdfplumber - Text and table extraction (recommended)
  • pypdf - PDF manipulation, merging, splitting
  • pdf2image - Convert PDFs to images (requires poppler)
  • pytesseract - OCR for scanned PDFs (requires tesseract)

Common patterns

Extract and save text:

python
import pdfplumber

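with pdfplumber.open("input.pdf") as pdf:
    # Fall back to "" so pages without extractable text don't break the join
    text = "\n\n".join(page.extract_text() or "" for page in pdf.pages)

# Write as UTF-8 so non-ASCII characters survive on every platform
with open("output.txt", "w", encoding="utf-8") as f:
    f.write(text)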

Extract tables to CSV (all tables on the first page are written back to back into one file):

python
import pdfplumber
import csv

with pdfplumber.open("tables.pdf") as pdf:
    tables = pdf.pages[0].extract_tables()

    with open("output.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for table in tables:
            writer.writerows(table)
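
Writing every table into one file makes it hard to tell where one table ends and the next begins. One possible alternative, sketched here with the standard library only, is one CSV file per table. The write_tables helper, the "table" file stem, and the sample data are assumptions for illustration; in practice tables would come from page.extract_tables():

```python
import csv
import tempfile
from pathlib import Path


def write_tables(tables, out_dir, stem="table"):
    """Write each table to its own CSV file and return the paths written."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for number, table in enumerate(tables, start=1):
        target = out_path / f"{stem}_{number}.csv"
        with open(target, "w", newline="", encoding="utf-8") as f:
            # None cells (empty in the PDF) become empty CSV fields.
            csv.writer(f).writerows(
                ["" if cell is None else cell for cell in row] for row in table
            )
        written.append(target)
    return written


# Hypothetical data standing in for page.extract_tables()
with tempfile.TemporaryDirectory() as tmp:
    paths = write_tables([[["a", "b"], ["1", None]], [["x"], ["2"]]], tmp)
    print([p.name for p in paths])  # ['table_1.csv', 'table_2.csv']
```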

Error handling

Handle common PDF issues:

python
import pdfplumber

try:
    with pdfplumber.open("document.pdf") as pdf:
        if len(pdf.pages) == 0:
            print("PDF has no pages")
        else:
            text = pdf.pages[0].extract_text()
            if text is None or text.strip() == "":
                print("Page contains no extractable text (might be scanned)")
            else:
                print(text)
except Exception as e:
    print(f"Error processing PDF: {e}")
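
When a page turns out to have no extractable text, the packages listed earlier (pdf2image and pytesseract) can serve as an OCR fallback. The following is a sketch under assumptions, not a tuned pipeline: needs_ocr, its 20-character threshold, and extract_page_text are hypothetical names, and the fallback requires poppler and tesseract to be installed. The third-party imports are deferred so the heuristic can be used on its own:

```python
def needs_ocr(text, min_chars=20):
    """Heuristic: treat a page as scanned when it yields almost no text."""
    return text is None or len(text.strip()) < min_chars


def extract_page_text(path, page_number):
    """Return text for a 1-based page number, using OCR only when needed."""
    import pdfplumber  # imported lazily; assumes pdfplumber is installed

    with pdfplumber.open(path) as pdf:
        text = pdf.pages[page_number - 1].extract_text()
    if not needs_ocr(text):
        return text

    # Fallback: rasterise just this page and run Tesseract on the image.
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(path, first_page=page_number, last_page=page_number)
    return pytesseract.image_to_string(images[0])


print(needs_ocr(""), needs_ocr("Invoice total: 1,234.00 EUR"))  # True False
```

OCR is much slower than reading a text layer, so checking needs_ocr first keeps the cost limited to pages that actually require it.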

Performance tips

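  • Process pages in batches for large PDFs, writing results out as you go instead of holding the whole document in memory
  • Use multiprocessing when there are many files to process; one worker per file is a simple split
  • Extract only the pages you need rather than the entire document
  • Close PDF objects after use; opening them in a with block does this automatically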
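
The batching and multiprocessing tips can be combined roughly as follows. This is a sketch, assuming one worker process per file and a batch size of 25 pages; page_batches, extract_file, and extract_many are hypothetical helpers, and pdfplumber is imported inside the worker so the batching logic has no third-party dependency:

```python
from concurrent.futures import ProcessPoolExecutor


def page_batches(page_count, batch_size):
    """Yield (start, stop) zero-based index pairs covering all pages."""
    for start in range(0, page_count, batch_size):
        yield start, min(start + batch_size, page_count)


def extract_file(path, batch_size=25):
    """Worker: return (path, text) for one PDF, reading pages slice by slice."""
    import pdfplumber  # lazy import; assumes pdfplumber is installed

    parts = []
    with pdfplumber.open(path) as pdf:
        for start, stop in page_batches(len(pdf.pages), batch_size):
            batch = pdf.pages[start:stop]
            # Each slice is a natural point to log progress or flush to disk.
            parts.append("\n\n".join(page.extract_text() or "" for page in batch))
    return path, "\n\n".join(parts)


def extract_many(paths, workers=4):
    """Extract several PDFs in parallel and map each path to its text."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(extract_file, paths))


print(list(page_batches(page_count=7, batch_size=3)))  # [(0, 3), (3, 6), (6, 7)]
```

On platforms that start worker processes with spawn (Windows, macOS), call extract_many from under an if __name__ == "__main__": guard.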
