Agent skill

ocrmypdf-batch

OCRmyPDF batch processing skill — process multiple PDFs, Docker automation, shell scripting, and CI/CD integration. Use when the user needs to OCR many PDFs, set up automated OCR pipelines, or integrate OCR into workflows.


Install this agent skill to your Project

npx add-skill https://github.com/partme-ai/full-stack-skills/tree/main/skills/ocrmypdf-skills/ocrmypdf-batch


OCRmyPDF — Batch Processing Guide

Overview

OCRmyPDF processes one input file per invocation, so batch jobs wrap the command in a shell loop, GNU parallel, Docker, or a CI/CD pipeline to build automated OCR workflows.

For core OCR functionality, see the ocrmypdf skill. For image processing, see ocrmypdf-image. For optimization, see ocrmypdf-optimize.

Shell Loop

Basic batch

bash

# Process all PDFs in directory (the output directory must exist first)
mkdir -p output
for f in *.pdf; do
    ocrmypdf "$f" "output/$f"
done
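
A batch that is interrupted halfway usually needs to be restarted without redoing finished files. The sketch below shows one way to do that; `ocr_dir` is a hypothetical helper name (not part of OCRmyPDF), and it assumes that an existing file in the output directory means the input was already processed.

```bash
#!/usr/bin/env bash
# Sketch: OCR every PDF in a directory, skipping files that already
# exist in the output directory. ocr_dir is a hypothetical helper.
ocr_dir() {
    local in_dir="$1" out_dir="$2" f out
    mkdir -p "$out_dir" || return 1
    for f in "$in_dir"/*.pdf; do
        [ -e "$f" ] || continue              # glob matched nothing
        out="$out_dir/$(basename "$f")"
        if [ -e "$out" ]; then
            echo "skip: $f"                  # assumed done in an earlier run
            continue
        fi
        ocrmypdf "$f" "$out" || echo "failed: $f" >&2
    done
}
```

Calling `ocr_dir input output` a second time then only processes files that have no output yet.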

Parallel processing

bash

# Use GNU parallel for faster processing
parallel ocrmypdf {} output/{/} ::: *.pdf

# Limit to 4 concurrent jobs
parallel -j 4 ocrmypdf {} output/{/} ::: *.pdf
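
By default each ocrmypdf process tries to use all available CPU cores, so running several at once can oversubscribe the machine. A rough way to divide cores between concurrent processes is sketched below; `jobs_per_file` is a hypothetical helper, and the even split is an assumption rather than a tuned recommendation.

```bash
#!/usr/bin/env bash
# Sketch: divide CPU cores evenly between concurrent ocrmypdf processes.
# jobs_per_file is a hypothetical helper; its result is meant for --jobs.
jobs_per_file() {
    local cores="$1" concurrent="$2" n
    n=$(( cores / concurrent ))
    if [ "$n" -lt 1 ]; then
        n=1                                  # never go below one core per file
    fi
    echo "$n"
}

# Example (assumes GNU parallel and nproc are installed):
# parallel -j 4 ocrmypdf --jobs "$(jobs_per_file "$(nproc)" 4)" {} output/{/} ::: *.pdf
```

With 8 cores and 4 concurrent files this yields `--jobs 2` per process.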

Recursive batch

bash

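# Process all PDFs in directory tree (results are flattened into output/)
# {/} is GNU parallel syntax, not find syntax, so use basename in a small sh wrapper
find . -name "*.pdf" ! -path "./output/*" \
    -exec sh -c 'ocrmypdf "$1" "output/$(basename "$1")"' _ {} \;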
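
When files in different subdirectories share a name, writing everything into a single output directory overwrites results. The sketch below mirrors the source tree instead; `ocr_tree` is a hypothetical helper, and it assumes the destination directory lies outside the source tree so that outputs are not picked up as inputs.

```bash
#!/usr/bin/env bash
# Sketch: OCR a directory tree and mirror its layout under a destination.
# ocr_tree is a hypothetical helper; dst is assumed to be outside src.
ocr_tree() {
    local src="${1%/}" dst="${2%/}" f rel
    find "$src" -type f -name '*.pdf' -print0 |
    while IFS= read -r -d '' f; do
        rel="${f#"$src"/}"                   # path relative to src
        mkdir -p "$dst/$(dirname "$rel")"
        # </dev/null keeps the command from consuming the file list on stdin
        ocrmypdf "$f" "$dst/$rel" </dev/null || echo "failed: $f" >&2
    done
}
```

For example, `ocr_tree scans ocr_out` would write `scans/2024/jan/a.pdf` to `ocr_out/2024/jan/a.pdf`.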

Docker

Official image

bash

# Pull image
docker pull jbarlow83/ocrmypdf

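# Basic usage (the image's entrypoint is ocrmypdf, so pass only its arguments;
# file paths are the ones seen inside the container)
docker run --rm \
    -v "$(pwd)":/data \
    jbarlow83/ocrmypdf \
    /data/input.pdf /data/output.pdf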
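
Typing the mount and container paths for every file is error-prone, so a small wrapper can help. The sketch below is one possible shape; `docker_ocr` is a hypothetical helper, it assumes the output should land next to the input, and the `--user` option follows the pattern shown in OCRmyPDF's Docker documentation so that output files are owned by the calling user.

```bash
#!/usr/bin/env bash
# Sketch: run the OCRmyPDF container on one file, mounting the file's
# directory at /data. docker_ocr is a hypothetical helper; the output
# is written to the same directory as the input.
docker_ocr() {
    local input="$1" output_name="$2" host_dir
    host_dir="$(cd "$(dirname "$input")" && pwd)" || return 1   # absolute path for -v
    docker run --rm \
        --user "$(id -u):$(id -g)" \
        -v "$host_dir":/data \
        jbarlow83/ocrmypdf \
        "/data/$(basename "$input")" "/data/$output_name"
}
```

A call such as `docker_ocr scans/invoice.pdf invoice_ocr.pdf` would then write `scans/invoice_ocr.pdf`.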

Batch with Docker

bash

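# Process all PDFs: ocrmypdf takes one input and one output per run,
# so loop on the host and start one container per file
mkdir -p output
for f in input/*.pdf; do
    docker run --rm \
        -v "$(pwd)":/data \
        jbarlow83/ocrmypdf \
        "/data/$f" "/data/output/$(basename "$f")"
done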

Docker Compose

yaml

version: '3'
services:
  ocrmypdf:
    image: jbarlow83/ocrmypdf
    volumes:
      - ./input:/data/input
      - ./output:/data/output
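    # Override the image's ocrmypdf entrypoint so a shell loop can run;
    # $$ stops Compose from interpolating the shell variables itself
    entrypoint: ["sh", "-c"]
    command:
      - 'for f in /data/input/*.pdf; do ocrmypdf "$$f" "/data/output/$$(basename "$$f")"; done'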

GitHub Actions

yaml

name: OCR PDFs
on: [push]
jobs:
  ocr:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run OCR
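        run: |
          mkdir -p output
          # Single quotes keep $f from being expanded by the runner's shell;
          # --entrypoint sh replaces the image's default ocrmypdf entrypoint
          docker run --rm \
            -v "${{ github.workspace }}":/data \
            --entrypoint sh \
            jbarlow83/ocrmypdf \
            -c 'for f in /data/*.pdf; do ocrmypdf "$f" "/data/output/$(basename "$f")"; done'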

CI/CD Examples

GitLab CI

yaml

ocr:
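  image:
    name: jbarlow83/ocrmypdf
    # Clear the image's ocrmypdf entrypoint so GitLab can run the script in a shell
    entrypoint: [""]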
  script:
    - mkdir -p output
    - for f in *.pdf; do ocrmypdf "$f" "output/$f"; done
  artifacts:
    paths:
      - output/

Shell script template

bash

#!/bin/bash
INPUT_DIR="input"
OUTPUT_DIR="output"
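# Named OCR_LANGS rather than LANG, which is the shell's locale variable
OCR_LANGS="eng+chi_sim"

mkdir -p "$OUTPUT_DIR"

for pdf in "$INPUT_DIR"/*.pdf; do
    filename=$(basename "$pdf")
    echo "Processing: $filename"
    # --clean requires unpaper to be installed
    ocrmypdf -l "$OCR_LANGS" --deskew --clean "$pdf" "$OUTPUT_DIR/$filename"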
    echo "Done: $filename"
done

echo "Batch OCR complete!"

Error Handling

bash

# Continue on error, log failures
for f in *.pdf; do
    if ! ocrmypdf "$f" "output/$f" 2>&1; then
        echo "FAILED: $f" >> failed.log
    fi
done
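
A failure log is more useful when it records why a file failed. The sketch below maps a few exit codes to messages; `describe_ocr_exit` and `ocr_with_log` are hypothetical helpers, and the code meanings (2, 6, 8) are taken from OCRmyPDF's documented exit codes as best recalled here, so they should be checked against the installed version.

```bash
#!/usr/bin/env bash
# Sketch: log failures together with a reason derived from the exit code.
# Both functions are hypothetical helpers; the code-to-message mapping is
# an assumption to verify against your OCRmyPDF version.
describe_ocr_exit() {
    case "$1" in
        0) echo "ok" ;;
        2) echo "invalid input file" ;;
        6) echo "already has OCR text" ;;
        8) echo "encrypted PDF" ;;
        *) echo "error (exit code $1)" ;;
    esac
}

ocr_with_log() {
    local f="$1" out="$2" log="$3" status=0
    ocrmypdf "$f" "$out" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED: $f: $(describe_ocr_exit "$status")" >> "$log"
    fi
    return "$status"
}
```

Inside a batch loop, `ocr_with_log "$f" "output/$f" failed.log || true` keeps the loop running while still recording each failure.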

Performance Tips

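Use --jobs N to set how many CPU cores each ocrmypdf process uses; lower it when several processes run in parallel
Use --output-type pdf (not pdfa) for faster processing when archival output is not needed
Use --deskew and --clean to improve OCR accuracy on skewed or noisy scans; they add processing time and do not by themselves reduce file size
If you build a custom image on top of jbarlow83/ocrmypdf, use Docker layer caching in CI/CD for faster rebuilds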

Quick Reference

Task	Command
Sequential batch	`for f in *.pdf; do ocrmypdf "$f" out/"$f"; done`
Parallel batch	`parallel ocrmypdf {} out/{/} ::: *.pdf`
Docker basic	`docker run --rm -v "$(pwd)":/data jbarlow83/ocrmypdf /data/in.pdf /data/out.pdf`
Recursive	`find . -name "*.pdf" ! -path "./out/*" -exec sh -c 'ocrmypdf "$1" "out/$(basename "$1")"' _ {} \;`

Troubleshooting

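Permission denied: Ensure the output directory exists and is writable. With Docker, pass --user "$(id -u):$(id -g)" so output files are owned by your user.
Memory issues: Process in smaller batches or use --jobs 1.
Docker path issues: Use absolute host paths with -v, and pass file paths as they appear inside the container (for example /data/input.pdf).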
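
The permission and path checks above can be run once before a batch starts. The sketch below is one way to do it; `preflight_output_dir` is a hypothetical helper that creates the output directory if needed, verifies it is writable, and prints its absolute path for use with `-v`.

```bash
#!/usr/bin/env bash
# Sketch: verify the output directory before starting a batch and print
# its absolute path. preflight_output_dir is a hypothetical helper.
preflight_output_dir() {
    local out_dir="$1" abs
    mkdir -p "$out_dir" 2>/dev/null || { echo "cannot create: $out_dir" >&2; return 1; }
    [ -w "$out_dir" ] || { echo "not writable: $out_dir" >&2; return 1; }
    abs="$(cd "$out_dir" && pwd)" || return 1
    echo "$abs"
}
```

For example, `OUT="$(preflight_output_dir output)" || exit 1` stops the script early instead of failing on every file.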

Maintainer

partme-ai Core maintainer

Source details

Full Name: partme-ai/full-stack-skills
Branch: main
Path in repo: skills/ocrmypdf-skills/ocrmypdf-batch
License: Other
Topics: claude-code agent-skills cursor skills codebuddy qoder


