Agent skill
gpu-document-processing
Use when processing large PDFs, document collections, or bulk text extraction tasks that benefit from GPU-accelerated processing. Triggers when the user provides large documents or needs bulk document analysis.
Install this agent skill in your project:
npx add-skill https://github.com/langchain-ai/deepagents/tree/main/examples/nvidia_deep_agent/skills/gpu-document-processing
GPU Document Processing Skill
Process large documents and document collections using GPU-accelerated tools. This skill uses the sandbox-as-tool pattern: the agent runs on CPU for reasoning, and sends document processing work to a GPU-equipped environment.
When to Use This Skill
Use this skill when:
- Processing large PDF files (50+ pages)
- Analyzing collections of documents (10+ files)
- Extracting structured data from unstructured documents
- Performing bulk text extraction and chunking
- Generating embeddings for large document sets
- Handling large documents that the user uploads or references for analysis
Architecture: Sandbox as Tool
This skill follows the sandbox-as-tool pattern for GPU execution:
- Agent reasons on CPU: planning, synthesis, report writing
- Processing is sent to the GPU sandbox: document parsing, embedding, extraction
- Results are returned to the agent: structured output for further analysis
This separation ensures:
- API keys stay outside the sandbox (security)
- Agent state persists independently of processing jobs
- Processing can be parallelized across documents
- The GPU is used only during processing, not during reasoning, which keeps costs down
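A minimal sketch of the sandbox-as-tool pattern described above, assuming the GPU environment is reachable through some transport function. The names `Job`, `make_sandbox_tool`, and `fake_sandbox` are hypothetical and are not part of this skill; `fake_sandbox` only stands in for the GPU side so the example runs locally.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Job:
    """Work item sent to the sandbox: a document reference and a task name only."""
    document: str
    task: str  # e.g. "extract_text", "extract_tables", "embed"


def make_sandbox_tool(run_in_sandbox: Callable[[Job], Dict]) -> Callable[[str, str], Dict]:
    """Wrap a sandbox runner as a plain tool the CPU-side agent can call.

    `run_in_sandbox` is a hypothetical transport (HTTP call, job queue, ...).
    The Job carries no credentials, so API keys stay with the caller.
    """
    def process_document(document: str, task: str) -> Dict:
        result = run_in_sandbox(Job(document=document, task=task))
        # Return structured output for the agent to reason over.
        return {"document": document, "task": task, "result": result}

    return process_document


def fake_sandbox(job: Job) -> Dict:
    """Stand-in for the GPU environment (assumption, for local runs only)."""
    return {"status": "ok", "note": f"ran {job.task} on {job.document}"}


tool = make_sandbox_tool(fake_sandbox)
out = tool("report.pdf", "extract_text")
print(out["result"]["status"])
```

Because the agent only sees a function that takes a document reference and returns structured data, the sandbox can be swapped or scaled without touching the agent's state.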
Capabilities
PDF Text Extraction
Extract text content from PDF documents with layout preservation:
- Headers, paragraphs, lists, and tables detected separately
- Page numbers and section boundaries preserved
- Multi-column layout handling
Tabular Data Extraction
Extract tables from documents into structured formats:
- PDF tables to CSV/DataFrames using GPU-accelerated parsing
- Automatic column type detection
- Handles merged cells and multi-row headers
Document Chunking
Split large documents into meaningful chunks for analysis:
- Semantic chunking (by topic/section boundaries)
- Fixed-size chunking with overlap for embedding
- Configurable chunk sizes (default: 512 tokens)
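As an illustration of fixed-size chunking with overlap, the sketch below splits a token list into windows of `chunk_size` tokens that share `overlap` tokens with their neighbour. The 512-token default comes from the list above; the overlap default of 64 and the function name `chunk_tokens` are assumptions made for this example, and tokens are approximated by whitespace splitting rather than a real tokenizer.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 512, overlap: int = 64) -> List[List[str]]:
    """Split tokens into fixed-size windows; consecutive windows share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a trailing duplicate chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace split is a rough stand-in for a model tokenizer (assumption).
tokens = "a b c d e f g h i j".split()
chunks = chunk_tokens(tokens, chunk_size=4, overlap=1)
print(chunks)
```

With a chunk size of 4 and an overlap of 1, each window starts 3 tokens after the previous one, so the last token of one chunk is the first token of the next.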
Embedding Generation
Generate vector embeddings for document chunks:
- Uses NVIDIA NeMo Retriever NIM for GPU-accelerated embedding
- Supports batch processing for large document sets
- Compatible with standard vector stores (Milvus, ChromaDB)
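Batch processing for embeddings can be sketched as below. This is not the NeMo Retriever API: `embed_batch` is a hypothetical callable standing in for whatever embedding client the GPU sandbox exposes, and `toy_embed` is a placeholder that exists only so the example runs. The batch size of 32 is likewise an assumption.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def embed_all(
    chunks: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 32,
) -> List[Vector]:
    """Embed chunks batch by batch, keeping output order aligned with input order."""
    vectors: List[Vector] = []
    for batch in batched(chunks, batch_size):
        result = embed_batch(list(batch))
        if len(result) != len(batch):
            raise ValueError("embedding backend returned a different number of vectors")
        vectors.extend(result)
    return vectors


def toy_embed(texts: List[str]) -> List[Vector]:
    """Placeholder embedder (assumption): one dimension, the text length."""
    return [[float(len(t))] for t in texts]


vectors = embed_all(["a", "bb", "ccc"], toy_embed, batch_size=2)
print(vectors)
```

Keeping the vectors in input order makes it straightforward to pair each one with its chunk before writing to a vector store.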
Workflow
- Receive document reference from the orchestrator
- Determine processing type (extraction, analysis, embedding)
- Send to GPU sandbox for processing
- Collect structured results (text, tables, embeddings)
- Write findings to /shared/ for the orchestrator to synthesize
Processing Large Document Collections
For multiple documents:
- Process documents in parallel batches (3-5 concurrent)
- Extract key metadata first (title, date, author, page count)
- Generate per-document summaries
- Cross-reference findings across documents
- Write consolidated findings with per-document citations
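The parallel-batch step above can be sketched with a bounded thread pool. The concurrency limit of 4 sits inside the 3-5 range given in the list; `process_collection` and `fake_process` are hypothetical names, and `fake_process` stands in for a real call into the GPU sandbox.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def process_collection(
    paths: List[str],
    process_one: Callable[[str], Dict],
    max_concurrent: int = 4,
) -> List[Dict]:
    """Process documents with at most `max_concurrent` in flight at once."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # Executor.map returns results in the same order as the input paths.
        return list(pool.map(process_one, paths))


def fake_process(path: str) -> Dict:
    """Stand-in for a sandbox call (assumption): returns a per-document record."""
    return {"document": path, "summary": f"summary of {path}"}


paths = [f"doc_{i}.pdf" for i in range(6)]
results = process_collection(paths, fake_process)
print([r["document"] for r in results])
```

Threads suit this step because each worker mostly waits on the remote sandbox rather than computing locally. Since each record keeps its source filename, per-document citations remain available when findings are consolidated.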
Output Format
When reporting document processing results:
- Include document metadata (filename, pages, size)
- Structure extracted content by section/chapter
- Format tables as markdown tables
- Include page references for all extracted content
- Note any extraction quality issues (scanned images, corrupted pages)
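One way to format an extracted table as a markdown table is sketched below. The function name `to_markdown_table` and the sample rows are illustrative only; the sketch assumes each row has the same number of cells as the header.

```python
from typing import List, Sequence


def to_markdown_table(headers: Sequence[object], rows: Sequence[Sequence[object]]) -> str:
    """Render headers and rows as a pipe-delimited markdown table."""
    def fmt(cells: Sequence[object]) -> str:
        # Escape literal pipes so cell text cannot break the column layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    lines: List[str] = [fmt(headers), "| " + " | ".join("---" for _ in headers) + " |"]
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("row length does not match header length")
        lines.append(fmt(row))
    return "\n".join(lines)


# Illustrative data: a page column keeps the page reference next to each value.
table = to_markdown_table(["page", "item"], [[3, "alpha"], [7, "beta|gamma"]])
print(table)
```

Including a page column in the table is one simple way to carry page references alongside the extracted content.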
Integration with NVIDIA NIM
For production deployments, GPU document processing can leverage:
- NVIDIA NeMo Retriever: GPU-accelerated embedding and retrieval
- NVIDIA RAPIDS cuDF: Tabular data processing from extracted tables
- NVIDIA Triton: Scalable inference for document classification models
See NVIDIA's NIM documentation for self-hosted deployment options.