Agent skill
crawl4ai
Use when crawling web pages, extracting markdown content, or scraping website data with intelligent chunking and skeleton planning. Use when the user provides a URL or link to fetch or crawl.
Install this agent skill to your Project
npx add-skill https://github.com/tao3k/xiuxian-artisan-workshop/tree/main/skills/crawl4ai
Metadata
Additional technical details for this skill
- author
- xiuxian-artisan-workshop
- intents
-
[ "Crawl a web page", "Extract markdown content", "Scrape website data", "Open or fetch a URL or link", "Research a URL or link", "Help me research a web page", "Perform deep crawl", "Get document skeleton/TOC", "Extract specific sections from web page" ] - version
- 0.2.1
- routing keywords
-
[ "crawl", "web", "scrape", "extract", "fetch", "url", "link", "research", "research url", "analyze page", "html", "markdown", "content", "scraper", "crawler", "web scraping", "web crawl", "page content", "web extraction" ]
SKILL.md
crawl4ai
High-performance web crawler with intelligent chunking. Crawls web pages and extracts content as markdown using LLM-based skeleton planning.
Commands
crawl_url (alias: webCrawl)
Crawl a web page with native workflow execution and LLM-based intelligent chunking.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
url |
str | - | Target URL to crawl (required) |
action |
str | "smart" | Action mode: "smart", "skeleton", "crawl" |
fit_markdown |
bool | true | Clean and simplify markdown output |
max_depth |
int | 0 | Maximum crawling depth (0=single page) |
return_skeleton |
bool | false | Also return document skeleton (TOC) |
chunk_indices |
list[int] | - | List of section indices to extract |
Action Modes:
| Mode | Description | Use Case |
|---|---|---|
smart (default) |
LLM generates chunk plan, then extracts relevant sections | Large docs where you need specific info |
skeleton |
Extract lightweight TOC without full content | Quick overview, decide what to read |
crawl |
Return full markdown content | Small pages, complete content needed |
Runtime Transport:
max_depth = 0: Uses HTTP strategy (no browser cold-start) for lower latency.max_depth > 0: Uses browser deep-crawl strategy (BFS) for multi-page traversal.file://...withmax_depth = 0: Uses local fast-path (no crawl4ai runtime bootstrap) for deterministic fixture/local-note benchmarking.- Persistent worker mode reuses the HTTP crawler instance across requests to reduce repeated initialization cost.
Examples:
# Smart crawl with LLM chunking (default)
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com"}`
# Skeleton only - get TOC quickly
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "action": "skeleton"}`
# Full content crawl
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "action": "crawl"}`
# Extract specific sections
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "chunk_indices": [0, 1, 2]}`
# Deep crawl (follow links up to depth N)
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "max_depth": 2}`
# Get skeleton with full content
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "return_skeleton": true}`
Core Concepts
| Topic | Description | Reference |
|---|---|---|
| Skeleton Planning | LLM sees TOC (~500 tokens) not full content (~10k+) | smart-chunking.md |
| Chunk Extraction | Token-aware section extraction | chunking.md |
| Deep Crawling | Multi-page crawling with BFS strategy | deep-crawl.md |
Best Practices
- Use
skeletonmode first for large documents to understand structure - Use
chunk_indicesto extract specific sections instead of full content - Set
max_depth> 0 carefully - limits pages crawled to prevent runaway crawling - Keep
fit_markdown=truefor cleaner output, false for raw content
Advanced
- Batch multiple URLs with separate calls
- Combine with knowledge tools for RAG pipelines
- Use skeleton + LLM to auto-generate chunk plans for custom extraction
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
research
paper-banana
forge-evolution
Autonomous persona evolution pipeline for failure-driven capability growth.
agenda-management
High-fidelity scheduling and cognitive alignment via the Triangular Synaptic Flow.
test
writer
Use when editing files or writing content.
Didn't find tool you were looking for?