Agent skill

crawl4ai

Use when crawling web pages, extracting markdown content, or scraping website data with intelligent chunking and skeleton planning. Use when the user provides a URL or link to fetch or crawl.

Stars 9
Forks 1

Install this agent skill to your Project

npx add-skill https://github.com/tao3k/xiuxian-artisan-workshop/tree/main/skills/crawl4ai

Metadata

Additional technical details for this skill

author
xiuxian-artisan-workshop
intents
[
    "Crawl a web page",
    "Extract markdown content",
    "Scrape website data",
    "Open or fetch a URL or link",
    "Research a URL or link",
    "Help me research a web page",
    "Perform deep crawl",
    "Get document skeleton/TOC",
    "Extract specific sections from web page"
]
version
0.2.1
routing keywords
[
    "crawl",
    "web",
    "scrape",
    "extract",
    "fetch",
    "url",
    "link",
    "research",
    "research url",
    "analyze page",
    "html",
    "markdown",
    "content",
    "scraper",
    "crawler",
    "web scraping",
    "web crawl",
    "page content",
    "web extraction"
]

SKILL.md

crawl4ai

High-performance web crawler with intelligent chunking. Crawls web pages and extracts content as markdown using LLM-based skeleton planning.

Commands

crawl_url (alias: webCrawl)

Crawl a web page with native workflow execution and LLM-based intelligent chunking.

Parameters:

Parameter Type Default Description
url str - Target URL to crawl (required)
action str "smart" Action mode: "smart", "skeleton", "crawl"
fit_markdown bool true Clean and simplify markdown output
max_depth int 0 Maximum crawling depth (0=single page)
return_skeleton bool false Also return document skeleton (TOC)
chunk_indices list[int] - List of section indices to extract

Action Modes:

Mode Description Use Case
smart (default) LLM generates chunk plan, then extracts relevant sections Large docs where you need specific info
skeleton Extract lightweight TOC without full content Quick overview, decide what to read
crawl Return full markdown content Small pages, complete content needed

Runtime Transport:

  • max_depth = 0: Uses HTTP strategy (no browser cold-start) for lower latency.
  • max_depth > 0: Uses browser deep-crawl strategy (BFS) for multi-page traversal.
  • file://... with max_depth = 0: Uses local fast-path (no crawl4ai runtime bootstrap) for deterministic fixture/local-note benchmarking.
  • Persistent worker mode reuses the HTTP crawler instance across requests to reduce repeated initialization cost.

Examples:

python
# Smart crawl with LLM chunking (default)
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com"}`

# Skeleton only - get TOC quickly
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "action": "skeleton"}`

# Full content crawl
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "action": "crawl"}`

# Extract specific sections
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "chunk_indices": [0, 1, 2]}`

# Deep crawl (follow links up to depth N)
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "max_depth": 2}`

# Get skeleton with full content
tool: `crawl4ai.CrawlUrl` with `{"url": "https://example.com", "return_skeleton": true}`

Core Concepts

Topic Description Reference
Skeleton Planning LLM sees TOC (~500 tokens) not full content (~10k+) smart-chunking.md
Chunk Extraction Token-aware section extraction chunking.md
Deep Crawling Multi-page crawling with BFS strategy deep-crawl.md

Best Practices

  • Use skeleton mode first for large documents to understand structure
  • Use chunk_indices to extract specific sections instead of full content
  • Set max_depth > 0 carefully - limits pages crawled to prevent runaway crawling
  • Keep fit_markdown=true for cleaner output, false for raw content

Advanced

  • Batch multiple URLs with separate calls
  • Combine with knowledge tools for RAG pipelines
  • Use skeleton + LLM to auto-generate chunk plans for custom extraction

Expand your agent's capabilities with these related and highly-rated skills.

Didn't find tool you were looking for?

Be as detailed as possible for better results