Agent skill

scrapling

CLI-first web scraping & content extraction with optional MCP server. Use when you have target URLs and need clean, selector-based outputs (html/md/txt).

Install this agent skill in your project

npx add-skill https://github.com/foryourhealth111-pixel/Vibe-Skills/tree/main/bundled/skills/scrapling

SKILL.md

Scrapling Skill (VCO)

Scrapling is a Python-based web scraping / extraction toolkit that exposes:

a CLI (scrapling ...) for fetching + extracting content into files
an optional MCP server (scrapling mcp) so an agent can call structured scraping tools

This skill is CLI-first. Prefer it when you already have URLs and need reliable, repeatable extraction (CSS selector → file).

When to use

Use scrapling when you need to:

Extract specific parts of a web page (CSS selector / XPath) into .txt / .md / .html
Run repeatable scraping jobs (batch URLs with a small wrapper script)
Reduce token usage by extracting only the relevant DOM region before passing to the LLM
Provide a local MCP endpoint for scraping tools (agent → MCP → scrapling)
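The "small wrapper script" for batch URLs could look like the Python sketch below. It only assembles the `scrapling extract get` command documented in this skill and runs it once per URL. The helper names (`output_name`, `build_command`, `run_batch`) and the file-naming scheme are hypothetical, not part of Scrapling.

```python
import re
import subprocess
from pathlib import Path
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlparse


def output_name(url: str, ext: str = "md") -> str:
    """Derive a filesystem-safe file name from a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    # Collapse every run of non-alphanumeric characters into a single dash
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-") or "page"
    return f"{slug}.{ext}"


def build_command(url: str, out_file: Path, selector: Optional[str] = None) -> List[str]:
    """Build the `scrapling extract get` argument list used in this skill."""
    cmd = ["scrapling", "extract", "get", url, str(out_file)]
    if selector:
        cmd += ["--css-selector", selector]
    return cmd


def run_batch(
    urls: Iterable[str],
    out_dir: Path,
    selector: Optional[str] = None,
    timeout: float = 60.0,
) -> Dict[str, int]:
    """Run one extraction per URL and return a {url: exit code} mapping."""
    out_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, int] = {}
    for url in urls:
        cmd = build_command(url, out_dir / output_name(url), selector)
        results[url] = subprocess.run(cmd, timeout=timeout).returncode
    return results
```

Calling `run_batch(["https://example.com"], Path("out"), selector="main")` would write `out/example-com.md`. The naming scheme ignores query strings, so URLs that differ only in their query would overwrite each other; extend `output_name` if that matters.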

Boundaries (vs Playwright / Search)

vs `playwright`

scrapling: best for “get URL → extract selector → write file” workflows; simpler, faster iteration
playwright: best for interactive UI flows (login, multi-step navigation, downloads, complex JS actions, stateful sessions)

If you must navigate or click through a UI, use playwright. If you can directly fetch the target page and just need extraction, use scrapling.

vs search tools

Search tools are for discovering sources/URLs (query → result list → choose URLs).
scrapling is for acquisition + extraction once you already know the URL(s).

A common pipeline:

Search → find candidate URLs
Scrapling → extract focused content from chosen URLs
LLM → summarize / transform / analyze extracted outputs

Prerequisite check (required)

Python version (Scrapling requires Python >= 3.10):

powershell

python --version

Scrapling CLI availability:

powershell

scrapling --help
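Both checks can also be scripted. The sketch below is one possible Python version: it compares the running interpreter against the 3.10 minimum stated above and looks for a `scrapling` executable on PATH. The function names are hypothetical, and the install hint simply repeats the recommendation from this skill.

```python
import shutil
import sys
from typing import List, Optional

MIN_PYTHON = (3, 10)  # minimum version stated in this skill


def python_ok(version_info=sys.version_info) -> bool:
    """True when the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def cli_path(name: str = "scrapling") -> Optional[str]:
    """Return the full path of the CLI if it is on PATH, else None."""
    return shutil.which(name)


def check_prerequisites() -> List[str]:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if not python_ok():
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python >= 3.10 required, found {found}")
    if cli_path() is None:
        problems.append(
            'scrapling CLI not found on PATH; try: python -m pip install "scrapling[ai]"'
        )
    return problems
```

An agent can call `check_prerequisites()` before any extraction and stop early with the returned messages instead of failing mid-job.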

Installation (recommended)

Scrapling’s CLI and MCP features are enabled via extras.

Recommended (CLI + MCP + fetchers):

powershell

python -m pip install "scrapling[ai]"

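If you only want CLI fetch/extract without MCP (upstream Scrapling documentation lists the extract command under the shell extra, so also install "scrapling[shell]" if extract is unavailable):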

powershell

python -m pip install "scrapling[fetchers]"

If you use browser-based fetchers, you may need browser binaries:

powershell

# Option A: via Scrapling helper (after install)
scrapling install

# Option B: directly via Playwright
python -m playwright install

Wrapper script (Windows convenience)

This skill ships a thin PowerShell wrapper. The path below is machine-specific; substitute your own user directory:

C:/Users/羽裳/.codex/skills/scrapling/scripts/scrapling.ps1

It checks whether scrapling exists and prints install hints if missing.

Common CLI patterns

1) Extract full page body (to Markdown)

powershell

scrapling extract get "https://example.com" out.md

2) Extract a specific element (CSS selector) to text

powershell

scrapling extract get "https://example.com" out.txt --css-selector "main article"

3) Extract HTML for downstream parsing

powershell

scrapling extract get "https://example.com" out.html --css-selector "#content"

4) Use browser-backed fetcher mode (when a plain GET is blocked or the page is dynamic)

powershell

scrapling extract fetch "https://example.com" out.md --css-selector "main"

Tip: keep outputs in files and only feed the smallest relevant snippet to the LLM.
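One way to apply the tip is to cap the extracted file before it reaches the LLM. The Python sketch below is an illustration only: `trim_text`, `load_snippet`, the 4000-character default, and the `[truncated]` marker are all hypothetical choices, not Scrapling features.

```python
from pathlib import Path

TRUNCATION_MARKER = "\n\n[truncated]"  # hypothetical marker appended after a cut


def trim_text(text: str, max_chars: int = 4000) -> str:
    """Keep at most max_chars of text, cutting at a paragraph break when possible.

    The budget applies to the kept text; the marker is appended on top of it.
    """
    text = text.strip()
    if len(text) <= max_chars:
        return text
    # Prefer the last blank-line boundary that fits inside the budget
    cut = text.rfind("\n\n", 0, max_chars)
    if cut <= 0:
        cut = max_chars  # no paragraph break available: hard cut
    return text[:cut].rstrip() + TRUNCATION_MARKER


def load_snippet(path: str, max_chars: int = 4000) -> str:
    """Read an extracted file (for example out.md) and trim it for the prompt."""
    return trim_text(Path(path).read_text(encoding="utf-8"), max_chars)
```

A tighter CSS selector is usually the better first step; trimming is a fallback for pages where even the selected region is long.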

MCP server relationship (optional)

Scrapling can run as an MCP server. This is useful when:

the agent needs tool-style scraping calls
you want scraping results to be structured and deterministic

Start MCP server (stdio transport by default):

powershell

scrapling mcp

Optional: run MCP server with HTTP transport:

powershell

scrapling mcp --http --host 127.0.0.1 --port 8765

Example MCP server config snippet

json

{
  "servers": {
    "scrapling": {
      "mode": "stdio",
      "command": "scrapling",
      "args": ["mcp"],
      "required": false,
      "note": "Requires: python -m pip install \"scrapling[ai]\""
    }
  }
}
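A minimal Python sketch of how a client could turn the snippet above into a launch command. The field names (`servers`, `mode`, `command`, `args`) come from the snippet; how a particular MCP client actually reads them is an assumption, and `launch_command` is a hypothetical helper.

```python
import json
from typing import List

# Copy of the config snippet above, kept as raw text so the JSON escapes survive
CONFIG_TEXT = r'''
{
  "servers": {
    "scrapling": {
      "mode": "stdio",
      "command": "scrapling",
      "args": ["mcp"],
      "required": false,
      "note": "Requires: python -m pip install \"scrapling[ai]\""
    }
  }
}
'''


def launch_command(config: dict, name: str) -> List[str]:
    """Return the argv list for a stdio server entry (hypothetical helper)."""
    server = config["servers"][name]
    if server.get("mode") != "stdio":
        raise ValueError(f"server {name!r} is not configured for stdio")
    return [server["command"], *server.get("args", [])]


config = json.loads(CONFIG_TEXT)
argv = launch_command(config, "scrapling")
```

Here `argv` matches the `scrapling mcp` command shown earlier, so the config and the manual start command stay in sync.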

Safety & ops notes

Prefer selector-based extraction to minimize data volume.
Treat scraping as an external dependency: handle timeouts, retries, and failures explicitly.
For aggressive bot protection, consider switching fetchers (for example from extract get to extract fetch) or using playwright.
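Handling timeouts, retries, and failures explicitly could be sketched in Python as below. It assumes the scrapling CLI signals failure with a non-zero exit code, which this skill does not state; `run_with_retries` and its defaults are hypothetical. The `runner` and `sleep` parameters exist only so the logic can be exercised without network access.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    timeout: float = 60.0,
    backoff: float = 2.0,
    runner: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
):
    """Run cmd until it exits with code 0, retrying on timeout or non-zero exit.

    Waits backoff, 2*backoff, 4*backoff, ... seconds between attempts and
    raises RuntimeError once all attempts are used up.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            completed = runner(cmd, timeout=timeout)
            if completed.returncode == 0:
                return completed
            last_error = RuntimeError(f"exit code {completed.returncode}")
        except subprocess.TimeoutExpired as exc:
            last_error = exc
        if attempt < attempts:
            sleep(backoff * 2 ** (attempt - 1))
    raise RuntimeError(f"{cmd[0]} failed after {attempts} attempts") from last_error
```

For example, `run_with_retries(["scrapling", "extract", "get", "https://example.com", "out.md"])` either returns the completed process or raises, so the caller never continues with a missing or stale output file.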

Maintainer

foryourhealth111-pixel Core maintainer

Source details

Full Name: foryourhealth111-pixel/Vibe-Skills
Branch: main
Path in repo: bundled/skills/scrapling
License: Apache License 2.0
Topics: claude-code anthropic claude agent-skills automation mcp ai-agents cursor developer-tools agentic-coding skills llm codex claude-skills vibe-coding vibecoding opencode ai-skills ai-workflow windsurf
