Agent skill
scrapling
CLI-first web scraping & content extraction with optional MCP server. Use when you have target URLs and need clean, selector-based outputs (html/md/txt).
Install this agent skill to your Project
npx add-skill https://github.com/foryourhealth111-pixel/Vibe-Skills/tree/main/bundled/skills/scrapling
SKILL.md
Scrapling Skill (VCO)
Scrapling is a Python-based web scraping / extraction toolkit that exposes:
- a CLI (scrapling ...) for fetching + extracting content into files
- an optional MCP server (scrapling mcp) so an agent can call structured scraping tools
This skill is CLI-first. Prefer it when you already have URLs and need reliable, repeatable extraction (CSS selector → file).
When to use
Use scrapling when you need:
- Extract specific parts of a web page (CSS selector / XPath) into .txt/.md/.html
- Run repeatable scraping jobs (batch URLs with a small wrapper script)
- Reduce token usage by extracting only the relevant DOM region before passing to the LLM
- Provide a local MCP endpoint for scraping tools (agent → MCP → scrapling)
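The "small wrapper script" for batch jobs could look roughly like the sketch below. It only uses the `scrapling extract get <url> <file> --css-selector <selector>` form shown in this skill's CLI patterns; the function names, the `page_001.md` naming scheme, and the output directory are hypothetical choices made for illustration.

```python
import subprocess
from pathlib import Path


def build_extract_command(url, out_file, css_selector=None):
    """Build the argv for `scrapling extract get`; the selector is optional."""
    cmd = ["scrapling", "extract", "get", url, str(out_file)]
    if css_selector:
        cmd += ["--css-selector", css_selector]
    return cmd


def plan_batch(urls, out_dir, css_selector=None, suffix=".md"):
    """Pair every URL with a numbered output file and return the command lines."""
    out_dir = Path(out_dir)
    return [
        build_extract_command(url, out_dir / f"page_{i:03d}{suffix}", css_selector)
        for i, url in enumerate(urls, start=1)
    ]


def run_batch(urls, out_dir, css_selector=None, suffix=".md"):
    """Run the planned commands one by one; return {url: exit code}."""
    urls = list(urls)  # allow generators; we iterate twice below
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    results = {}
    for url, cmd in zip(urls, plan_batch(urls, out_dir, css_selector, suffix)):
        results[url] = subprocess.run(cmd).returncode
    return results
```

Calling `run_batch(["https://example.com"], "out", css_selector="main")` would write `out/page_001.md`, provided the CLI is installed. Planning is kept separate from execution so the command lines can be inspected or logged before anything is fetched.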
Boundaries (vs Playwright / Search)
vs playwright
- scrapling: best for “get URL → extract selector → write file” workflows; simpler, faster iteration
- playwright: best for interactive UI flows (login, multi-step navigation, downloads, complex JS actions, stateful sessions)
If you must navigate or click through a UI, use playwright.
If you can directly fetch the target page and just need extraction, use scrapling.
vs search tools
- Search tools are for discovering sources/URLs (query → result list → choose URLs).
- scrapling is for acquisition + extraction once you already know the URL(s).
A common pipeline:
- Search → find candidate URLs
- Scrapling → extract focused content from chosen URLs
- LLM → summarize / transform / analyze extracted outputs
Prerequisite check (required)
- Python version (Scrapling requires Python >= 3.10):
python --version
- Scrapling CLI availability:
scrapling --help
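Both checks can also be done from a script, for example before an agent starts a scraping job. The following is a minimal sketch: `check_prerequisites` is a hypothetical helper name, and it only tests that a `scrapling` executable is on PATH, not that the CLI extras are actually installed.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisite list


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found; "
            f"Scrapling requires >= {MIN_PYTHON[0]}.{MIN_PYTHON[1]}"
        )
    if which("scrapling") is None:
        problems.append("scrapling CLI not found on PATH")
    return problems
```

A caller can print each returned problem and stop early if the list is non-empty. The `version_info` and `which` parameters exist only so the function can be exercised without changing the real environment.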
Installation (recommended)
Scrapling’s CLI and MCP features are enabled via extras.
Recommended (CLI + MCP + fetchers):
python -m pip install "scrapling[ai]"
If you only want CLI fetch/extract without MCP:
python -m pip install "scrapling[fetchers]"
If you use browser-based fetchers, you may need browser binaries:
# Option A: via Scrapling helper (after install)
scrapling install
# Option B: directly via Playwright
python -m playwright install
Wrapper script (Windows convenience)
This skill ships a thin PowerShell wrapper:
C:/Users/羽裳/.codex/skills/scrapling/scripts/scrapling.ps1
It checks whether scrapling exists and prints install hints if missing.
Common CLI patterns
1) Extract full page body (to Markdown)
scrapling extract get "https://example.com" out.md
2) Extract a specific element (CSS selector) to text
scrapling extract get "https://example.com" out.txt --css-selector "main article"
3) Extract HTML for downstream parsing
scrapling extract get "https://example.com" out.html --css-selector "#content"
4) Use browser-backed fetcher mode (when simple GET is blocked / dynamic)
scrapling extract fetch "https://example.com" out.md --css-selector "main"
Tip: keep outputs in files and only feed the smallest relevant snippet to the LLM.
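Patterns 1 to 4 suggest a natural fallback: try the plain `get` mode first and switch to the browser-backed `fetch` mode only when that fails. The sketch below assumes the CLI signals failure with a non-zero exit code, which should be verified against your installed version; `extract_with_fallback` and its `runner` parameter are hypothetical names.

```python
import subprocess


def extract_with_fallback(url, out_file, css_selector=None, runner=subprocess.run):
    """Try `extract get`, then `extract fetch` if the first attempt fails.

    Returns the mode that succeeded ("get" or "fetch"), or None if both failed.
    Assumption: a failed extraction yields a non-zero exit code.
    """
    for mode in ("get", "fetch"):
        cmd = ["scrapling", "extract", mode, url, out_file]
        if css_selector:
            cmd += ["--css-selector", css_selector]
        if runner(cmd).returncode == 0:
            return mode
    return None
```

A blocked page may still produce a zero exit code with an empty or challenge-page output, so checking the size or content of the output file afterwards is a reasonable extra guard.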
MCP server relationship (optional)
Scrapling can run as an MCP server. This is useful when:
- the agent needs tool-style scraping calls
- you want scraping results to be structured and deterministic
Start MCP server (stdio transport by default):
scrapling mcp
Optional: run MCP server with HTTP transport:
scrapling mcp --http --host 127.0.0.1 --port 8765
Example MCP server config snippet
{
"servers": {
"scrapling": {
"mode": "stdio",
"command": "scrapling",
"args": ["mcp"],
"required": false,
"note": "Requires: python -m pip install \"scrapling[ai]\""
}
}
}
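How a client consumes this snippet depends on the agent runtime, which the skill does not specify. As one possible reading, the sketch below turns an entry into a launch command and treats "required": false as "skip quietly if the command is not installed". That interpretation, and the helper name `resolve_server_argv`, are assumptions.

```python
import json
import shutil


def resolve_server_argv(config_text, name="scrapling", which=shutil.which):
    """Return [command, *args] for a server entry, or None if it can be skipped.

    Assumption: "required": false means a missing command is not an error.
    """
    entry = json.loads(config_text)["servers"][name]
    command = entry["command"]
    if which(command) is None:
        if entry.get("required", False):
            raise RuntimeError(f"{command!r} is required but not on PATH")
        return None  # optional server that is not installed: skip it
    return [command, *entry.get("args", [])]
```

With the config above and Scrapling installed, the result would be `["scrapling", "mcp"]`, which matches the stdio start command shown earlier.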
Safety & ops notes
- Prefer selector-based extraction to minimize data volume.
- Treat scraping as an external dependency: handle timeouts, retries, and failures explicitly.
- For aggressive bot protection, consider switching fetchers or using playwright.
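Handling timeouts, retries, and failures explicitly can be as simple as wrapping the CLI call. The sketch below is one way to do it, not part of Scrapling itself: `run_with_retries`, the default of 3 attempts, the 60-second timeout, and the doubling backoff are illustrative assumptions.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, timeout_s=60, backoff_s=2.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd up to `attempts` times; return True on the first zero exit code."""
    for attempt in range(1, attempts + 1):
        try:
            if runner(cmd, timeout=timeout_s).returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            pass  # a timeout counts as a failed attempt
        if attempt < attempts:
            # exponential backoff: backoff_s, 2 * backoff_s, 4 * backoff_s, ...
            sleep(backoff_s * 2 ** (attempt - 1))
    return False
```

For example, `run_with_retries(["scrapling", "extract", "get", "https://example.com", "out.md"])` returns False after three failed or timed-out attempts, leaving the caller to decide whether to switch fetchers or give up.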