Agent skill
content-parser
Extract and parse content from URLs. Triggers on: user provides a URL to extract content from, another skill needs to parse source material, "parse this URL", "extract content", "解析链接", "提取内容".
Install this agent skill to your Project
npx add-skill https://github.com/marswaveai/skills/tree/main/content-parser
Metadata
Additional technical details for this skill
- openclaw
-
{ "emoji": "\ud83d\udd17", "requires": { "env": [ "LISTENHUB_API_KEY" ] }, "primaryEnv": "LISTENHUB_API_KEY" }
SKILL.md
When to Use
- User provides a URL and wants to extract/read its content
- Another skill needs to parse source material from a URL before generation
- User says "parse this URL", "extract content from this link"
- User says "解析链接", "提取内容"
When NOT to Use
- User already has text content and doesn't need URL parsing
- User wants to generate audio/video content (not content extraction)
- User wants to read a local file (use standard file reading tools)
Purpose
Extract and normalize content from URLs across supported platforms. Returns structured data including content body, metadata, and references. Useful as a preprocessing step for content generation skills or standalone content extraction.
Hard Constraints
- No shell scripts. Construct curl commands from the API reference files listed in Resources
- Always read
shared/authentication.mdfor API key and headers - Follow
shared/common-patterns.mdfor polling, errors, and interaction patterns - URL must be a valid HTTP(S) URL
- Always read config following
shared/config-pattern.mdbefore any interaction - Never save files to
~/Downloads/or.listenhub/— save to the current working directory
Step -1: API Key Check
Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
Step 0: Config Setup
Follow shared/config-pattern.md Step 0 (Zero-Question Boot).
If file doesn't exist — silently create with defaults and proceed:
mkdir -p ".listenhub/content-parser"
echo '{"autoDownload":true}' > ".listenhub/content-parser/config.json"
CONFIG_PATH=".listenhub/content-parser/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Do NOT ask any setup questions. Proceed directly to the Interaction Flow.
If file exists — read config silently and proceed:
CONFIG_PATH=".listenhub/content-parser/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/content-parser/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Setup Flow (user-initiated reconfigure only)
Only run when the user explicitly asks to reconfigure. Display current settings:
当前配置 (content-parser):
自动下载:{是 / 否}
Then ask:
- autoDownload: "自动保存提取的内容到当前目录?"
- "是(推荐)" →
autoDownload: true - "否" →
autoDownload: false
- "是(推荐)" →
Save immediately:
NEW_CONFIG=$(echo "$CONFIG" | jq --argjson dl {true/false} '. + {"autoDownload": $dl}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
Interaction Flow
Step 1: URL Input
Free text input. Ask the user:
What URL would you like to extract content from?
Step 2: Options (optional)
Ask if the user wants to configure extraction options:
Question: "Do you want to configure extraction options?"
Options:
- "No, use defaults" — Extract with default settings
- "Yes, configure options" — Set summarize, maxLength, or Twitter tweet count
If "Yes", ask follow-up questions:
- Summarize: "Generate a summary of the content?" (Yes/No)
- Max Length: "Set maximum content length?" (Free text, e.g., "5000")
- Twitter count (only if URL is Twitter/X profile): "How many tweets to fetch?" (1-100, default 20)
Step 3: Confirm & Extract
Summarize:
Ready to extract content:
URL: {url}
Options: {summarize: true, maxLength: 5000, twitter.count: 50} / default
Proceed?
Wait for explicit confirmation before calling the API.
Workflow
-
Validate URL: Must be HTTP(S). Normalize if needed (see
references/supported-platforms.md) -
Build request body:
json{ "source": { "type": "url", "uri": "{url}" }, "options": { "summarize": true/false, "maxLength": 5000, "twitter": { "count": 50 } } }Omit
optionsif user chose defaults. -
Submit (foreground):
POST /v1/content/extract→ extracttaskId -
Tell the user extraction is in progress
-
Poll (background): Run the following exact bash command with
run_in_background: trueandtimeout: 300000. Note: status field is.data.status(notprocessStatus), interval is 5s, values areprocessing/completed/failed:bashTASK_ID="<id-from-step-3>" for i in $(seq 1 60); do RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \ -H "Authorization: Bearer $LISTENHUB_API_KEY" \ -H "X-Source: skills" 2>/dev/null) STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"') case "$STATUS" in completed) echo "$RESULT"; exit 0 ;; failed) echo "FAILED: $RESULT" >&2; exit 1 ;; *) sleep 5 ;; esac done echo "TIMEOUT" >&2; exit 2 -
When notified, download and present result:
If
autoDownloadistrue, generate a slug from the extracted title (falling back to domain name if no title). Followshared/config-pattern.md§ Artifact Naming for slug generation and dedup.- Write
{slug}.mdto the current directory — full extracted content in markdown - Write
{slug}.jsonto the current directory — full raw API response data
bashSLUG="{title-slug}" # e.g. "topology-wikipedia" # Dedup: check if files exist BASE="$SLUG"; i=2 while [ -e "${SLUG}.md" ] || [ -e "${SLUG}.json" ]; do SLUG="${BASE}-${i}"; i=$((i+1)); done echo "$CONTENT_MD" > "${SLUG}.md" echo "$RESULT" > "${SLUG}.json"Present:
内容提取完成! 来源:{url} 标题:{metadata.title} 长度:~{character count} 字符 消耗积分:{credits} 已保存到当前目录: {slug}.md {slug}.json - Write
-
Show a preview of the extracted content (first ~500 chars)
-
Offer to use content in another skill (e.g.
/podcast,/tts)
Estimated time: 10-30 seconds depending on content size and platform.
API Reference
- Content extract:
shared/api-content-extract.md - Supported platforms:
references/supported-platforms.md - Polling:
shared/common-patterns.md§ Async Polling - Error handling:
shared/common-patterns.md§ Error Handling - Config pattern:
shared/config-pattern.md
Example
User: "Parse this article: https://en.wikipedia.org/wiki/Topology"
Agent workflow:
- URL:
https://en.wikipedia.org/wiki/Topology - Options: defaults (omit options)
- Submit extraction
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-H "X-Source: skills" \
-d '{
"source": {
"type": "url",
"uri": "https://en.wikipedia.org/wiki/Topology"
}
}'
- Poll until complete:
curl -sS "https://api.marswave.ai/openapi/v1/content/extract/69a7dac700cf95938f86d9bb" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "X-Source: skills"
- Present extracted content preview and offer next actions.
User: "Extract recent tweets from @elonmusk, get 50 tweets"
Agent workflow:
- URL:
https://x.com/elonmusk - Options:
{"twitter": {"count": 50}} - Submit extraction
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-H "X-Source: skills" \
-d '{
"source": {
"type": "url",
"uri": "https://x.com/elonmusk"
},
"options": {
"twitter": {
"count": 50
}
}
}'
- Poll until complete, present results.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
tts
Text-to-speech and voice narration. Triggers on: "朗读这段", "配音", "TTS", "语音合成", "text to speech", "read this aloud", "convert to speech", "voice narration", "read aloud".
explainer
Create explainer videos with narration and AI-generated visuals. Triggers on: "解说视频", "explainer video", "explain this as a video", "tutorial video", "introduce X (video)", "解释一下XX(视频形式)".
image-gen
Generate AI images from text prompts. Triggers on: "生成图片", "画一张", "AI图", "generate image", "配图", "create picture", "draw", "visualize", "generate an image".
podcast
Create podcasts from topics, URLs, or text. Triggers on: "做播客", "podcast", "播客", "录一期节目", "chat about", "discuss", "debate", "dialogue", "make a podcast about".
creator
Creator workflow — generate platform-ready content packages. Triggers on: "创作", "写公众号", "小红书", "口播", "creator", "content workflow", "帮我写一篇", "生成内容", "write an article", "create content".
listenhub
DEPRECATED — replaced by individual skills. Use when the user triggers any ListenHub action: "make a podcast", "explainer video", "read aloud", "TTS", "generate image", "解说视频", "播客", "朗读", "生成图片".
Didn't find tool you were looking for?