Agent skill
article-exporter
Export any web article to a local Obsidian-ready Markdown directory. Fetches page content via actionbook CLI, downloads images locally, rewrites image references to relative paths, and optionally translates the article using AI. Produces a self-contained folder with README.md, images/, and an index.md navigation file.
Install this agent skill to your Project
npx add-skill https://github.com/actionbook/actionbook/tree/main/playground/article-exporter
SKILL.md
Article Exporter - Export Articles to Obsidian
Version: 0.5.0 | Last Updated: 2026-03-13
You are an expert at web content archiving and Obsidian workflow automation.
Lessons from Failed Exports
These rules were extracted from real export failures. Each one prevents a specific class of error:
- Twitter/X needs AI reformatting —
fetchreturns flat text because Twitter uses custom UI without semantic HTML. The AI reformatting step reconstructs headings, lists, and code blocks. Seereferences/twitter-handling.md. - Ask for output path first — users have different vault locations. Assuming a default creates files in the wrong place and wastes time moving them.
- Check actionbook version >= 0.9.1 — the
--wait-hintparameter was added in 0.9.1. Without it, dynamic content (SPAs, lazy-loaded pages) returns empty or partial results. - Wait after navigation — use
--wait-hint heavyfor Twitter, Medium, and other dynamic sites. Without it, the page hasn't finished rendering when content is extracted. - Rate limit batch exports — 3-5s delay between requests prevents being flagged as a bot (ToS compliance).
Quick Reference
| Task | Command | Success Criteria |
|---|---|---|
| Check deps | actionbook --version |
Shows version >= 0.9.1 |
| Fetch article | actionbook browser fetch <url> --wait-hint heavy |
Returns plain text (AI reformats to Markdown in Step 1b) |
| Translate | AI session directly | README_CN.md created |
| Open in Obsidian | obsidian-cli open "path/index.md" |
File opens in Obsidian |
Complete Export Workflow
Goal: Export web article to Obsidian directory with images and optional translation
Success criteria:
- Article directory created with README.md
- All images downloaded to images/
- index.md navigation file created
- Optional: README_CN.md translation
- Opened in Obsidian (if obsidian-cli available)
Step 1: Fetch Article Content
Execution: Direct (Bash)
# Fetch article as readability text (with log cleaning)
actionbook browser fetch "$URL" --wait-hint heavy 2>/dev/null | \
sed '/^[[:space:]]*$/d;/^\x1b\[/d;/^INFO/d' > /tmp/article_raw.txt
Success criteria:
/tmp/article_raw.txtexists and size > 0 bytes- Content contains the article's main text
The fetch command returns readability-extracted plain text (not Markdown). AI reformatting in Step 1b is always needed to produce proper Markdown.
Rules:
- Use
--wait-hint heavyfor Twitter, Medium, dynamic content - Use
--wait-hint lightfor static blogs 2>/dev/nullsuppresses stderr logssedremoves ANSI codes, INFO lines, empty lines
Twitter/X Special Handling
Twitter uses non-semantic HTML, so fetch output loses all structure (headings become flat text, code blocks disappear). If the URL contains x.com or twitter.com, pay extra attention to structure reconstruction in Step 1b. See references/twitter-handling.md.
Step 1b: AI Reformat to Markdown
Execution: Direct (AI session)
Read /tmp/article_raw.txt and convert the plain text into well-structured Markdown. Save the result to /tmp/article.md.
Reformatting rules:
- Reconstruct headings (
#,##,###) from the text structure - Preserve original image URLs as
references - Format code blocks, lists, tables, and blockquotes
- Keep the original article title as the first
# H1heading
Success criteria:
/tmp/article.mdexists and starts with# <Title>- Image URLs are preserved as Markdown image syntax
Step 2: Extract Metadata
Execution: Direct (Bash)
# Extract title (first H1 heading from AI-reformatted markdown)
TITLE=$(grep -m 1 "^# " /tmp/article.md | sed 's/^# //')
# Extract image URLs (filter out data: URLs)
IMAGE_URLS=$(grep -o '!\[[^]]*\]([^)]*)' /tmp/article.md | \
sed -E 's/!\[[^]]*\]\(([^)]*)\)/\1/' | \
grep -v '^data:')
Success criteria:
$TITLEis non-empty$IMAGE_URLScount matches expected (usewc -l)
Step 3: Ask Output Directory
Execution: [human] Human checkpoint: Confirm output location before creating files
Ask user: "Where should I save the exported article?"
Suggested paths:
~/Work/Write/Articles(default)~/Documents/Obsidian/Articles~/Notes/Imported- (or custom path from
$output_dirargument)
Success criteria: User confirms output directory
Artifacts: $OUTPUT_DIR variable set
Step 4: Create Directory Structure
Execution: Direct (Bash)
# Use argument if provided, otherwise use confirmed path
OUTPUT_DIR="${output_dir:-$USER_CONFIRMED_PATH}"
# Sanitize title for directory name
SAFE_TITLE=$(echo "$TITLE" | sed 's/[/:*?"<>|]//g' | cut -c1-100 | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
# Create output directory
ARTICLE_DIR="$OUTPUT_DIR/$SAFE_TITLE"
mkdir -p "$ARTICLE_DIR/images"
Success criteria:
- Directory
$ARTICLE_DIRexists - Subdirectory
images/exists - Directory is writable
Rules:
- Remove special characters:
/ : * ? " < > | - Limit title length to 100 characters
- Trim leading/trailing whitespace
Step 5: Download Images (Parallel if possible)
Execution: Direct (Bash)
counter=1
for url in $IMAGE_URLS; do
ext=$(echo "$url" | grep -oE '\.(jpg|jpeg|png|gif|webp|svg)' || echo ".jpg")
curl -L -s "$url" -o "$ARTICLE_DIR/images/image_${counter}${ext}"
# Check file size (detect 0-byte failures)
if [ ! -s "$ARTICLE_DIR/images/image_${counter}${ext}" ]; then
# Try alternative format (Twitter)
curl -L -s "${url}?format=jpg&name=orig" -o "$ARTICLE_DIR/images/image_${counter}.jpg"
fi
counter=$((counter + 1))
done
Success criteria:
- All image files exist and size > 0 bytes
- File count matches
$IMAGE_URLScount
Rules:
- Use
curl -Lto follow redirects - Check file size after download
- Try alternative formats for Twitter images
Step 6: Update Image References
Execution: Direct (Bash)
# Replace remote URLs with local paths
counter=1
for url in $IMAGE_URLS; do
ext=$(echo "$url" | grep -oE '\.(jpg|jpeg|png|gif|webp|svg)' || echo ".jpg")
sed -i.bak "s|$url|./images/image_${counter}${ext}|g" /tmp/article.md
counter=$((counter + 1))
done
# Save updated markdown
cp /tmp/article.md "$ARTICLE_DIR/README.md"
rm /tmp/article.md.bak
Success criteria:
README.mdcontains./images/image_N.*references- No remote URLs remain in image links
Step 7: AI Translation (Optional)
Execution: Direct (AI session)
Human checkpoint: Ask user: "Do you want to translate the article? (y/n)"
If yes:
- Read
$ARTICLE_DIR/README.md - Translate using AI capabilities (no external API)
- Write to
$ARTICLE_DIR/README_CN.md(or other language code)
Translation Prompt Template:
Translate the following Markdown article to [LANGUAGE] while preserving:
- All Markdown formatting (headings, lists, code blocks, tables)
- Image references exactly as-is: 
- Links and URLs unchanged
- Code blocks and technical terms in original language
Only output the translated Markdown content.
---
[Paste README.md content]
Success criteria: Translation file exists and size ≈ original ± 20%
Supported languages: en, zh, es, fr, de, ja, ko
Step 8: Create Navigation Index
Execution: Direct (Bash)
# Auto-detect source from URL
case "$URL" in
*x.com*|*twitter.com*) SOURCE="X" ;;
*medium.com*) SOURCE="Medium" ;;
*dev.to*) SOURCE="Dev.to" ;;
*openai.com*) SOURCE="OpenAI Blog" ;;
*substack.com*) SOURCE="Substack" ;;
*github.com*) SOURCE="GitHub" ;;
*) SOURCE=$(echo "$URL" | sed 's|https\?://||' | cut -d/ -f1) ;;
esac
# Create index.md
cat > "$ARTICLE_DIR/index.md" <<EOF
# $TITLE
> **Export Date**: $(date +%Y-%m-%d)
> **Original URL**: $URL
> **Source**: $SOURCE
## 📚 Language Versions
- 🇬🇧 **English**: [[README]]
- 🇨🇳 **Chinese**: [[README_CN]] <!-- if translated -->
## 📊 Metadata
| Property | Value |
|----------|-------|
| **Source** | $SOURCE |
| **Images** | $(ls images/ 2>/dev/null | wc -l) images |
| **Export Tool** | actionbook CLI |
| **Export Date** | $(date +%Y-%m-%d) |
---
**Exported using**: actionbook browser automation + AI assistant
EOF
Success criteria: index.md exists with metadata table
Step 9: Open in Obsidian
Execution: Direct (Bash)
if command -v obsidian-cli &> /dev/null; then
VAULT_ROOT="$OUTPUT_DIR"
REL_PATH=$(echo "$ARTICLE_DIR" | sed "s|$VAULT_ROOT/||")
obsidian-cli open "$REL_PATH/index.md"
echo "✓ Opened in Obsidian: $REL_PATH/index.md"
else
# Fallback: Open in file manager
case "$(uname)" in
Darwin) open "$ARTICLE_DIR" ;;
Linux) xdg-open "$ARTICLE_DIR" ;;
CYGWIN*|MINGW*|MSYS*) start "$ARTICLE_DIR" ;;
esac
echo "⚠️ Install obsidian-cli for automatic opening: npm install -g obsidian-cli"
fi
Success criteria:
- File opens in Obsidian OR directory opens in file manager
- User sees success message
Step 10: Report Success
Execution: Direct (Output)
echo ""
echo "════════════════════════════════════════════"
echo "✓ Article exported successfully!"
echo ""
echo "📁 Location: $ARTICLE_DIR"
echo "📄 Files:"
echo " - README.md (original)"
[ -f "$ARTICLE_DIR/README_CN.md" ] && echo " - README_CN.md (translation)"
echo " - index.md (navigation)"
echo "🖼️ Images: $(ls images/ 2>/dev/null | wc -l) files"
echo "════════════════════════════════════════════"
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| "actionbook: command not found" | CLI not installed | npm install -g @actionbookdev/cli@latest |
| "unknown flag: --wait-hint" | Version < 0.9.1 | Upgrade: npm install -g @actionbookdev/cli@latest |
| Twitter format broken | fetch loses structure |
Use AI reformatting (see references/twitter-handling.md) |
| Images 0 bytes | URL expired | Try ?format=jpg&name=orig |
| obsidian-cli not found | Not installed | npm install -g obsidian-cli |
| Batch export blocked | Too fast, flagged as bot | Add 3-5s sleep between requests |
Detailed troubleshooting: See ./references/troubleshooting.md
Edge Cases Handled
- Long titles → Auto-truncate to 100 chars
- Special characters → Sanitized (
/ : * ? " < > |removed) - No images → Steps 5-6 skip gracefully
- 0-byte images → Auto-retry with alternative formats
- Data URLs → Filtered out in Step 2
When Using This Skill
- Check dependencies first —
actionbook --version >= 0.9.1 - Test with one article — Verify before batch processing
- Twitter/X requires special handling — See references/twitter-handling.md
- Respect ToS — Personal use only, rate limit batch exports
References (Progressive Disclosure)
For detailed documentation, see:
./references/twitter-handling.md— Twitter/X special handling (AI reformatting)./references/batch-export.md— Batch export with rate limiting./references/troubleshooting.md— Detailed troubleshooting guide./references/obsidian-setup.md— obsidian-cli setup and configuration./references/supported-websites.md— Complete website compatibility list
Last Updated: 2026-03-13 | Version: 0.5.0
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
actionbook
Activate when the user needs to interact with any website — browser automation, web scraping, screenshots, form filling, UI testing, monitoring, or building AI agents. Provides pre-verified page actions with step-by-step instructions and tested selectors.
deep-research
Deep research and analysis tool. Generates comprehensive HTML reports on any topic, domain, paper, or technology. Use when user asks to research, analyze, investigate, deep-dive, or generate a report on any subject. Supports academic papers (arXiv), technologies, trends, comparisons, and general topics.
actionbook-web-test
Run browser-based web tests against websites using Actionbook CLI. Activate when the user wants to test a website workflow, run smoke tests, verify a user flow, check if a web application works, run regression tests, or validate browser-based interactions. Supports test definition, execution, assertion, reporting, and json-ui visual report generation.
arxiv-viewer
View, search, and download academic papers from arXiv. Supports API queries, web scraping via Actionbook, and HTML paper reading via ar5iv. Use when user asks about arxiv papers, academic papers, research papers, paper summaries, latest papers, or wants to search/download/read papers.
actionbook-scraper
Generate and verify web scraper scripts using Actionbook's verified selectors. Auto-validates generated scripts and fixes errors.
agent-browser
Automates browser interactions for form filling and web page interaction. Used by the request-website command to submit website indexing requests.
Didn't find tool you were looking for?