Agent skill
subtitle-to-document
Convert subtitle files (WebVTT .vtt and SubRip .srt) into clean, readable text documents with multi-language support (English, Traditional Chinese, Simplified Chinese). Auto-detects format and language, removes timestamps and formatting, merges captions into natural paragraphs, strips annotations, and fixes spacing appropriately for each language. Use when the user provides a subtitle file or asks to convert captions to text.
Install this agent skill to your Project
npx add-skill https://github.com/blue119/yp-skills/tree/main/subtitle-to-document
SKILL.md
Subtitles to Document
Convert WebVTT (.vtt) and SubRip (.srt) subtitle files into clean, readable text documents with natural paragraph flow and multi-language support.
Supported Languages
- English (en) - Automatic capitalization, space-based word separation
- Traditional Chinese (zh_tw / \u7e41\u9ad4\u4e2d\u6587) - Chinese punctuation detection, no spaces between characters
- Simplified Chinese (zh_cn / \u7b80\u4f53\u4e2d\u6587) - Chinese punctuation detection, no spaces between characters
The script auto-detects the subtitle format (VTT or SRT) and the language by default.
Quick Start
Auto-detect Format and Language (Default)
When the user provides a subtitle file path, run the conversion script:
python scripts/subtitle_to_text.py input.vtt
# or
python scripts/subtitle_to_text.py input.srt
This creates input.txt in the same directory with cleaned, formatted text.
Specify Language or Format
# English
python scripts/subtitle_to_text.py input.srt --lang en
# Traditional Chinese
python scripts/subtitle_to_text.py input.vtt --lang zh_tw
# Force SRT parsing
python scripts/subtitle_to_text.py input.srt --format srt
Specify Custom Output Path
python scripts/subtitle_to_text.py input.srt output.txt --lang zh_tw
What the Script Does
The subtitle_to_text.py script automatically performs format- and language-aware processing:
All Formats
- Removes timestamps - Strips VTT/SRT timing information (e.g.,
00:00:00.000 --> 00:00:02.500or00:00:00,000 --> 00:00:02,500) - Removes subtitle formatting - Strips tags like
<v Speaker>,<c>,<i>, etc. - Removes duplicates - Detects and removes consecutive duplicate captions
Language-Specific Processing
English (en)
- Sentence detection:
.,!,? - Annotations removed:
[Music],[Applause],[Laughter],[Inaudible] - Capitalization: Auto-capitalizes sentences and fixes
i->I - Spacing: Adds spaces between words, fixes punctuation spacing
Traditional Chinese (zh_tw)
- Sentence detection:
\u3002,\uff01,\uff1f - Annotations removed:
[...](Chinese and English annotations) - Capitalization: Not applicable
- Spacing: Removes spaces between Chinese characters, preserves spacing around embedded English words
Simplified Chinese (zh_cn)
- Same behavior as Traditional Chinese, using simplified character indicators for detection
Format Detection
Format detection uses file extension (.vtt / .srt) and content heuristics (WEBVTT header, presence of --> timestamps). You can override detection with --format.
If auto-detection is incorrect, override with --format srt or --format vtt, or set --lang to force language.
Workflow
When a user requests subtitle conversion:
- Confirm the input subtitle file path exists
- Determine the format (auto-detect or ask user to override)
- Determine language (auto-detect or ask user to override)
- Run the script with appropriate flags if needed
- Display a preview of the converted text
- Confirm the output file location
Examples
Example 1: SRT (English)
Input SRT:
1
00:00:00,000 --> 00:00:02,500
Hello and welcome to this tutorial.
2
00:00:02,500 --> 00:00:05,000
Today we're going to learn
3
00:00:05,000 --> 00:00:07,500
about video text tracks.
Output Text:
Hello and welcome to this tutorial.
Today we're going to learn about video text tracks.
Example 2: VTT (Traditional Chinese)
Use --lang zh_tw or allow auto-detection.
Using from Python
The script can also be imported as a module:
from scripts.subtitle_to_text import convert_subtitles_to_text
# Auto-detect format & language
text = convert_subtitles_to_text('input.srt', 'output.txt')
# Specify language or format
text = convert_subtitles_to_text('input.vtt', 'output.txt', lang='zh_tw', fmt='vtt')
# Backwards-compatible alias
from scripts.subtitle_to_text import convert_vtt_to_text
text = convert_vtt_to_text('input.vtt', 'output.txt')
Troubleshooting
Wrong format detected: Use --format srt or --format vtt to override:
python scripts/subtitle_to_text.py input.srt --format srt
Wrong language detected: Use --lang to override:
python scripts/subtitle_to_text.py input.srt --lang zh_tw
Encoding errors: The script uses UTF-8 encoding by default. If the subtitle file uses a different encoding, convert it to UTF-8 first.
Missing paragraphs: The script merges based on language-specific sentence-ending punctuation. If captions lack proper punctuation they may merge into larger paragraphs.
Mixed language content: The script works best with single-language files. For mixed English/Chinese content consider processing separately.
CLI Reference
Usage: python subtitle_to_text.py <input.vtt|input.srt> [output.txt] [--lang en|zh_tw|zh_cn] [--format auto|vtt|srt]
Language options:
--lang auto Auto-detect language (default)
--lang en English
--lang zh_tw Traditional Chinese (\u7e41\u9ad4\u4e2d\u6587)
--lang zh_cn Simplified Chinese (\u7b80\u4f53\u4e2d\u6587)
Format options:
--format auto Auto-detect subtitle format (default)
--format vtt Force VTT parsing
--format srt Force SRT parsing
If output path is not specified, a .txt file will be created next to input.
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
internal-doc-polisher
Transform raw or transcript-like text into a polished Markdown document for internal sharing. Use when the user provides a text file (any mix of zh_tw, zh_cn, or en) and wants sentence repair, structured headings, concise paragraphs, a 3–7 bullet summary, and an Action Items section when tasks are mentioned.
Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
edit-article
Edit and improve articles by restructuring sections, improving clarity, and tightening prose. Use when user wants to edit, revise, or improve an article draft.
handoff
Compact the current conversation into a handoff document for another agent to pick up.
obsidian-vault
Search, create, and manage notes in the Obsidian vault with wikilinks and index notes. Use when user wants to find, create, or organize notes in Obsidian.
setup-pre-commit
Set up Husky pre-commit hooks with lint-staged (Prettier), type checking, and tests in the current repo. Use when user wants to add pre-commit hooks, set up Husky, configure lint-staged, or add commit-time formatting/typechecking/testing.
Didn't find tool you were looking for?