Agent skill
youtube-transcript
Extract YouTube video transcripts with metadata and save them as Markdown to an Obsidian vault. Use this skill when the user requests downloading YouTube transcripts, converting YouTube videos to text, or extracting video subtitles. Does not download video or audio files, only metadata and subtitles.
Install this agent skill in your project:
npx add-skill https://github.com/glebis/claude-skills/tree/main/youtube-transcript
YouTube Transcript
Overview
Extract YouTube video transcripts, metadata, and chapters using yt-dlp. Output is formatted as Markdown with YAML frontmatter and saved to ~/Brains/brain/ (Obsidian vault).
Quick Start
To extract a transcript from a YouTube video:
python scripts/extract_transcript.py <youtube_url>
Optionally, specify a custom output filename:
python scripts/extract_transcript.py <youtube_url> custom_filename.md
Output Format
YAML Frontmatter
The generated Markdown includes comprehensive metadata:
- title - Video title
- channel - Channel name
- url - YouTube URL
- upload_date - Upload date (YYYY-MM-DD)
- duration - Video duration (HH:MM:SS)
- description - Video description (truncated to 500 chars)
- tags - Array of video tags
- view_count - View count
- like_count - Like count
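As an illustration of how these fields might be assembled, here is a minimal sketch. It is not the skill's actual script: `build_frontmatter` and `format_duration` are hypothetical names, and the input is assumed to be a yt-dlp style info dict (`webpage_url`, `upload_date` as YYYYMMDD, `duration` in seconds). Values are serialized with `json.dumps`, relying on JSON scalars and arrays being valid YAML.

```python
import json


def format_duration(seconds):
    """Render a duration in seconds as HH:MM:SS."""
    hours, rest = divmod(int(seconds or 0), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def build_frontmatter(info):
    """Build a YAML frontmatter block from a yt-dlp style info dict (assumed shape)."""
    raw_date = str(info.get("upload_date") or "")
    # Assumption: the date arrives as YYYYMMDD and is rewritten as YYYY-MM-DD.
    if len(raw_date) == 8:
        upload_date = f"{raw_date[:4]}-{raw_date[4:6]}-{raw_date[6:]}"
    else:
        upload_date = raw_date
    fields = {
        "title": info.get("title") or "",
        "channel": info.get("channel") or "",
        "url": info.get("webpage_url") or "",
        "upload_date": upload_date,
        "duration": format_duration(info.get("duration")),
        # The description is truncated to 500 characters, as documented above.
        "description": (info.get("description") or "")[:500],
        "tags": info.get("tags") or [],
        "view_count": info.get("view_count"),
        "like_count": info.get("like_count"),
    }
    lines = ["---"]
    lines += [f"{key}: {json.dumps(value)}" for key, value in fields.items()]
    lines.append("---")
    return "\n".join(lines)
```

Missing counts are written as `null` in this sketch; the real script may omit them or use another placeholder.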
Body Structure
Transcript organized by video chapters (if available):
## Chapter Title
**00:05:23** Transcript text for this segment.
**00:05:45** Next segment text.
If no chapters exist, all content appears under a single "## Transcript" heading.
Timestamps are formatted as HH:MM:SS for consistency.
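The body layout described above can be sketched as follows. This is an illustrative approximation rather than the skill's own code: `render_body` and `format_timestamp` are hypothetical names, segments are assumed to be `(start_seconds, text)` pairs, and chapters are assumed to be dicts with `start_time` and `title` keys, as in yt-dlp's chapter metadata.

```python
def format_timestamp(seconds):
    """Render a position in seconds as HH:MM:SS (fractions are dropped)."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_body(segments, chapters=None):
    """Group (start_seconds, text) segments under chapter headings."""
    if not chapters:
        # No chapters: everything goes under one "Transcript" heading.
        chapters = [{"start_time": 0, "title": "Transcript"}]
    chapters = sorted(chapters, key=lambda c: c["start_time"])
    lines = []
    current = None
    for start, text in sorted(segments):
        # Pick the last chapter that starts at or before this segment;
        # segments before the first chapter fall into the first one.
        idx = 0
        for i, chapter in enumerate(chapters):
            if chapter["start_time"] <= start:
                idx = i
        if idx != current:
            if lines:
                lines.append("")
            lines.append(f"## {chapters[idx]['title']}")
            current = idx
        lines.append(f"**{format_timestamp(start)}** {text}")
    return "\n".join(lines)
```

Chapters that contain no segments produce no heading in this sketch, which keeps the output free of empty sections.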
Workflow
- Extract metadata and subtitles using yt-dlp
- Parse VTT subtitle format to extract timestamps and text
- Group transcript segments by video chapters (if present)
- Format as Markdown with YAML frontmatter
- Save to ~/Brains/brain/ with sanitized filename based on video title
- Clean up temporary subtitle files
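The VTT parsing step above can be sketched roughly like this. It is a simplified illustration, not the script's actual parser: `parse_vtt` is a hypothetical name, only each cue's start time is kept, and inline tags such as `<c>` or embedded timestamps are stripped with a plain regex.

```python
import re

# Start timestamp of a cue timing line; the hours part is optional in WebVTT.
CUE_START = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->")
# Inline markup such as <c>, </c>, or <00:00:01.500>.
TAG = re.compile(r"<[^>]+>")


def parse_vtt(text):
    """Return a list of (start_seconds, text) cues from WebVTT content."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n")):
        lines = block.strip().split("\n")
        for i, line in enumerate(lines):
            match = CUE_START.match(line.strip())
            if match:
                break
        else:
            continue  # header, NOTE, or STYLE block without a timing line
        hours, minutes, seconds, millis = match.group(1, 2, 3, 4)
        start = (
            int(hours or 0) * 3600
            + int(minutes) * 60
            + int(seconds)
            + int(millis) / 1000
        )
        payload = " ".join(TAG.sub("", part) for part in lines[i + 1:])
        payload = " ".join(payload.split())  # collapse whitespace
        if payload:
            cues.append((start, payload))
    return cues
```

Auto-generated captions often repeat text across consecutive cues, so output from a parser like this usually still needs the deduplication pass described in the next section.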
Deduplication
To remove duplicates from existing transcript files:
python scripts/deduplicate_transcript.py <markdown_file>
This removes transcript entries that are prefixes of subsequent entries (common in VTT files where subtitles accumulate).
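A minimal sketch of this prefix rule, assuming entries are `(timestamp, text)` pairs and that "subsequent" means the immediately following entry; `drop_prefix_entries` is a hypothetical name, and the real script may compare entries differently.

```python
def drop_prefix_entries(entries):
    """Drop entries whose text is a prefix of the next entry's text."""
    kept = []
    for i, (timestamp, text) in enumerate(entries):
        if i + 1 < len(entries):
            next_text = entries[i + 1][1].strip()
            # An accumulating subtitle repeats the earlier text at its start.
            if next_text.startswith(text.strip()):
                continue
        kept.append((timestamp, text))
    return kept
```

For example:

```python
entries = [
    ("00:00:01", "Hello"),
    ("00:00:02", "Hello world"),
    ("00:00:03", "Hello world again"),
    ("00:00:05", "New sentence"),
]
print(drop_prefix_entries(entries))
# [('00:00:03', 'Hello world again'), ('00:00:05', 'New sentence')]
```

Note that this sketch keeps the timestamp of the longest entry in each accumulating run, not the timestamp at which the text first appeared.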
Requirements
Ensure yt-dlp is installed:
pip install yt-dlp
Limitations
- Extracts subtitles in English first and falls back to Russian if English is unavailable
- Requires the video to have subtitles (auto-generated or manual)
- Does not download video or audio files
- Description truncated to 500 characters in frontmatter
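The language fallback in the first limitation could look something like the sketch below. `pick_subtitle_language` is a hypothetical helper; it assumes a yt-dlp style info dict whose `subtitles` and `automatic_captions` entries are keyed by language code. Preferring manual subtitles over auto-generated ones within a language is an assumption of this sketch, not something the skill documents.

```python
def pick_subtitle_language(info, preferred=("en", "ru")):
    """Return (language, kind) for the first preferred language available, else None."""
    manual = info.get("subtitles") or {}
    auto = info.get("automatic_captions") or {}
    for lang in preferred:
        if lang in manual:
            return lang, "manual"
        if lang in auto:
            return lang, "auto"
    # Neither English nor Russian subtitles exist, so no transcript can be extracted.
    return None
```

Only exact language codes are matched here, so regional variants such as `en-US` would need extra handling.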