Consolidate Transcripts

Why? LLMs have context limits. This skill merges a channel's transcripts into a single file and counts tokens as it goes, so you can feed the channel's content to Claude or GPT without exceeding the model's context window.

Quick Start

```bash
python scripts/consolidate_transcripts.py <channel_name>
```

Output: `data/<channel_name>/<channel_name>-consolidated.md`

Workflow

1. Identify the Channel

List available channels:

```bash
ls data/
```

2. Choose Token Limit

| Use Case | Recommended Limit | Flag |
| --- | --- | --- |
| Claude (200K context) | 150000 | `--limit 150000` |
| GPT-4 Turbo (128K context) | 100000 | `--limit 100000` |
| Full archive (Claude model with 1M context) | 800000 | (default) |
| Quick sample | 50000 | `--limit 50000` |

> [!TIP]
> The default 800K limit leaves ~200K tokens for prompts and responses when using a Claude model with a 1M-token context window.
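The limits in the table follow one rule: take the model's context window and subtract a reserve for the prompt and the response. A minimal sketch of that arithmetic follows. The function name `recommended_limit` is hypothetical and not part of the script, and the reserve values are inferred from the table rows.

```python
def recommended_limit(context_window: int, reserve: int) -> int:
    """Return a --limit value that leaves `reserve` tokens for prompt and response."""
    if reserve < 0 or reserve >= context_window:
        raise ValueError("reserve must be between 0 and the context window size")
    return context_window - reserve


# Reproduces the table rows above (reserve = window minus recommended limit).
print(recommended_limit(200_000, 50_000))     # 150000
print(recommended_limit(128_000, 28_000))     # 100000
print(recommended_limit(1_000_000, 200_000))  # 800000
```

If your prompts or expected responses are long, increase the reserve rather than reusing the table value.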

3. Run Consolidation

```bash
python scripts/consolidate_transcripts.py <channel_name> [--limit TOKENS] [--verbose]
```

Examples:

```bash
# Default (800K tokens)
python scripts/consolidate_transcripts.py library-of-minds

# Custom limit for GPT-4
python scripts/consolidate_transcripts.py aws-reinvent-2025 --limit 100000

# Verbose output showing all included files
python scripts/consolidate_transcripts.py dwarkesh-patel --verbose
```

4. Verify Output

Check the consolidated file was created:

```bash
ls -la data/<channel_name>/*-consolidated.md
```
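The same check can be done from Python, for example inside a larger pipeline. This is a sketch, not part of the skill: `describe_consolidated` is a hypothetical helper, and it assumes only the output path documented in Quick Start.

```python
from pathlib import Path


def describe_consolidated(channel: str, data_dir: str = "data") -> dict:
    """Return basic stats for a channel's consolidated file, or raise if it is missing."""
    path = Path(data_dir) / channel / f"{channel}-consolidated.md"
    if not path.is_file():
        raise FileNotFoundError(f"No consolidated file at {path}")
    text = path.read_text(encoding="utf-8")
    return {
        "path": str(path),
        "bytes": path.stat().st_size,
        "words": len(text.split()),  # whitespace-separated words, not tokens
    }
```

Calling `describe_consolidated("library-of-minds")` after a run returns the file's size and word count. A very small word count is a hint to re-run with `--verbose`.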

Parameters

| Option | Description | Default |
| --- | --- | --- |
| `channel_name` | Folder name in `data/` | Required |
| `--limit`, `-l` | Maximum tokens to include | 800000 |
| `--verbose`, `-v` | Show detailed file list | False |
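For readers who want to see how these options map onto a parser, here is a sketch using the standard library's `argparse`. It mirrors the table above and is not the script's actual source. `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented options: channel_name, --limit/-l, --verbose/-v."""
    parser = argparse.ArgumentParser(
        description="Merge a channel's transcripts into one token-limited file."
    )
    parser.add_argument("channel_name", help="Folder name in data/")
    parser.add_argument("--limit", "-l", type=int, default=800_000,
                        help="Maximum tokens to include")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Show detailed file list")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["aws-reinvent-2025", "--limit", "100000"])
print(args.channel_name, args.limit, args.verbose)  # aws-reinvent-2025 100000 False
```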

Output Format

The consolidated file includes:

Header — Generation metadata, total transcripts, token/word counts
Table of Contents — Dates, titles, tokens, words per transcript
Transcripts — Full text with title, date, author, source URL
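To make the three parts concrete, the sketch below assembles a file with that shape. The exact headings, field order, and dictionary keys (`title`, `date`, `author`, `url`, `text`, `tokens`) are assumptions for illustration. The script's real layout may differ.

```python
from datetime import datetime, timezone


def render_consolidated(channel: str, transcripts: list[dict]) -> str:
    """Assemble header, table of contents, and transcript bodies into one string."""
    total_tokens = sum(t["tokens"] for t in transcripts)
    total_words = sum(len(t["text"].split()) for t in transcripts)
    generated = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")

    # Header: generation metadata and totals.
    lines = [
        f"# {channel} (consolidated)",
        f"Generated: {generated}",
        f"Transcripts: {len(transcripts)}",
        f"Tokens: {total_tokens} | Words: {total_words}",
        "",
        "## Table of Contents",
        "| Date | Title | Tokens | Words |",
        "| --- | --- | --- | --- |",
    ]
    # Table of contents: one row per transcript.
    for t in transcripts:
        words = len(t["text"].split())
        lines.append(f"| {t['date']} | {t['title']} | {t['tokens']} | {words} |")

    # Transcripts: metadata line followed by the full text.
    for t in transcripts:
        lines += [
            "",
            f"## {t['title']}",
            f"Date: {t['date']} | Author: {t['author']} | Source: {t['url']}",
            "",
            t["text"],
        ]
    return "\n".join(lines) + "\n"
```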

Troubleshooting

| Problem | Cause | Solution |
| --- | --- | --- |
| `ModuleNotFoundError: No module named 'tiktoken'` | tiktoken not installed | `pip install tiktoken` |
| `No transcripts found` | Empty transcripts folder | Run `transcript-download` first |
| `FileNotFoundError` | Channel doesn't exist | Check `ls data/` for valid names |
| Output file is small | Few transcripts available | Use `--verbose` to see what was included |
| Token count seems wrong | Old tiktoken version | `pip install --upgrade tiktoken` |

Common Mistakes

Wrong channel name: Use the folder name exactly as shown by `ls data/`, not the YouTube channel name.
Forgetting to download transcripts first: Consolidation requires transcripts to exist. Run `/download-transcripts` first.
Using too high a limit: If the consolidated file plus your prompt exceeds the model's context window, the request may be rejected or the input truncated. Use the limit guide above.
Expecting real-time updates: The consolidated file is a snapshot. Re-run consolidation after downloading new transcripts.
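The first mistake is easy to catch before running the script. The sketch below checks the name against the folders in `data/` and suggests near matches. `check_channel` is a hypothetical helper, not something the skill ships.

```python
import difflib
from pathlib import Path


def check_channel(channel: str, data_dir: str = "data") -> str:
    """Return `channel` if <data_dir>/<channel>/ exists, else raise with suggestions."""
    root = Path(data_dir)
    # Same information as `ls data/`, restricted to directories.
    available = sorted(p.name for p in root.iterdir() if p.is_dir()) if root.is_dir() else []
    if channel in available:
        return channel
    close = difflib.get_close_matches(channel, available, n=3)
    hint = f" Did you mean: {', '.join(close)}?" if close else ""
    raise FileNotFoundError(f"No folder {root / channel}.{hint}")
```

The check is case-sensitive, which matches the advice to copy the folder name exactly.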

Reference

Transcripts sorted newest first (descending by date)
Files without dates in filename are placed last
Token counting uses tiktoken's cl100k_base encoding (exact for GPT-4; an approximation for Claude, which uses a different tokenizer)
Consolidated files are gitignored (not committed)
Re-running overwrites the previous consolidated file
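The ordering and limit rules above can be sketched as follows. Several details are assumptions and not confirmed by this document: that dates appear in filenames as `YYYY-MM-DD`, that undated files are ordered alphabetically among themselves, and that selection stops at the first transcript that would exceed the limit. All function names are hypothetical. The token counter is passed in so the logic can be tried without tiktoken installed.

```python
import re
from typing import Callable

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed filename date format


def sort_newest_first(filenames: list[str]) -> list[str]:
    """Dated files in descending date order, followed by undated files."""
    dated, undated = [], []
    for name in filenames:
        match = DATE_RE.search(name)
        if match:
            dated.append((match.group(), name))
        else:
            undated.append(name)
    dated.sort(reverse=True)  # ISO dates sort correctly as strings
    return [name for _, name in dated] + sorted(undated)


def select_within_limit(texts: list[str], limit: int,
                        count_tokens: Callable[[str], int]) -> list[str]:
    """Keep texts in order until adding the next one would exceed `limit`."""
    kept, used = [], 0
    for text in texts:
        cost = count_tokens(text)
        if used + cost > limit:
            break  # assumption: stop here instead of skipping to smaller files
        kept.append(text)
        used += cost
    return kept


def cl100k_counter() -> Callable[[str], int]:
    """Token counter backed by tiktoken's cl100k_base encoding."""
    import tiktoken  # third-party: pip install tiktoken
    encoding = tiktoken.get_encoding("cl100k_base")
    # disallowed_special=() treats text such as "<|endoftext|>" as ordinary text.
    return lambda text: len(encoding.encode(text, disallowed_special=()))
```

With tiktoken installed, `select_within_limit(texts, 800_000, cl100k_counter())` applies the default limit. Any other callable that maps a string to an integer works as a stand-in counter.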
