Agent skill

transcribe

Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.

View SKILL.md on GitHub Repository

Stars 1,415

Forks 109

Install this agent skill to your Project

npx add-skill https://github.com/foryourhealth111-pixel/Vibe-Skills/tree/main/bundled/skills/transcribe

SKILL.md

Audio Transcribe

Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.

Workflow

Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
Verify OPENAI_API_KEY is set. If missing, ask the user to set it locally (do not ask them to paste the key).
Run the bundled transcribe_diarize.py CLI with sensible defaults (fast text transcription).
Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
Save outputs under output/transcribe/ when working in this repo.

Decision rules

Default to gpt-4o-mini-transcribe with --response-format text for fast transcription.
If the user wants speaker labels or diarization, use --model gpt-4o-transcribe-diarize --response-format diarized_json.
If audio is longer than ~30 seconds, keep --chunking-strategy auto.
Prompting is not supported for gpt-4o-transcribe-diarize.

Output conventions

Use output/transcribe/<job-id>/ for evaluation runs.
Use --out-dir for multiple files to avoid overwriting.

Dependencies (install if missing)

Prefer uv for dependency management.

uv pip install openai

If uv is unavailable:

python3 -m pip install openai

Environment

OPENAI_API_KEY must be set for live API calls.
If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
Never ask the user to paste the full key in chat.

Skill path (set once)

bash

export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"

User-scoped skills install under $CODEX_HOME/skills (default: ~/.codex/skills).

CLI quick start

Single file (fast text default):

python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt

Diarization with known speakers (up to 4):

python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting

Plain text output (explicit):

python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt

Reference map

references/api.md: supported formats, limits, response formats, and known-speaker notes.

Maintainer

foryourhealth111-pixel Core maintainer

Source details

Full Name: foryourhealth111-pixel/Vibe-Skills
Branch: main
Path in repo: bundled/skills/transcribe
License: Apache License 2.0
Topics: claude-code anthropic claude agent-skills automation mcp ai-agents cursor developer-tools agentic-coding skills llm codex claude-skills vibe-coding vibecoding opencode ai-skills ai-workflow windsurf

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

foryourhealth111-pixel/Vibe-Skills

pufferlib

This skill should be used when working with reinforcement learning tasks including high-performance RL training, custom environment development, vectorized parallel simulation, multi-agent systems, or integration with existing RL environments (Gymnasium, PettingZoo, Atari, Procgen, etc.). Use this skill for implementing PPO training, creating PufferEnv environments, optimizing RL performance, or developing policies with CNNs/LSTMs.

1,415 109

Explore

foryourhealth111-pixel/Vibe-Skills

fluidsim

Framework for computational fluid dynamics simulations using Python. Use when running fluid dynamics simulations including Navier-Stokes equations (2D/3D), shallow water equations, stratified flows, or when analyzing turbulence, vortex dynamics, or geophysical flows. Provides pseudospectral methods with FFT, HPC support, and comprehensive output analysis.

1,415 109

Explore

foryourhealth111-pixel/Vibe-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

1,415 109

Explore

foryourhealth111-pixel/Vibe-Skills

build-error-resolver

Compatibility alias for build-specific error resolution. Use this when VCO routes to build-error-resolver but the upstream agent is unavailable in the current runtime.

1,415 109

Explore

foryourhealth111-pixel/Vibe-Skills

geniml

This skill should be used when working with genomic interval data (BED files) for machine learning tasks. Use for training region embeddings (Region2Vec, BEDspace), single-cell ATAC-seq analysis (scEmbed), building consensus peaks (universes), or any ML-based analysis of genomic regions. Applies to BED file collections, scATAC-seq data, chromatin accessibility datasets, and region-based genomic feature learning.

1,415 109

Explore

foryourhealth111-pixel/Vibe-Skills

zinc-database

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

1,415 109

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Audio Transcribe

Workflow

Decision rules

Output conventions

Dependencies (install if missing)

Environment

Skill path (set once)

CLI quick start

Reference map

Recommended Agent Skills

pufferlib

fluidsim

metabolomics-workbench-database

build-error-resolver

geniml

zinc-database