
# LiveKit Self-Hosted STT Plugin

Build self-hosted speech-to-text APIs and LiveKit voice agent plugins using Hugging Face models.

## Overview

This skill provides templates and guidance for:

  1. Building a self-hosted STT API server using FastAPI + Whisper/HF models
  2. Creating a LiveKit plugin that connects to your self-hosted API
  3. Deploying and scaling in production

## Quick Start

### Option 1: Build Both (API + Plugin)

When the user wants the complete setup:

1. Create API Server:

   ```bash
   python scripts/setup_api_server.py my-stt-server --model openai/whisper-medium
   cd my-stt-server
   pip install -r requirements.txt
   python main.py
   ```

2. Create Plugin:

   ```bash
   python scripts/setup_plugin.py custom-stt
   cd livekit-plugins-custom-stt
   pip install -e .
   ```

3. Use in LiveKit Agent:

   ```python
   from livekit.plugins import custom_stt

   stt = custom_stt.STT(api_url="ws://localhost:8000/ws/transcribe")
   ```

### Option 2: API Server Only

When the user only needs the API server:

- Use `scripts/setup_api_server.py` with the desired model
- See `references/api_server_guide.md` for implementation details
- Template in `assets/api-server/`

### Option 3: Plugin Only

When the user has an existing API and needs a LiveKit plugin:

- Use `scripts/setup_plugin.py` with the plugin name
- See `references/plugin_implementation.md` for details
- Template in `assets/plugin-template/`

## Model Selection

Help the user choose the right model:

| Use Case | Recommended Model | Rationale |
|---|---|---|
| Best accuracy | `openai/whisper-large-v3` | SOTA quality, requires GPU |
| Production balance | `openai/whisper-medium` | Good quality, reasonable speed |
| Real-time/fast | `openai/whisper-small` | Fast, acceptable quality |
| CPU-only | `openai/whisper-tiny` | Can run without GPU |
| English-only | `facebook/wav2vec2-large-960h` | Optimized for English |

For a detailed comparison and optimization tips, see `references/models_comparison.md`.
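
Any of these checkpoints can be smoke-tested with the Hugging Face `transformers` ASR pipeline before wiring it into the server; a minimal sketch (the audio file name is a placeholder, and decoding audio files requires ffmpeg):

```python
from transformers import pipeline

# Load the candidate checkpoint; swap the model ID to compare quality and speed.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Transcribe a short local recording (placeholder file name).
print(asr("sample.wav")["text"])
```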

## Implementation Workflow

### Building the API Server

1. Use the template: start with `assets/api-server/main.py`

2. Key components (see the sketch below):

   - FastAPI app with a WebSocket endpoint
   - Model loading at startup (kept in memory)
   - Audio buffer management
   - WebSocket protocol for streaming

3. Customization points:

   - Model selection (change `MODEL_ID` in `.env`)
   - Audio processing parameters
   - Batch size and optimization
   - Error handling

For the complete implementation guide, see `references/api_server_guide.md`.
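
A minimal sketch of how those components fit together, assuming FastAPI, numpy, and transformers are installed; it implements only the final-transcript path of the WebSocket protocol described later (no interim results), and the model ID is just an example:

```python
import json

import numpy as np
from fastapi import FastAPI, WebSocket
from transformers import pipeline

app = FastAPI()

# Load the model once at startup and keep it in memory.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-medium")

@app.websocket("/ws/transcribe")
async def transcribe(ws: WebSocket) -> None:
    await ws.accept()
    buffer = bytearray()  # accumulates raw 16-bit PCM, 16 kHz, mono audio
    while True:
        message = await ws.receive()
        if message["type"] == "websocket.disconnect":
            return
        if message.get("bytes") is not None:
            buffer.extend(message["bytes"])        # binary frame: audio data
        elif message.get("text") is not None:
            control = json.loads(message["text"])  # text frame: JSON control
            if control.get("type") == "end":
                break

    # Convert 16-bit PCM to float32 in [-1, 1], the format the pipeline expects.
    # Note: this inference call blocks the event loop; a production server
    # would offload it to a worker thread or queue.
    audio = np.frombuffer(bytes(buffer), dtype=np.int16).astype(np.float32) / 32768.0
    result = asr({"raw": audio, "sampling_rate": 16000})
    await ws.send_json({"type": "final", "text": result["text"]})
```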

### Building the LiveKit Plugin

1. Use the template: start with `assets/plugin-template/`

2. Required implementations:

   - `_recognize_impl()` - non-streaming recognition
   - `stream()` - returns a `SpeechStream` instance
   - `SpeechStream` class - handles streaming

3. Key considerations:

   - Audio format conversion to 16 kHz, mono, 16-bit PCM (see the helper sketch below)
   - WebSocket connection management
   - Event emission (interim/final transcripts)
   - Error handling and cleanup

For the complete implementation guide, see `references/plugin_implementation.md`.
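
The audio-format requirement is the piece most likely to bite. Below is a small standalone helper sketch, assuming only numpy, that downmixes to mono and naively resamples with linear interpolation; a real plugin would likely use a proper polyphase or sinc resampler instead:

```python
import numpy as np

def to_16k_mono_pcm(samples: np.ndarray, sample_rate: int) -> bytes:
    """Convert float32 audio in [-1, 1] to 16 kHz, mono, 16-bit PCM bytes."""
    # Downmix multi-channel audio (shape: [frames, channels]) to mono.
    if samples.ndim == 2:
        samples = samples.mean(axis=1)

    # Naive linear-interpolation resample to 16 kHz.
    if sample_rate != 16000:
        duration = len(samples) / sample_rate
        target_len = int(duration * 16000)
        src_times = np.linspace(0.0, duration, num=len(samples), endpoint=False)
        dst_times = np.linspace(0.0, duration, num=target_len, endpoint=False)
        samples = np.interp(dst_times, src_times, samples)

    # Clip and quantize to signed 16-bit little-endian PCM.
    return (np.clip(samples, -1.0, 1.0) * 32767.0).astype("<i2").tobytes()
```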

## Deployment

### Development

```bash
# API Server
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

# Test WebSocket endpoint:
# ws://localhost:8000/ws/transcribe
```

### Production

Docker (Recommended):

```bash
docker-compose up
```

Kubernetes: use the manifests in the deployment guide.

Cloud Platforms: AWS ECS, GCP Cloud Run, Azure Container Instances.

For the complete deployment guide, including scaling, monitoring, and security, see `references/deployment.md`.

## WebSocket Protocol

### Client → Server

- Audio: binary frames (16-bit PCM, 16 kHz, mono)
- Config: `{"type": "config", "language": "en"}`
- End: `{"type": "end"}`

### Server → Client

- Interim: `{"type": "interim", "text": "..."}`
- Final: `{"type": "final", "text": "...", "language": "en"}`
- Error: `{"type": "error", "message": "..."}`
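
A small test client for this protocol, assuming the third-party `websockets` package and a raw PCM recording on disk (the file name is a placeholder):

```python
import asyncio
import json

import websockets  # third-party package: pip install websockets

async def main() -> None:
    async with websockets.connect("ws://localhost:8000/ws/transcribe") as ws:
        # Announce the language, then stream raw PCM in roughly 1-second chunks.
        await ws.send(json.dumps({"type": "config", "language": "en"}))

        with open("sample_16k_mono.pcm", "rb") as f:  # placeholder file name
            while chunk := f.read(32000):  # 32,000 bytes = 1 s of 16 kHz 16-bit mono
                await ws.send(chunk)

        await ws.send(json.dumps({"type": "end"}))

        # Print interim results until a final transcript (or an error) arrives.
        while True:
            msg = json.loads(await ws.recv())
            print(msg)
            if msg["type"] in ("final", "error"):
                break

asyncio.run(main())
```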

## Common Tasks

### Change Model

Edit `.env`:

```bash
MODEL_ID=openai/whisper-small  # Faster model
```

### Add Language Support

In plugin usage:

```python
stt = custom_stt.STT(language="es")         # Spanish
stt = custom_stt.STT(detect_language=True)  # Auto-detect
```

### Enable GPU

In the API server:

```bash
DEVICE=cuda:0  # Use GPU
```

### Scale Horizontally

Deploy multiple API server instances behind a load balancer. See `references/deployment.md` for an Nginx configuration.

## Troubleshooting

### Out of Memory

- Use a smaller model (`whisper-small` or `whisper-tiny`)
- Reduce `batch_size` in the pipeline
- Enable `low_cpu_mem_usage=True` (see the pipeline sketch below)

### Slow Transcription

- Ensure the GPU is enabled (`DEVICE=cuda:0`)
- Use FP16 precision (automatic on GPU)
- Increase `batch_size`
- Use a smaller model
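
Most of these knobs map onto arguments of the Hugging Face ASR pipeline; a sketch of where each one goes (the values are illustrative starting points, not tuned recommendations):

```python
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",              # smaller model: less memory, faster
    device=0,                                  # GPU index; use device=-1 for CPU
    torch_dtype=torch.float16,                 # FP16 on GPU for extra speed
    batch_size=8,                              # raise for throughput, lower on OOM
    model_kwargs={"low_cpu_mem_usage": True},  # reduce peak RAM during loading
)
```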

### Connection Issues

- Verify WebSocket support in the load balancer
- Check firewall rules
- Increase timeout settings

## Scripts

- `scripts/setup_api_server.py` - Generate API server from template
- `scripts/setup_plugin.py` - Generate LiveKit plugin from template

## References

Load these as needed for detailed information:

- `references/api_server_guide.md` - Complete API implementation guide
- `references/plugin_implementation.md` - LiveKit plugin development
- `references/models_comparison.md` - Model selection and optimization
- `references/deployment.md` - Production deployment best practices

## Assets

Ready-to-use templates:

- `assets/api-server/` - Complete FastAPI server with Whisper
- `assets/plugin-template/` - LiveKit STT plugin structure

## Best Practices

1. Keep models in memory - Load once at startup, not per request
2. Use appropriate model size - Balance quality vs. speed for your use case
3. Process audio in chunks - 1-second chunks work well for streaming (see the sketch below)
4. Implement proper cleanup - Close WebSocket connections gracefully
5. Monitor metrics - Track latency, throughput, GPU utilization
6. Use Docker - Ensures consistent deployments
7. Enable authentication - Secure production APIs
8. Scale horizontally - Use a load balancer for high availability
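
On practice 3, a minimal chunking helper, assuming the raw 16 kHz, 16-bit, mono PCM format used throughout this skill (the 1-second default matches the recommendation above):

```python
from typing import Iterator

BYTES_PER_SECOND = 16000 * 2  # 16 kHz, 16-bit (2-byte), mono PCM

def iter_chunks(pcm: bytes, seconds: float = 1.0) -> Iterator[bytes]:
    """Yield fixed-duration chunks of raw PCM for streaming to the STT server."""
    step = int(BYTES_PER_SECOND * seconds)
    for offset in range(0, len(pcm), step):
        yield pcm[offset : offset + step]
```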
