Agent skill

text-to-speech

Convert text to speech using ElevenLabs voice AI. Use when generating audio from text, creating voiceovers, building voice apps, or synthesizing speech in 70+ languages.


Install this agent skill in your project:

npx add-skill https://github.com/elevenlabs/skills/tree/main/text-to-speech

Metadata

Additional technical details for this skill

openclaw: { "requires": { "env": [ "ELEVENLABS_API_KEY" ] }, "primaryEnv": "ELEVENLABS_API_KEY" }

SKILL.md

ElevenLabs Text-to-Speech

Generate natural-sounding speech from text. Supports 70+ languages and several models that trade quality against latency.

Setup: See Installation Guide. For JavaScript, use @elevenlabs/* packages only.

Quick Start

Python

python

from elevenlabs import ElevenLabs

# The client reads ELEVENLABS_API_KEY from the environment by default
client = ElevenLabs()

audio = client.text_to_speech.convert(
    text="Hello, welcome to ElevenLabs!",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George
    model_id="eleven_multilingual_v2"
)

# convert() yields the audio as byte chunks; write them out in order
with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)

JavaScript

javascript

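import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createWriteStream } from "fs";
import { Readable } from "stream";

const client = new ElevenLabsClient();
const audio = await client.textToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  text: "Hello, welcome to ElevenLabs!",
  modelId: "eleven_multilingual_v2",
});
// Readable.from() accepts Node streams and web ReadableStreams (Node 18+),
// so this works whichever stream type the installed SDK version returns
Readable.from(audio).pipe(createWriteStream("output.mp3"));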

cURL

bash

curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/JBFqnCBsd6RMkjVDRZzb" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "model_id": "eleven_multilingual_v2"}' --output output.mp3

Models

Model ID	Languages	Latency	Best For
`eleven_v3`	70+	Standard	Highest quality, emotional range
`eleven_multilingual_v2`	29	Standard	High quality, long-form content
`eleven_flash_v2_5`	32	~75ms	Ultra-low latency, real-time
`eleven_flash_v2`	English	~75ms	English-only, fastest
`eleven_turbo_v2_5`	32	~250-300ms	Balanced quality/speed
`eleven_turbo_v2`	English	~250-300ms	English-only, balanced
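
The table can be condensed into a small selection helper. This is only a sketch: `pick_model` and its parameters are hypothetical (not part of the ElevenLabs SDK), and the rules simply restate the "Best For" column rather than any official recommendation.

```python
def pick_model(realtime: bool = False, english_only: bool = False,
               expressive: bool = False) -> str:
    """Return a model ID from the table above for a coarse set of needs.

    Hypothetical helper: the decision rules are an assumption distilled
    from the "Best For" column.
    """
    if realtime:
        # Flash models have the lowest listed latency (~75ms).
        return "eleven_flash_v2" if english_only else "eleven_flash_v2_5"
    if expressive:
        # v3 is listed as highest quality with the widest emotional range.
        return "eleven_v3"
    # Otherwise default to the high-quality, long-form multilingual model.
    return "eleven_multilingual_v2"


print(pick_model(realtime=True))  # eleven_flash_v2_5
```

The returned string can be passed as `model_id` to `convert` or `stream`.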

Voice IDs

Use pre-made voices or create custom voices in the dashboard.

Popular voices:

JBFqnCBsd6RMkjVDRZzb - George (male, narrative)
EXAVITQu4vr4xnSDxMaL - Sarah (female, soft)
onwK4e9ZLuTAKqWW03F9 - Daniel (male, authoritative)
XB0fDUnXU5powFXDhCwa - Charlotte (female, conversational)

python

# List every voice available to your account
voices = client.voices.get_all()
for voice in voices.voices:
    print(f"{voice.voice_id}: {voice.name}")
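
If you refer to voices by name in your own code, a small lookup table avoids scattering raw IDs around. The sketch below uses only the four voices from the "Popular voices" list; `POPULAR_VOICES` and `voice_id_for` are hypothetical names chosen for illustration.

```python
from typing import Optional

# Voice names and IDs copied from the "Popular voices" list above.
POPULAR_VOICES = {
    "George": "JBFqnCBsd6RMkjVDRZzb",
    "Sarah": "EXAVITQu4vr4xnSDxMaL",
    "Daniel": "onwK4e9ZLuTAKqWW03F9",
    "Charlotte": "XB0fDUnXU5powFXDhCwa",
}


def voice_id_for(name: str) -> Optional[str]:
    """Case-insensitive lookup of a popular voice; returns None if unknown."""
    wanted = name.strip().lower()
    for voice_name, voice_id in POPULAR_VOICES.items():
        if voice_name.lower() == wanted:
            return voice_id
    return None


print(voice_id_for("sarah"))  # EXAVITQu4vr4xnSDxMaL
```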

Voice Settings

Fine-tune how the voice sounds:

Stability: How consistent the voice stays. Lower values give more emotional range and variation but can sound unstable; higher values give steady, predictable delivery.
Similarity boost: How closely to match the original voice sample. Higher values sound more like the original but may amplify audio artifacts.
Style: Exaggerates the voice's unique style characteristics (only works with v2+ models).
Speaker boost: Post-processing that enhances clarity and voice similarity.

python

from elevenlabs import VoiceSettings

audio = client.text_to_speech.convert(
    text="Customize my voice settings.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    voice_settings=VoiceSettings(
        stability=0.5,
        similarity_boost=0.75,
        style=0.5,
        speed=1.0,             # 0.7 to 1.2 (default 1.0)
        use_speaker_boost=True
    )
)

Language Enforcement

Force a specific language for pronunciation:

python

audio = client.text_to_speech.convert(
    text="Bonjour, comment allez-vous?",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    language_code="fr"  # ISO 639-1 code
)

Text Normalization

Controls how numbers, dates, and abbreviations are converted to spoken words. For example, "01/15/2026" becomes "January fifteenth, twenty twenty-six":

"auto" (default): Model decides based on context
"on": Always normalize (use when you want natural speech)
"off": Speak literally (use when you want "zero one slash one five...")

python

audio = client.text_to_speech.convert(
    text="Call 1-800-555-0123 on 01/15/2026",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    apply_text_normalization="on"
)

Request Stitching

When generating long audio in multiple requests, the audio can have pops, unnatural pauses, or tone shifts at the boundaries. Request stitching solves this by letting each request know what comes before/after it:

python

# First request
audio1 = client.text_to_speech.convert(
    text="This is the first part.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    next_text="And this continues the story."
)

# Second request using previous context
audio2 = client.text_to_speech.convert(
    text="And this continues the story.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    previous_text="This is the first part."
)

Output Formats

Format	Description
`mp3_44100_128`	MP3 44.1kHz 128kbps (default) - compressed, good for web/apps
`mp3_44100_192`	MP3 44.1kHz 192kbps (Creator+) - higher quality compressed
`mp3_44100_64`	MP3 44.1kHz 64kbps - lower quality, smaller files
`mp3_22050_32`	MP3 22.05kHz 32kbps - smallest MP3 files
`pcm_16000`	Raw PCM 16kHz - use for real-time processing
`pcm_22050`	Raw PCM 22.05kHz
`pcm_24000`	Raw PCM 24kHz - good balance for streaming
`pcm_44100`	Raw PCM 44.1kHz (Pro+) - CD quality
`pcm_48000`	Raw PCM 48kHz (Pro+) - highest quality
`ulaw_8000`	μ-law 8kHz - standard for phone systems (Twilio, telephony)
`alaw_8000`	A-law 8kHz - telephony (alternative to μ-law)
`opus_48000_64`	Opus 48kHz 64kbps - efficient streaming codec
`wav_44100`	WAV 44.1kHz - uncompressed with headers
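
The raw `pcm_*` formats carry no header, so most players cannot open them directly. A minimal sketch of wrapping such bytes in a WAV container with the standard library follows. It assumes the PCM data is 16-bit mono, which you should verify against the API reference, and `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, output_format: str = "pcm_24000") -> bytes:
    """Wrap headerless PCM bytes in a WAV container.

    Assumes 16-bit mono samples; the sample rate is read from the
    format ID, e.g. "pcm_24000" -> 24000 Hz.
    """
    codec, _, rate = output_format.partition("_")
    if codec != "pcm" or not rate.isdigit():
        raise ValueError(f"not a raw PCM format: {output_format!r}")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)          # mono (assumption)
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit (assumption)
        wav.setframerate(int(rate))
        wav.writeframes(pcm)
    return buffer.getvalue()


# One second of silence at 16 kHz stands in for real API output.
silence = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav(silence, "pcm_16000")
print(len(wav_bytes))  # 32044: a 44-byte header plus 32000 bytes of samples
```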

Streaming

For real-time applications, use the stream method (returns audio chunks as they're generated):

python

audio_stream = client.text_to_speech.stream(
    text="This text will be streamed as audio.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5"  # Ultra-low latency
)

for chunk in audio_stream:
    play_audio(chunk)  # play_audio is a placeholder for your own playback function

See references/streaming.md for WebSocket streaming.
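
When evaluating a model for real-time use, it helps to measure how long the first chunk takes to arrive while saving the stream. The sketch below works on any iterable of byte chunks; `save_stream` is a hypothetical helper, and `fake_stream` merely stands in for the result of `client.text_to_speech.stream(...)` so the example runs offline.

```python
import os
import tempfile
import time
from typing import Iterable, Iterator, Optional, Tuple


def save_stream(chunks: Iterable[bytes], path: str) -> Tuple[int, Optional[float]]:
    """Write chunks to path as they arrive.

    Returns (total_bytes, seconds_until_first_chunk); the second value
    is None if the stream produced no data.
    """
    start = time.perf_counter()
    first_chunk_at: Optional[float] = None
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if not chunk:
                continue  # skip empty chunks
            if first_chunk_at is None:
                first_chunk_at = time.perf_counter() - start
            f.write(chunk)
            total += len(chunk)
    return total, first_chunk_at


def fake_stream() -> Iterator[bytes]:
    """Stand-in for a real audio stream, for illustration only."""
    yield b"abc"
    yield b""
    yield b"defg"


demo_path = os.path.join(tempfile.gettempdir(), "stream_demo.bin")
total, latency = save_stream(fake_stream(), demo_path)
print(total)  # 7
```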

Error Handling

python

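from elevenlabs.core.api_error import ApiError

try:
    audio = client.text_to_speech.convert(
        text="Generate speech",
        voice_id="invalid-voice-id"
    )
    # convert() returns a lazy iterator, so consume it inside the try block;
    # otherwise the request (and any error) happens later, outside it
    audio_bytes = b"".join(audio)
except ApiError as e:
    print(f"API error {e.status_code}: {e.body}")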

Common errors:

401: Invalid API key
422: Invalid parameters (check voice_id, model_id)
429: Rate limit exceeded
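
A 429 is usually worth retrying after a pause, whereas 401 and 422 are not. The following is a minimal sketch of exponential backoff. It assumes the raised exception exposes a `status_code` attribute; `with_retries` and `FakeRateLimit` are hypothetical names, and the fake error exists only so the example runs without network access.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], max_attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying with exponential backoff on HTTP 429.

    Assumes the exception carries a `status_code` attribute; any other
    error, or a 429 on the final attempt, is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("max_attempts must be at least 1")


class FakeRateLimit(Exception):
    """Illustrative stand-in for a rate-limit error."""
    status_code = 429


attempts: List[int] = []


def flaky() -> bytes:
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeRateLimit()
    return b"audio"


delays: List[float] = []
result = with_retries(flaky, sleep=delays.append)
print(result, delays)  # b'audio' [1.0, 2.0]
```

In real use, `call` should consume the audio, for example `lambda: b"".join(client.text_to_speech.convert(...))`, so that a failing request raises inside the retry loop.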

Tracking Costs

Monitor character usage via response headers (x-character-count, request-id):

python

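# Assumes a recent SDK version, where the raw-response call is a context manager
with client.text_to_speech.with_raw_response.convert(
    text="Hello!", voice_id="JBFqnCBsd6RMkjVDRZzb", model_id="eleven_multilingual_v2"
) as response:
    print(f"Characters used: {response.headers.get('x-character-count')}")
    print(f"Request ID: {response.headers.get('request-id')}")
    audio = b"".join(response.data)  # consume the audio while the response is open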
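
To track usage across many requests, the per-response header can be accumulated in a small tally. This is a sketch under the assumption that `x-character-count` arrives as a decimal string; `UsageTally` is a hypothetical class, and the header values shown are made up for the demonstration.

```python
from typing import Mapping, Optional


class UsageTally:
    """Accumulate character counts reported in response headers."""

    def __init__(self) -> None:
        self.total_characters = 0
        self.requests = 0

    def record(self, headers: Mapping[str, str]) -> Optional[int]:
        """Add one response's x-character-count; return it, or None if absent."""
        self.requests += 1
        raw = headers.get("x-character-count")
        if raw is None or not raw.strip().isdigit():
            return None
        count = int(raw)
        self.total_characters += count
        return count


tally = UsageTally()
tally.record({"x-character-count": "6", "request-id": "demo-1"})   # made-up values
tally.record({"x-character-count": "12", "request-id": "demo-2"})  # made-up values
print(tally.total_characters)  # 18
```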

References

Installation Guide
Streaming Audio
Voice Settings

Maintainer

elevenlabs Core maintainer

Source details

Full Name: elevenlabs/skills
Branch: main
Path in repo: text-to-speech
License: MIT License
Topics: ai-agents skills tts elevenlabs music sfx stt
