Add voice to your AI app with ElevenLabs
ElevenLabs delivers some of the most natural-sounding text-to-speech (TTS) available, plus a streaming API that lets you ship realtime voice agents.
Prerequisites
- An ElevenLabs account and API key
- Node 20+
- A short, clean voice sample (only if you want to clone)
Step-by-Step
1. Install the SDK

The official `elevenlabs` package handles auth, streaming, and voice management.

```shell
pnpm add elevenlabs
```

2. Pick a voice
Browse the voice library or list your voices via API. Each voice has a stable ID.
```ts
import { ElevenLabsClient } from 'elevenlabs';

const el = new ElevenLabsClient({ apiKey: process.env.ELEVENLABS_API_KEY });

const voices = await el.voices.getAll();
console.log(voices.voices.map((v) => ({ id: v.voice_id, name: v.name })));
```

3. Synthesize speech
Use the streaming endpoint and pipe the audio to a file or straight to a client. `eleven_turbo_v2_5` is the fast, low-cost model.
```ts
import fs from 'node:fs';

const stream = await el.textToSpeech.convertAsStream('21m00Tcm4TlvDq8ikWAM', {
  text: 'Welcome to the show.',
  model_id: 'eleven_turbo_v2_5',
});
stream.pipe(fs.createWriteStream('out.mp3'));
```

4. Stream to a WebSocket
For voice agents, use the WebSocket endpoint: you stream text in as the LLM produces it, and audio streams back out with sub-300 ms latency.
```ts
const ws = el.textToSpeech.convertWithWebsocket(voiceId);
ws.send({ text: 'Hello', try_trigger_generation: true });
```
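The SDK call hides the wire protocol. If you ever talk to the stream-input WebSocket endpoint (`wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input`) directly, you send JSON frames: an opening frame carrying a single space plus your settings and key, text chunks as the LLM emits them, and an empty-text frame to close. A sketch of the frame shapes, based on my reading of the documented protocol — verify field names against the current API docs:

```typescript
// Frame builders for the ElevenLabs stream-input WebSocket protocol.
// Pure functions: they only construct the JSON payloads you would
// pass to ws.send() on an open socket.

interface VoiceSettings {
  stability: number;
  similarity_boost: number;
}

// Opening frame: a single space primes the stream and carries settings/auth.
function openFrame(apiKey: string, settings: VoiceSettings): string {
  return JSON.stringify({
    text: ' ',
    voice_settings: settings,
    xi_api_key: apiKey,
  });
}

// One chunk of LLM output; try_trigger_generation asks the server to
// start synthesizing without waiting for more text.
function textFrame(text: string): string {
  return JSON.stringify({ text, try_trigger_generation: true });
}

// An empty text field tells the server the input stream is finished.
function closeFrame(): string {
  return JSON.stringify({ text: '' });
}
```

Audio comes back as base64-encoded chunks in the response messages, which you decode and forward to the client as they arrive.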
5. Clone your voice
Upload 1-3 minutes of clean audio. Clones are ready in seconds.
```ts
const v = await el.voices.add({
  name: 'My Voice',
  files: [fs.createReadStream('sample.mp3')],
});
console.log(v.voice_id);
```

6. Tune voice settings
`stability` and `similarity_boost` are the main levers. Low stability is expressive but inconsistent; high stability is consistent but can sound monotone.
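As an illustration, a small helper that clamps both levers into the valid 0–1 range before you pass them on a request — the helper name and default values are mine, not the SDK's:

```typescript
interface VoiceSettings {
  stability: number;        // low = expressive, high = consistent
  similarity_boost: number; // how closely output tracks the source voice
}

// Both settings are fractions in [0, 1]; clamp anything out of range.
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Hypothetical helper: build a settings object safe to pass as
// voice_settings on a text-to-speech request.
function makeVoiceSettings(stability = 0.5, similarityBoost = 0.75): VoiceSettings {
  return {
    stability: clamp01(stability),
    similarity_boost: clamp01(similarityBoost),
  };
}

// An expressive read: drop stability, keep the default similarity.
const expressive = makeVoiceSettings(0.2);
```

Start near the defaults and move one lever at a time; listening to the same sentence at a few stability values is the fastest way to find a voice's sweet spot.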
Common Pitfalls
- Synthesizing on every request blows through your character budget; cache common phrases.
- Cloning from noisy samples produces garbage clones.
- Unhandled streaming backpressure can OOM your process on long requests.
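The caching pitfall can be addressed with a small lookup keyed on voice, model, and text. A minimal in-memory sketch — the `Synthesize` parameter stands in for whichever SDK call you use, and a production version would persist audio to disk or object storage:

```typescript
// Cache synthesized audio so repeated phrases cost zero characters.
type Synthesize = (voiceId: string, text: string) => Promise<Buffer>;

class TtsCache {
  private store = new Map<string, Buffer>();
  hits = 0;

  constructor(private modelId: string, private synthesize: Synthesize) {}

  async get(voiceId: string, text: string): Promise<Buffer> {
    const key = `${this.modelId}:${voiceId}:${text}`;
    const cached = this.store.get(key);
    if (cached) {
      this.hits += 1;
      return cached; // served from cache: no API call, no character spend
    }
    const audio = await this.synthesize(voiceId, text);
    this.store.set(key, audio);
    return audio;
  }
}
```

Wire it up once at startup, e.g. `new TtsCache('eleven_turbo_v2_5', mySynthFn)`, and route all greetings, error messages, and other fixed phrases through `cache.get(...)`.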
What's Next
- Pair with the Whisper transcription API for full-duplex voice agents.
- Use Conversational AI for end-to-end voice apps.
