ElevenLabs API Complete Developer Guide (2026)

(Text-to-Speech, Streaming, Python, Pricing, API Key, URL, and Eleven v3)

If you’ve searched for “ElevenLabs API”, chances are you want one of three outcomes:

  • Generate lifelike speech from text (the core “ElevenLabs API text to speech” use case).
  • Ship it in an app (low latency streaming, production reliability, scalable architecture).
  • Understand cost + setup (pricing, “free” options, how to get an API key, docs, SDKs).
How to use this page
You can treat this as an end-to-end reference: what to build, where the docs are, how authentication works, which endpoints matter, how to stream audio in real time, how to estimate monthly costs, and how to design a future-proof audio pipeline.

1) What is the ElevenLabs API?

ElevenLabs API is a set of HTTP endpoints (and official SDKs) that let you build AI audio features such as:

  • Text-to-Speech (TTS): convert text into natural, expressive speech
  • Streaming audio: receive audio bytes as they’re generated (low-latency playback)
  • WebSocket TTS: stream partial text input and get audio back in real time
  • Speech-to-Text / transcription: convert audio into text (the Scribe product area)
  • Voice features: voices library, voice management, settings, etc.
  • Agents / conversational experiences: for interactive voice agents
Docs coverage note
ElevenLabs provides official docs and API reference pages that cover authentication, endpoints, models, and streaming patterns.

2) ElevenLabs API v3: what it means in practice

In 2026, you’ll see “Eleven v3” referenced as one of ElevenLabs’ flagship voice models—positioned as highly expressive and capable of dramatic delivery, with broad language support.

The docs changelog states that Eleven v3 is available via the API, and you can use it by specifying the model ID eleven_v3 when making Text-to-Speech requests.

Key developer implication
You don’t “use a separate API” for v3—you typically use the same TTS endpoint and select the model via a parameter (model_id), depending on your request and SDK.

3) ElevenLabs API documentation: where to look (and what matters)

When people say “ElevenLabs API documentation”, they usually need these doc areas:

A) Authentication (API keys, security, headers)
  • How to pass the API key (the header name matters)
  • Key permissions + scoping
  • Avoiding client-side exposure
The official docs show that requests must include your key in the xi-api-key header.
B) Core endpoints (TTS, streaming, voices)
  • The TTS “convert text to speech” endpoint
  • Streaming patterns (HTTP chunked streaming)
  • WebSocket streaming (input streaming)
The docs explicitly list the Create speech endpoint as:
POST https://api.elevenlabs.io/v1/text-to-speech/:voice_id
C) Models + capabilities
  • Which model to use for your use case (quality vs latency)
  • Model IDs and limits
ElevenLabs lists major TTS models and characteristics (e.g., Eleven v3, Multilingual v2, Flash v2.5, Turbo v2.5).
D) Pricing + usage
  • Plans, included minutes/credits, and overages
  • Startup grants / free programs (if applicable)
  • Estimating cost per minute or per month
Their API pricing page includes plan comparisons and mentions a Startup Grants Program offering “12 months free” and “33M Characters” for a limited period (grant-based).

4) ElevenLabs API URL: base URL + common endpoints

If you’re searching for “ElevenLabs API url”, here are the most important pieces:

Base API URL
https://api.elevenlabs.io
This is shown across the documentation and endpoint examples (including TTS convert).
Common REST endpoints you’ll use a lot
  • List models: GET /v1/models (used in auth docs’ curl example)
  • Text-to-Speech: POST /v1/text-to-speech/:voice_id (convert text to audio)
  • Voices: “Get voices” is referenced in the TTS docs as how you find voice IDs.
WebSocket URL (TTS streaming / multi-stream)
ElevenLabs provides WebSocket-based TTS input streaming and multi-context streaming docs, including a wss endpoint format for multi-stream.

5) ElevenLabs API key: what it is, how auth works, best practices

What is an ElevenLabs API key?

It’s the secret credential used to authenticate requests and track usage/quota.

The docs explain:

  • The API uses API keys for authentication
  • Every request must include the key
  • Keys can be scoped with restrictions (endpoint access) and credit quotas

How to send your key

Use the xi-api-key header:

xi-api-key: YOUR_ELEVENLABS_API_KEY
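In Python, that header looks like this (a minimal sketch; `GET /v1/models` mirrors the curl example in the auth docs):

```python
import os

BASE_URL = "https://api.elevenlabs.io"

def build_headers(api_key: str) -> dict:
    # ElevenLabs authenticates via the xi-api-key header, not Authorization.
    return {"xi-api-key": api_key}

def list_models() -> list:
    import requests  # third-party: pip install requests
    resp = requests.get(f"{BASE_URL}/v1/models",
                        headers=build_headers(os.environ["ELEVENLABS_API_KEY"]))
    resp.raise_for_status()
    return resp.json()  # each entry includes a model_id such as "eleven_v3"

# Usage: print([m["model_id"] for m in list_models()])
```

Reading the key from the environment (rather than hardcoding it) keeps this snippet consistent with the security rules below.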

Security rules (non-negotiable)

The official docs warn that your API key is a secret and must not be exposed in client-side code (browser/mobile apps).

Do
  • Store keys in environment variables on your server
  • Use a secrets manager in production
  • Rotate keys if you suspect exposure
  • Use scoped keys per environment and per service
Don’t
  • Put keys in frontend JavaScript
  • Commit keys to GitHub
  • Share a single key across multiple vendors/contractors

“Can I use the API from the browser?”

Not directly with your real key. Instead, use:

  • Browser → your backend → ElevenLabs
  • Or use single-use tokens for specific endpoints, as documented.

6) ElevenLabs API free: what "free" really means

People search “Elevenlabs api free” hoping for unlimited free TTS. The reality is:

  • Most real usage is paid
  • “Free” typically means limited usage via a free plan, trial, educational promo, or a grant program

What “free” can realistically mean

  • Free plan (limited included usage): The API pricing page includes a plan comparison starting with “Free”.
  • Startup grant (if approved): A Startup Grants Program may provide “12 months free” and “33M Characters” (grant-based and not guaranteed).
  • Student / promo programs: ElevenLabs has run education-focused promotions (example: an “AI Student Pack” post describing free access for a period).

The safe, accurate framing

  • You may be able to start on a free tier or limited program, but production usage is billed.
  • If your goal is “almost free,” optimize:
    • short text chunks
    • lower-cost / lower-latency models where acceptable
    • caching and reuse of audio outputs
    • fewer re-generations by improving prompts and normalization

7) ElevenLabs API pricing (2026): how billing typically works

ElevenLabs pricing is presented as plans and included usage, plus overages. Their API pricing page shows plan comparisons and includes approximate minutes and overage rates per minute for certain model categories (e.g., Multilingual V2/V3 vs Flash), along with audio quality notes.

A) What developers need to understand first

  • Your cost scales with usage
  • More text → more audio minutes → higher cost
  • Higher quality settings, faster streaming, and concurrency can affect what plan makes sense

Different models have different economics

  • Some models prioritize quality
  • Others prioritize latency and cost

ElevenLabs’ docs highlight different TTS models (e.g., Flash v2.5 for ultra-low latency; Turbo v2.5 balanced; Multilingual v2 stable for longer form; Eleven v3 for maximum expressiveness).

B) Plan comparison concepts (minutes + overages)

On the API pricing page, plan comparison tables show included minutes and approximate additional-minute pricing for model categories like “Multilingual V2/V3” and “Flash” (and more).

C) Concurrency and enterprise pricing

The pricing page mentions enterprise features such as elevated concurrency limits and “significant discounts at scale,” plus custom terms and support.

8) A practical pricing calculator (developer-friendly)

Because your real cost depends on text volume and conversion rate (characters → minutes), the easiest developer pricing calculator approach is:

Step 1: Decide how you measure usage
  • Minutes of generated audio per month (most intuitive)
  • Characters per month (how some plans/credits are framed)
Step 2: Convert your workload into minutes
  • Normal speaking rate ≈ 130–160 words/min (varies by voice and style)
  • If your app knows word count, approximate: minutes ≈ words / 150
Step 3: Multiply minutes by your overage rate (or plan)
The API pricing page includes “additional minutes” pricing for model categories (e.g., Multilingual V2/V3 and Flash), which you can use for “above included usage” estimates.

Mini calculator (words → minutes)

This does not guess plan costs; it helps convert workload into audio minutes using the common estimate words / 150.
Rule of thumb: minutes ≈ words / WPM
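Steps 2–3 can be wrapped in a tiny helper (the rate and included-minutes values you pass in are placeholders; take real numbers from the pricing page for your plan and model category):

```python
def estimate_minutes(words: int, wpm: int = 150) -> float:
    """Rough audio-length estimate: minutes ≈ words / words-per-minute."""
    return words / wpm

def estimate_overage_cost(words: int, included_minutes: float,
                          per_minute_rate: float, wpm: int = 150) -> float:
    """Cost above the plan's included minutes, at your model category's
    'additional minutes' rate (a placeholder here, not an official price)."""
    minutes = estimate_minutes(words, wpm)
    overage = max(0.0, minutes - included_minutes)
    return overage * per_minute_rate

# Example: a 45,000-word monthly workload at 150 wpm is ~300 minutes.
print(estimate_minutes(45_000))  # → 300.0
```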

Example monthly cost scenarios (simple)

(Use these as “how to think about it”, not official quotes—your plan and rates depend on the pricing page and your account.)

  • Indie creator (~300 minutes/month): if you exceed included minutes, estimate overages using the “additional minutes” rate shown for your chosen model category on the pricing table.
  • SaaS onboarding voice + notifications (~2,000 minutes/month): if you’re serving many users, plan for concurrency and consider higher tiers (the pricing page references elevated concurrency at higher tiers).
  • Voice-agent product (real-time streaming is critical): prefer a low-latency model like Flash/Turbo depending on quality needs (model positioning is described in product/docs pages).

Note: a good production approach is to track minutes generated per feature, not just total minutes; that is how you find the true cost drivers.

9) ElevenLabs API text to speech: core flow (how it works)

The basic TTS pipeline

  1. Choose a voice
  2. Choose a model (and settings)
  3. Send text to TTS endpoint
  4. Receive audio bytes (file or stream)
  5. Store or deliver audio to your user

The main REST endpoint

The docs list “Create speech” as:

POST https://api.elevenlabs.io/v1/text-to-speech/:voice_id

It takes:

  • voice_id in the path
  • xi-api-key header
  • JSON body with at least text, and optionally model_id

Choosing the model for TTS

ElevenLabs presents different models for different needs (expressiveness, stability, latency).

  • Eleven v3 (maximum expressiveness): emotion, creative delivery, dramatic narration (docs also highlight multi-speaker dialogue capabilities)
  • Multilingual v2 (stable long-form output): consistent voice quality for longer content
  • Flash v2.5 (ultra-low latency): fast voice response for real-time UX
  • Turbo v2.5 (balanced): good tradeoff between latency and quality

10) Streaming audio (HTTP) vs WebSockets (input streaming)

A) HTTP streaming (chunked transfer encoding)

If your entire text is available upfront, HTTP streaming is often the simplest “fast playback” solution. ElevenLabs documents streaming as returning raw audio bytes over HTTP using chunked transfer encoding, allowing clients to play or process audio incrementally.

When to use HTTP streaming
  • You have the full text already (or large blocks)
  • You want simpler infrastructure than WebSockets
  • You’re building “press play” style experiences
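A minimal sketch of HTTP chunked streaming with `requests` (the `/stream` path variant and the `eleven_flash_v2_5` model ID should be confirmed against the current docs):

```python
import os

def stream_url(voice_id: str, base: str = "https://api.elevenlabs.io") -> str:
    # The streaming variant of "Create speech" appends /stream to the
    # convert path; confirm the exact path in the streaming docs.
    return f"{base}/v1/text-to-speech/{voice_id}/stream"

def stream_to_file(voice_id: str, text: str, out_path: str = "speech.mp3") -> None:
    import requests  # third-party: pip install requests
    resp = requests.post(
        stream_url(voice_id),
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"],
                 "Content-Type": "application/json"},
        json={"text": text, "model_id": "eleven_flash_v2_5"},
        stream=True,  # tell requests not to buffer the whole body
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=4096):
            f.write(chunk)  # pipe these bytes to a player to start playback early

# Usage: stream_to_file("YOUR_VOICE_ID", "Hello from chunked streaming.")
```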

B) WebSocket TTS (input streaming)

If your text is being generated in chunks (e.g., from an LLM) and you want audio as the text arrives, WebSockets are designed for that. ElevenLabs explains the WebSockets TTS API is for generating audio from partial text input while keeping consistency through the generated audio.

When to use WebSockets
  • Real-time assistants where the model streams text
  • You want word-to-audio alignment data
  • You want to avoid waiting for complete text
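A sketch of the input-streaming flow using the third-party `websockets` package; the message fields (`audio`, `isFinal`) and the `stream-input` URL follow the documented shape, but verify both against the current WebSocket reference:

```python
import base64
import json
import os

def ws_url(voice_id: str, model_id: str = "eleven_flash_v2_5") -> str:
    # Input-streaming endpoint shape from the WebSocket docs; verify the
    # query parameters against the current API reference.
    return (f"wss://api.elevenlabs.io/v1/text-to-speech/"
            f"{voice_id}/stream-input?model_id={model_id}")

async def stream_speech(text_chunks, voice_id: str):
    """Send partial text (e.g. LLM tokens) and yield audio bytes as they arrive."""
    import websockets  # third-party: pip install websockets

    headers = {"xi-api-key": os.environ["ELEVENLABS_API_KEY"]}
    # websockets >= 13 uses additional_headers; older releases call it extra_headers
    async with websockets.connect(ws_url(voice_id),
                                  additional_headers=headers) as ws:
        await ws.send(json.dumps({"text": " "}))   # a space opens the stream
        for chunk in text_chunks:
            await ws.send(json.dumps({"text": chunk}))
        await ws.send(json.dumps({"text": ""}))    # empty text closes the input
        async for message in ws:
            data = json.loads(message)
            if data.get("audio"):
                yield base64.b64decode(data["audio"])  # raw audio bytes
            if data.get("isFinal"):
                break

# Usage: async for audio in stream_speech(["Hello ", "world."], "YOUR_VOICE_ID"): ...
```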

C) Multi-context WebSocket streaming

If you’re building complex apps (like agents with multiple concurrent “speaking contexts”), ElevenLabs provides multi-context streaming over a single WebSocket connection, with a documented wss endpoint format.

11) ElevenLabs API Python: practical examples (REST + SDK patterns)

There are two common Python approaches:

  • Official Python SDK
  • Direct HTTP requests (requests/httpx)

A) Authentication in Python (key handling)

  • Read the key from environment variables
  • Never hardcode it in your codebase

B) REST request structure (what you send)

  • Path: /v1/text-to-speech/:voice_id
  • Header: xi-api-key
  • JSON includes text and optional model_id

C) Example: basic TTS request (Python, direct HTTP)

Copy/paste template (substitute your VOICE_ID and ELEVENLABS_API_KEY):

import os
import requests

API_KEY = os.environ["ELEVENLABS_API_KEY"]
VOICE_ID = "YOUR_VOICE_ID"

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
headers = {
    "xi-api-key": API_KEY,
    "Content-Type": "application/json",
}
payload = {
    "text": "The first move is what sets everything in motion.",
    "model_id": "eleven_multilingual_v2",
}

resp = requests.post(url, headers=headers, json=payload)
resp.raise_for_status()

with open("speech.mp3", "wb") as f:
    f.write(resp.content)

print("Saved speech.mp3")

This mirrors the documented endpoint shape and headers.

D) Example: selecting Eleven v3

To use v3, set model_id to eleven_v3 for TTS:

payload = {
    "text": "Tonight, we tell the story like a whispered secret.",
    "model_id": "eleven_v3",
}

12) Building with ElevenLabs: architectures that scale

Option 1: Simple backend “audio generation” service
Best for: websites, content generation tools, small apps.
Flow
  • Client sends request → your backend
  • Backend calls ElevenLabs TTS
  • Backend stores audio (S3 / Cloud Storage)
  • Backend returns a signed URL to client
This is also a common pattern in streaming guides (generate, upload, share).
Option 2: Real-time voice agent
Best for: conversational systems.
Flow
  • User speaks → STT → LLM → TTS streaming
  • Use HTTP chunked streaming or WebSockets depending on your text availability and latency needs
Option 3: High-volume batch generation
Best for: dubbing pipelines, large content libraries.
Tips
  • Queue jobs
  • Retry on transient failures
  • Cache and de-duplicate
  • Keep an internal “audio asset registry” so you never regenerate the same output twice
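The "retry on transient failures" tip can be as small as a backoff wrapper (a sketch; `generate_audio` is a hypothetical job function):

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.5):
    """Call fn(), retrying on any exception with exponential backoff.
    Good enough for transient failures (timeouts, 429s, 5xx) in batch jobs;
    in production you would retry only on retryable errors."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the queue
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...

# Usage sketch: audio = with_retries(lambda: generate_audio(job), attempts=4)
```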

13) Key parameters that matter (quality, output format, retention)

Output format

The TTS docs show example usage with an output_format query parameter (e.g., mp3 format).

Logging / retention controls

The TTS convert docs include a query parameter: enable_logging (defaults true), and describe “zero retention mode” for eligible customers when logging is disabled.

Latency optimization (deprecated parameter note)

The TTS docs reference optimize_streaming_latency as deprecated and outline latency/quality tradeoffs.
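As a sketch, both query parameters can be appended to the convert URL (the `mp3_44100_128` value mirrors the docs' example format; confirm supported values in the API reference):

```python
from urllib.parse import urlencode

def tts_url(voice_id: str,
            output_format: str = "mp3_44100_128",
            enable_logging: bool = True) -> str:
    # output_format strings follow a codec_samplerate_bitrate pattern in the
    # docs' examples (e.g. mp3_44100_128); check the API reference for the
    # full list before relying on one.
    query = urlencode({"output_format": output_format,
                       "enable_logging": str(enable_logging).lower()})
    return f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}?{query}"

print(tts_url("YOUR_VOICE_ID", enable_logging=False))
```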

14) Cost control strategies (how to reduce spend)

Even if you’re on a plan with included minutes, cost control matters because:

  • Users will regenerate
  • Agents can loop
  • Long-form content scales quickly

Practical strategies

  • Pick the right model for the job
    • Ultra-low latency needs → Flash/Turbo type models
    • Long-form stability → Multilingual v2
    • Maximum expressiveness → Eleven v3
  • Normalize and chunk text
    • Convert long content in chunks so you can reuse segments
    • Avoid re-synthesizing entire passages when only one line changes
  • Cache outputs
    • Store audio results keyed by (voice_id, model_id, text_hash, settings_hash)
    • Reuse for identical text (especially common UI phrases)
  • Limit “regenerate” loops
    • Add UI controls (“try again” limits)
    • Show preview first, then generate final-quality audio
  • Measure usage at the right level
    • Track minutes generated per feature
    • Track cost per user cohort
    • Cap background generation per day for free users
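The caching strategy above can be sketched as a deterministic key builder:

```python
import hashlib
import json

def cache_key(voice_id: str, model_id: str, text: str, settings: dict) -> str:
    """Deterministic key for an audio asset registry: identical
    (voice, model, text, settings) tuples map to the same key, so the same
    request never pays for synthesis twice."""
    settings_blob = json.dumps(settings, sort_keys=True)  # order-insensitive
    parts = [
        voice_id,
        model_id,
        hashlib.sha256(text.encode("utf-8")).hexdigest(),
        hashlib.sha256(settings_blob.encode("utf-8")).hexdigest(),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

# Usage sketch: look up cache_key(...) in object storage before calling the TTS API.
```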

15) Troubleshooting: common developer issues

“My request returns 401/403”

Common causes:

  • Missing xi-api-key header
  • Wrong key or key scope restrictions

Docs emphasize including xi-api-key and that keys can be scoped.

“Audio is slow to start”

Try:

  • A lower-latency model category (Flash/Turbo)
  • Streaming (HTTP chunked transfer) so you can play as bytes arrive

“WebSocket streaming is complicated”

Use WebSockets when you truly need input streaming; otherwise HTTP streaming is simpler for full-text requests. The docs explain WebSockets are best when text comes in chunks.

“I need voice_id”

The TTS docs indicate you can use the “Get voices” endpoint to list voices and retrieve IDs.

16) Security checklist (for your ElevenLabs API key)

Goal: never leak your key, never let untrusted clients burn your quota, and keep access segmented.

  • Use one key per environment (dev/staging/prod)
  • Use scoped keys (limit endpoints where possible)
  • Set credit quota limits per key (where supported)
  • Store keys in secrets manager
  • Rotate keys on a schedule
  • Never ship your secret key in frontend code (explicitly warned in docs)

If you need client-side access for certain flows, look into single-use tokens (documented as a way to connect without exposing your key in some scenarios).

17) A “developer quickstart” checklist (what to do first)

  1. Create an account and get your API key
  2. Store it as an environment variable: ELEVENLABS_API_KEY=...
  3. Call a simple endpoint (like listing models, or a minimal TTS request)
  4. Choose a voice and model
  5. Add streaming if you need low-latency playback
  6. Add caching + retry logic before you scale

ElevenLabs API vs other Voice AI APIs (2026)

Two categories you must separate
  • Text-to-Speech (TTS): text → lifelike audio voice
  • Speech-to-Text (STT / ASR): audio → transcript
ElevenLabs is unusual because it’s strong in both on one platform (its API plans explicitly include TTS + STT).

1) Quick verdicts

Best “all-in-one” Voice API (TTS + STT + real-time)
ElevenLabs — Great if you want expressive TTS (Eleven v3) plus STT (Scribe + realtime), and you prefer a single vendor + single key + unified plans.
Best Speech-to-Text API (depends on your use case)
  • Lowest friction + strong baseline: OpenAI STT (Whisper + newer transcribe snapshots like gpt-4o-transcribe)
  • Production-grade real-time STT: Deepgram (focus on low latency + streaming)
  • Enterprise cloud integration: Google STT (Chirp) / AWS Transcribe / Azure Speech

2) TTS comparison: ElevenLabs vs alternatives

  • ElevenLabs (TTS): very expressive output; model choices positioned by latency/quality (Flash/Turbo/Multilingual/Eleven v3); streaming supported (raw audio bytes via HTTP chunked transfer) with easy SDK-based streaming examples in docs. Typical fit: creator-grade voice plus production apps, agents, and products that want a premium voice feel.
  • OpenAI (TTS): if your app already uses OpenAI models and you want TTS as part of the same stack (Audio API + streaming); the tradeoff is that voice expressiveness and brand-voice tooling is often the deciding factor. Typical fit: LLM-first apps that want one vendor for LLM + voice.
  • Google Cloud TTS: strong enterprise reliability; clear tiers including premium voice offerings. Typical fit: call center/enterprise stacks already on GCP.
  • Amazon Polly: simple, transparent pricing; dependable managed TTS. Typical fit: AWS-native workloads and backend-heavy apps.
  • Azure TTS: strong enterprise ecosystem integration; common for regulated orgs on the Microsoft stack. Typical fit: teams building in the Azure / Microsoft ecosystem.

3) STT comparison: ElevenLabs vs best Speech-to-Text APIs

  • ElevenLabs (STT / Scribe): STT + TTS in one platform, positioned for agents and realtime interactions; billing commonly based on audio duration (varies by plan/model). Best when you already use ElevenLabs TTS and want one unified vendor for both directions.
  • OpenAI (STT): the Audio API supports transcription and pairs well with LLM workflows (summaries, extraction, agent actions). Best for fast setup, strong baseline accuracy, and a tight “STT → reasoning → actions” loop.
  • Deepgram (STT): strong real-time focus; streaming-first product positioning. Best for real-time voice UX (calls, agents, live captions) where latency matters.
  • Google STT (Chirp): enterprise-grade cloud STT; multilingual features like diarization depending on model. Best for GCP-first teams needing governance, predictable ops, and scale.
  • AWS Transcribe: managed STT integrated with AWS services and governance. Best for AWS-native stacks and compliance-heavy environments.
  • Azure Speech-to-Text: real-time + batch; strong enterprise tools and deployment options in the Microsoft ecosystem. Best for Azure-native workloads and regulated enterprises.
  • AssemblyAI (STT): developer-focused streaming STT messaging and tooling. Best for teams building real-time experiences that want a dedicated STT platform.

4) “Best Speech to Text API” — pick by scenario

Voice agent (live, low-latency)
Pick one of these:
  • Deepgram (real-time focus)
  • ElevenLabs STT + TTS (one vendor, paired with expressive TTS)
  • OpenAI STT (tight “STT → reasoning → actions” loop)
Lots of recorded audio (podcasts, meetings, video)
  • OpenAI STT (simple baseline + easy post-processing)
  • Google / AWS / Azure if you’re committed to that cloud’s data/compliance stack
Multi-language at enterprise scale
  • Google STT (Chirp)
  • Azure Speech
  • ElevenLabs STT (paired platform, depending on your needs)
Decision checklist
  • Do you need TTS too (or only STT)?
  • Is your product real-time?
  • Are you locked into a cloud vendor?

18) FAQs

Is ElevenLabs API v3 available?
Yes—Eleven v3 is available via the API, and the changelog states you can use it by specifying model_id = eleven_v3 for TTS requests.
What header does ElevenLabs use for API keys?
The docs specify xi-api-key as the authentication header for API requests.
What is the ElevenLabs API URL?
The core base URL is https://api.elevenlabs.io, and TTS convert is documented under /v1/text-to-speech/:voice_id.
Can I stream TTS audio in real time?
Yes. ElevenLabs documents HTTP streaming using chunked transfer encoding and WebSocket-based TTS input streaming.
Is Elevenlabs API free?
There are free/limited options (free plan, promos, or grants), but production usage is typically paid; the pricing page includes plan comparisons and also describes a Startup Grants Program.

Final takeaway

If you’re building in 2026, the “winning” ElevenLabs API approach is:

  • Use REST TTS for straightforward text-to-audio generation
  • Add HTTP streaming for faster playback when you have full text
  • Use WebSockets when your text arrives in chunks (agents/LLMs)
  • Use Eleven v3 (eleven_v3) when you need maximum expressiveness
  • Treat your API key like a password, and never expose it client-side
  • Use the API pricing page to choose the right plan + overage model for your usage pattern