Text-to-Speech
Audio Generation
Mesh API provides a full suite of audio endpoints — convert text to speech, transcribe audio files, stream TTS/STT in real time, and browse available voices — all through a single API key.
All endpoints share the same base URL: https://api.meshapi.ai/v1/audio
Auth: send the header Authorization: Bearer rsk_<your-key> on all REST requests. WebSocket endpoints accept the key either in the Sec-WebSocket-Protocol: Bearer rsk_<your-key> header or in the ?api_key=rsk_<your-key> query parameter.
Text-to-Speech
POST /v1/audio/speech
Convert a text string into audio. The provider (ElevenLabs, Sarvam, etc.) is selected automatically based on the model you pass. Streaming is enabled by default.
Request body
Supported output formats
Streaming (stream: true): mp3_22050_32, mp3_24000_48, mp3_44100_32/64/96/128/192, pcm_8000/16000/22050/24000/32000/44100/48000, ulaw_8000, alaw_8000, opus_48000_32/64/96/128/192
Non-streaming (stream: false): All of the above, plus wav_8000/16000/22050/24000/32000/44100/48000
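The two format lists can be checked client-side before a request is sent. The sketch below assumes that the slash shorthand expands the last component (so mp3_44100_32/64 means mp3_44100_32 and mp3_44100_64); the helper names expand and is_supported_format are illustrative and not part of the API.

```python
def expand(prefix, variants):
    """Expand shorthand such as mp3_44100_32/64 into full format names."""
    return {f"{prefix}_{variant}" for variant in variants}


# Formats listed for stream: true.
STREAMING_FORMATS = (
    {"mp3_22050_32", "mp3_24000_48", "ulaw_8000", "alaw_8000"}
    | expand("mp3_44100", [32, 64, 96, 128, 192])
    | expand("pcm", [8000, 16000, 22050, 24000, 32000, 44100, 48000])
    | expand("opus_48000", [32, 64, 96, 128, 192])
)

# stream: false accepts everything above, plus the WAV variants.
NON_STREAMING_FORMATS = STREAMING_FORMATS | expand(
    "wav", [8000, 16000, 22050, 24000, 32000, 44100, 48000]
)


def is_supported_format(output_format, stream=True):
    """Return True if the format appears in the list for the given mode."""
    allowed = STREAMING_FORMATS if stream else NON_STREAMING_FORMATS
    return output_format in allowed
```

For example, is_supported_format("wav_16000", stream=True) is False, while the same format with stream=False is accepted.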
Response
The response body is raw audio bytes with the Content-Type matching the requested format (e.g. audio/mpeg, audio/wav).
Examples
curl (streaming)
curl (non-streaming WAV)
Python
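A minimal sketch of the call using only the Python standard library. The endpoint, the Authorization header, and the model and stream fields come from the description above; the text and output_format field names are assumptions, since the request-body table is not reproduced here, and should be checked against the actual schema.

```python
import json
import urllib.request

BASE_URL = "https://api.meshapi.ai/v1/audio"


def build_speech_request(api_key, text, model,
                         output_format="mp3_44100_128", stream=True):
    """Build (but do not send) a POST /v1/audio/speech request."""
    body = {
        "model": model,                  # selects the provider
        "text": text,                    # assumed field name
        "output_format": output_format,  # assumed field name
        "stream": stream,                # streaming is the documented default
    }
    return urllib.request.Request(
        f"{BASE_URL}/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request, path, chunk_size=8192):
    """Send the request and write the raw audio bytes to path as they arrive."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while chunk := response.read(chunk_size):
            out.write(chunk)
```

Under these assumptions, synthesize_to_file(build_speech_request(key, "Hello", model="<model-id>"), "out.mp3") would save the streamed audio to disk. Keeping request construction separate from sending makes the payload easy to inspect before any network call.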
Node.js
WebSocket TTS Streaming
WS /v1/audio/speech/stream/{voice_id}
Stream text-to-speech in real time. You send text chunks as they become available (e.g. as an LLM streams tokens), and receive audio back chunk by chunk — minimising latency compared to the REST endpoint.
This proxies ElevenLabs’ stream-input WebSocket. The voice_id is part of the URL path.
Authentication
Pass your Mesh API key in one of two ways:
Sec-WebSocket-Protocol: Bearer rsk_... header
?api_key=rsk_... query parameter
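Both options can be sketched as small helpers. The wss:// scheme and host are assumptions derived from the REST base URL, and the function names are illustrative; how the header is attached depends on the WebSocket client library in use.

```python
from urllib.parse import quote, urlencode

# Assumed WebSocket origin, derived from https://api.meshapi.ai
WS_BASE_URL = "wss://api.meshapi.ai/v1/audio/speech/stream"


def tts_stream_url(voice_id, api_key=None, extra_params=None):
    """Build the stream URL; pass api_key to authenticate via query string."""
    params = dict(extra_params or {})
    if api_key is not None:
        params["api_key"] = api_key
    url = f"{WS_BASE_URL}/{quote(voice_id, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url


def ws_auth_headers(api_key):
    """Header-based alternative to the api_key query parameter."""
    return {"Sec-WebSocket-Protocol": f"Bearer {api_key}"}
```

The header variant keeps the key out of URLs, which tend to end up in logs and proxies; the query parameter is the fallback for clients that cannot set custom handshake headers.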
Query parameters
Supported output formats: mp3_22050_32, mp3_44100_32/64/96/128/192, pcm_16000/22050/24000/44100, ulaw_8000
Message protocol
Client → server (JSON frames)
Server → client (JSON frames)
Any xi-api-key or authorization fields you include in client frames are stripped before being forwarded upstream — your ElevenLabs credentials are never exposed to the client.
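The frame tables are not reproduced here, so the sketch below falls back on the publicly documented ElevenLabs stream-input convention that this endpoint is said to proxy: an opening frame whose text is a single space, one frame per text chunk, a closing frame with empty text, and server frames carrying base64-encoded audio. Treat the field names text and audio as assumptions about what Mesh API forwards; the helper names are illustrative.

```python
import base64
import json


def text_frames(chunks):
    """Yield client frames: an opening frame, one per chunk, then end-of-input."""
    yield json.dumps({"text": " "})        # assumed opening frame
    for chunk in chunks:
        yield json.dumps({"text": chunk})  # e.g. tokens streamed from an LLM
    yield json.dumps({"text": ""})         # assumed end-of-input marker


def decode_audio(frame):
    """Return the audio bytes in a server frame, or b'' if it carries none."""
    message = json.loads(frame)
    audio = message.get("audio")           # assumed base64 string or null
    return base64.b64decode(audio) if audio else b""
```

With any WebSocket client, each string from text_frames would be sent as one message, and the bytes from decode_audio appended to a player buffer or file in arrival order.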