Audio Generation
Mesh API provides a full suite of audio endpoints through a single API key: convert text to speech, transcribe audio files, stream TTS and STT in real time, and browse available voices.
All endpoints share the same base URL: https://api.meshapi.ai/v1/audio
Auth: send Authorization: Bearer rsk_<your-key> on all REST requests. WebSocket endpoints accept the key either in the Sec-WebSocket-Protocol header (Bearer rsk_...) or as the ?api_key=rsk_... query parameter.
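Every REST call on this page uses the same base URL and bearer header, so a small helper keeps both in one place. This is a minimal sketch using only Python's standard library. The helper name mesh_request is hypothetical and not part of any Mesh SDK. It builds the request without sending it.

```python
import urllib.request
from typing import Dict, Optional

# Base URL for all audio endpoints, as given above.
BASE_URL = "https://api.meshapi.ai/v1/audio"


def mesh_request(
    path: str,
    api_key: str,
    method: str = "GET",
    data: Optional[bytes] = None,
    extra_headers: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) an authenticated REST request.

    `path` is relative to BASE_URL, e.g. "/voices". mesh_request is a
    hypothetical helper name used only in this sketch.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    if extra_headers:
        headers.update(extra_headers)
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


req = mesh_request("/voices", "rsk_example")
print(req.full_url)  # https://api.meshapi.ai/v1/audio/voices
```

To actually send it, pass the request to urllib.request.urlopen.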
Speech-to-Text
POST /v1/audio/transcriptions
Transcribe an audio file. The provider is resolved automatically from the model name. You can supply audio as a file upload, a public URL, or a cloud storage URL.
This endpoint uses multipart/form-data — not JSON.
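Because the body must be multipart/form-data, a JSON payload will not work. Python's standard library has no multipart encoder, so the sketch below hand-rolls one. The form field names model and file are assumptions, since the form-field table for this endpoint is not reproduced here. The field names for the URL and cloud-storage inputs aren't shown either. transcription_request is a hypothetical helper, so check its field names against the reference before use.

```python
import io
import urllib.request
import uuid
from typing import Dict, Tuple

TRANSCRIPTIONS_URL = "https://api.meshapi.ai/v1/audio/transcriptions"


def encode_multipart(
    fields: Dict[str, str], files: Dict[str, Tuple[str, bytes, str]]
) -> Tuple[bytes, str]:
    """Encode text fields and files as a multipart/form-data body.

    files maps field name -> (filename, content, content_type).
    Returns (body, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    for name, (filename, content, ctype) in files.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n".encode()
        )
        buf.write(content)
        buf.write(b"\r\n")
    buf.write(f"--{boundary}--\r\n".encode())  # closing boundary
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def transcription_request(
    api_key: str, model: str, filename: str, audio: bytes,
    content_type: str = "audio/wav",
) -> urllib.request.Request:
    # "model" and "file" are HYPOTHETICAL field names; check the form-field table.
    body, ctype = encode_multipart(
        fields={"model": model},
        files={"file": (filename, audio, content_type)},
    )
    return urllib.request.Request(
        TRANSCRIPTIONS_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": ctype},
        method="POST",
    )
```

If you use the third-party requests library instead, passing data= and files= to requests.post produces an equivalent multipart body without a hand-written encoder.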
Form fields
Response
Examples
Transcribe and Translate
POST /v1/audio/transcriptions/translate
Transcribe audio and translate the result directly to English in a single step. Uses Sarvam models by default.
This endpoint uses multipart/form-data.
Form fields
Response
Example
If the selected model doesn’t support translation, the API returns a 422 error. Check GET /v1/models to confirm a model’s capabilities.
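One way to surface the 422 case separately from other failures is to wrap the send call. This sketch assumes you already have a prepared urllib.request.Request for the translate endpoint, built as multipart/form-data like the transcription request. The TranslationUnsupported exception and the injectable opener parameter are hypothetical conveniences that let the logic be checked without network access.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable

TRANSLATE_URL = "https://api.meshapi.ai/v1/audio/transcriptions/translate"


class TranslationUnsupported(Exception):
    """Hypothetical exception for the 422 'model cannot translate' case."""


def send_translation(
    request: urllib.request.Request,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> Any:
    """Send a prepared translate request and return the decoded JSON body.

    HTTP 422 becomes TranslationUnsupported; any other HTTP error propagates.
    """
    try:
        with opener(request) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 422:
            raise TranslationUnsupported(
                "model does not support translation; check GET /v1/models"
            ) from err
        raise
```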
WebSocket Real-Time STT
WS /v1/audio/transcriptions/realtime
Transcribe audio in real time. Send raw audio chunks as they are captured (e.g. from a microphone) and receive partial and final transcripts as they are produced.
This proxies ElevenLabs’ Scribe v2 Realtime endpoint.
Authentication
Pass your Mesh API key in one of these ways:
Sec-WebSocket-Protocol: Bearer rsk_... (header)
?api_key=rsk_... (query parameter)
?token=rsk_... (query parameter)
Query parameters
Supported audio formats: pcm_8000, pcm_16000, pcm_22050, pcm_24000, pcm_44100, pcm_48000, ulaw_8000
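The sketch below assembles a connection URL for the realtime endpoint and rejects formats outside the supported list. Two things are assumptions. First, it assumes the WebSocket URL uses the wss:// scheme on the same host as the REST base URL. Second, it assumes the format is passed in a query parameter named audio_format, because the query-parameter table is not reproduced here. commit_strategy, which appears further down this page, can be passed through **params. Both helper names are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode

# ASSUMED: wss:// on the same host as the REST base URL.
REALTIME_URL = "wss://api.meshapi.ai/v1/audio/transcriptions/realtime"

SUPPORTED_FORMATS = {
    "pcm_8000", "pcm_16000", "pcm_22050", "pcm_24000",
    "pcm_44100", "pcm_48000", "ulaw_8000",
}


def realtime_url(api_key: str, audio_format: str = "pcm_16000", **params: str) -> str:
    """Build the realtime STT URL, authenticating via the ?api_key= option."""
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format}")
    # "audio_format" is a HYPOTHETICAL parameter name.
    query = {"api_key": api_key, "audio_format": audio_format, **params}
    return f"{REALTIME_URL}?{urlencode(query)}"


def subprotocol_auth(api_key: str) -> Dict[str, str]:
    """Alternative: keep the key out of the URL by sending it as a subprotocol header."""
    return {"Sec-WebSocket-Protocol": f"Bearer {api_key}"}
```

The header variant avoids leaving the key in URLs that may end up in proxy or server logs.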
Message protocol
Client → server (JSON frames)
Send input_audio_chunk frames containing base64-encoded audio. This is the only message type the server forwards upstream; any other message type is silently dropped.
When using commit_strategy: manual, set "commit": true on an input_audio_chunk frame to commit the buffered audio yourself rather than waiting for a voice activity detection (VAD) commit.
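Before audio goes into input_audio_chunk frames, it has to be split into chunks and base64-encoded. For 16-bit mono PCM at 16 kHz, a 100 ms chunk is 16000 × 2 × 0.1 = 3200 bytes. In the sketch below, the JSON key names message_type and audio_base_64 are assumptions based on ElevenLabs' Scribe realtime protocol, which this endpoint proxies. Confirm them against the client frame reference. For ulaw_8000, pass bytes_per_sample=1.

```python
import base64
import json
from typing import List


def chunk_frames(
    pcm: bytes,
    sample_rate: int = 16000,
    chunk_ms: int = 100,
    bytes_per_sample: int = 2,  # 16-bit PCM; use 1 for ulaw_8000
    commit_last: bool = False,
) -> List[str]:
    """Split mono audio into JSON input_audio_chunk frames.

    The key names "message_type" and "audio_base_64" are ASSUMED from the
    upstream ElevenLabs protocol; verify them before relying on this.
    """
    bytes_per_chunk = sample_rate * bytes_per_sample * chunk_ms // 1000
    frames = []
    for start in range(0, len(pcm), bytes_per_chunk):
        chunk = pcm[start:start + bytes_per_chunk]
        frames.append({
            "message_type": "input_audio_chunk",
            "audio_base_64": base64.b64encode(chunk).decode("ascii"),
        })
    # With commit_strategy: manual, mark the final frame to commit the buffered audio.
    if commit_last and frames:
        frames[-1]["commit"] = True
    return [json.dumps(frame) for frame in frames]
```

Each returned string is one text frame to send over the open WebSocket, in order.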
Server → client (JSON frames)
Example
Voice Management
List voices
GET /v1/audio/voices
Browse voices available on your ElevenLabs account. Supports pagination, full-text search, and filtering.
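A common way to consume a paginated list is to follow a cursor until the server stops returning one. The sketch below assumes cursor-style pagination and keeps URL building separate from fetching, so the paging logic can be checked offline. The query-parameter names search, page_size, and next_page_token are hypothetical placeholders. fetch_page is a function you supply: it performs the authenticated GET and returns (voices, next_token) from the response, whose exact shape is not shown here.

```python
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urlencode

VOICES_URL = "https://api.meshapi.ai/v1/audio/voices"

# A page fetcher takes a URL and returns (voices_on_page, next_token_or_None).
PageFetcher = Callable[[str], Tuple[List[dict], Optional[str]]]


def voices_url(search=None, page_size=None, page_token=None) -> str:
    # HYPOTHETICAL parameter names; substitute the documented ones.
    params = {
        "search": search,
        "page_size": page_size,
        "next_page_token": page_token,
    }
    query = {k: v for k, v in params.items() if v is not None}
    return f"{VOICES_URL}?{urlencode(query)}" if query else VOICES_URL


def iter_voices(fetch_page: PageFetcher, **filters) -> Iterator[dict]:
    """Yield voices across all pages, following the cursor until it runs out."""
    token = None
    while True:
        voices, token = fetch_page(voices_url(page_token=token, **filters))
        yield from voices
        if not token:
            return
```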
Get a single voice
GET /v1/audio/voices/{voice_id}
Fetch metadata for a specific voice by its ElevenLabs voice ID.
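The voice ID goes into the URL path, so it is safest to percent-encode it. A minimal sketch (voice_request is a hypothetical helper name):

```python
import urllib.request
from urllib.parse import quote

VOICES_URL = "https://api.meshapi.ai/v1/audio/voices"


def voice_request(api_key: str, voice_id: str) -> urllib.request.Request:
    """Build (but do not send) GET /v1/audio/voices/{voice_id}."""
    if not voice_id:
        raise ValueError("voice_id must be non-empty")
    # safe="" also escapes "/", so an ID can never change the path structure.
    url = f"{VOICES_URL}/{quote(voice_id, safe='')}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
```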
Error handling
All endpoints use standard HTTP status codes. Common cases:
On errors, WebSocket sessions send a JSON error frame and then close the connection with code 1000.
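1000 is the WebSocket normal-closure code (RFC 6455), so a client cannot tell a failed session from a clean one by the close code alone. It has to look at the last frame received. The sketch below does that. The error-frame shape it checks for is a hypothetical placeholder for the documented schema: either a top-level error key or a message_type containing "error".

```python
import json
from typing import Any, Dict, List, Tuple


def is_error_frame(msg: Dict[str, Any]) -> bool:
    # HYPOTHETICAL error-frame shape; replace with the documented schema.
    message_type = str(msg.get("message_type", "")).lower()
    return "error" in msg or "error" in message_type


def session_outcome(frames: List[str], close_code: int) -> Tuple[str, Any]:
    """Classify a finished realtime session from its text frames and close code."""
    last = json.loads(frames[-1]) if frames else {}
    if is_error_frame(last):
        return ("error", last)       # error frame, even though the close code is 1000
    if close_code == 1000:
        return ("ok", None)          # normal closure with no error frame
    return ("abnormal", close_code)  # dropped connection or other close code
```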