Audio (TTS & STT)

Text-to-Speech

client.audio.synthesize sends POST /v1/audio/speech and returns raw audio bytes.

```python
from meshapi import MeshAPI, SpeechParams

client = MeshAPI(base_url="https://api.meshapi.ai", token="rsk_...")

# synthesize() returns the encoded audio as raw bytes
audio_bytes = client.audio.synthesize(
    SpeechParams(
        input="Hello from MeshAPI.",
        model="sarvam/bulbul:v2",
        voice="meera",
    )
)

# Write the bytes to disk unchanged
with open("output.wav", "wb") as f:
    f.write(audio_bytes)
```
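
Because synthesize returns plain bytes rather than a file, you can inspect the audio before saving it. Assuming the bytes are WAV (this example names the file output.wav, but the actual encoding depends on response_format and on the model's default, which this page does not state), Python's standard wave module can read the header straight from memory. This is a minimal sketch; wav_info is a hypothetical helper, not part of the MeshAPI SDK.

```python
import io
import wave


def wav_info(audio_bytes: bytes) -> dict:
    """Read basic properties from in-memory WAV bytes (hypothetical helper).

    Raises an exception (typically wave.Error) if the bytes are not a WAV stream.
    """
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


# Example: build one second of 16 kHz, 16-bit mono silence in memory and inspect it
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

print(wav_info(buf.getvalue()))
```

With real output, call wav_info(audio_bytes) right after synthesize returns.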

Async

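```python
import asyncio

from meshapi import AsyncMeshAPI, SpeechParams


async def main() -> bytes:
    # In a script, `async with` and `await` must run inside a coroutine
    async with AsyncMeshAPI(base_url="https://api.meshapi.ai", token="rsk_...") as client:
        return await client.audio.synthesize(
            SpeechParams(
                input="Hello from MeshAPI.",
                model="sarvam/bulbul:v2",
            )
        )


audio_bytes = asyncio.run(main())
```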
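
The async client pays off when several requests are in flight at once. Below is a minimal sketch of synthesizing a batch of texts with bounded concurrency using asyncio.Semaphore and asyncio.gather. To keep it self-contained, the request is passed in as a coroutine function; in practice it would wrap client.audio.synthesize(SpeechParams(...)). The helper name synthesize_all and the default limit of 4 are assumptions, not documented MeshAPI behavior or rate limits.

```python
import asyncio
from typing import Awaitable, Callable


async def synthesize_all(
    texts: list[str],
    synthesize: Callable[[str], Awaitable[bytes]],
    limit: int = 4,
) -> list[bytes]:
    """Run synthesize(text) for every text, at most `limit` at a time.

    Results come back in the same order as `texts`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def one(text: str) -> bytes:
        # Only `limit` coroutines can hold the semaphore at once
        async with semaphore:
            return await synthesize(text)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(one(t) for t in texts))


# Example with a stand-in coroutine instead of a real API call
async def fake_synthesize(text: str) -> bytes:
    await asyncio.sleep(0.01)
    return text.encode()


results = asyncio.run(synthesize_all(["a", "b", "c"], fake_synthesize, limit=2))
print(results)
```

A semaphore caps in-flight requests without splitting the batch into fixed-size chunks, so a slow request does not hold up the others.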

SpeechParams fields

| Field | Type | Notes |
| --- | --- | --- |
| input | str | Required. Text to synthesize. |
| model | str | Required. e.g. "sarvam/bulbul:v2" |
| voice | str \| None | Voice ID or name |
| response_format | str \| None | Audio format, e.g. "wav", "mp3" |
| speed | float \| None | Playback speed multiplier |
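
Since response_format decides how the returned bytes are encoded, it can help to check the leading bytes before picking a file extension. This minimal sketch recognizes the two formats named in the table above by their standard signatures: a RIFF/WAVE header for WAV, and either an ID3v2 tag or an MPEG Layer III frame header for MP3. The helper name is hypothetical, and any other format returns None.

```python
from __future__ import annotations


def sniff_audio_format(data: bytes) -> str | None:
    """Guess 'wav' or 'mp3' from the first bytes of an audio payload."""
    # WAV: "RIFF" <4-byte size> "WAVE"
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 that starts with an ID3v2 metadata tag
    if data[:3] == b"ID3":
        return "mp3"
    # Bare MPEG audio frame: 11 sync bits set, layer bits == 01 (Layer III)
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0 and (data[1] & 0x06) == 0x02:
        return "mp3"
    return None


print(sniff_audio_format(b"RIFF\x00\x00\x00\x00WAVEfmt "))
print(sniff_audio_format(b"ID3\x04\x00"))
print(sniff_audio_format(b"\xff\xfb\x90\x00"))
```

Checking the layer bits keeps AAC ADTS streams, which share the 0xFFF sync prefix, from being misreported as MP3.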

Speech-to-Text (Transcription)

client.audio.transcribe sends POST /v1/audio/transcriptions as a multipart upload and returns a TranscriptionResponse.

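```python
from meshapi import MeshAPI, TranscriptionParams

client = MeshAPI(base_url="https://api.meshapi.ai", token="rsk_...")

# Read the audio into memory; transcribe() uploads it as multipart form data
with open("audio.wav", "rb") as f:
    file_bytes = f.read()

result = client.audio.transcribe(
    TranscriptionParams(
        model="sarvam/saaras:v3",
        file=file_bytes,
        file_name="audio.wav",  # filename including its extension
        language="en",
    )
)

print(result.text)
```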
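
TranscriptionParams takes the audio as raw bytes plus a separate file_name that must include the extension. A small helper can read both from one path so the two values never drift apart. This is a minimal sketch; load_audio is a hypothetical helper, not part of the SDK.

```python
from __future__ import annotations

from pathlib import Path


def load_audio(path: str | Path) -> tuple[bytes, str]:
    """Return (file_bytes, file_name) for an audio file on disk."""
    p = Path(path)
    if not p.suffix:
        # TranscriptionParams.file_name is documented as "Filename with extension"
        raise ValueError(f"{p.name!r} has no file extension")
    return p.read_bytes(), p.name
```

Usage: file_bytes, file_name = load_audio("recordings/audio.wav"), then pass file=file_bytes, file_name=file_name to TranscriptionParams.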

TranscriptionParams key fields

| Field | Type | Notes |
| --- | --- | --- |
| model | str | Required. e.g. "sarvam/saaras:v3" |
| file | bytes | Required. Audio file bytes. |
| file_name | str | Required. Filename with extension. |
| language | str \| None | Language code, e.g. "en" |
| keyterms | list[str] \| None | Domain-specific terms to boost recognition |
| diarize | bool \| None | Enable speaker diarization |
| num_speakers | int \| None | Expected number of speakers |
| with_timestamps | bool \| None | Include word-level timestamps |

Translation

client.audio.translate sends POST /v1/audio/transcriptions/translate, transcribes the audio, and returns the text translated into English.

```python
from meshapi import MeshAPI, TranscriptionTranslateParams

client = MeshAPI(base_url="https://api.meshapi.ai", token="rsk_...")

with open("audio.wav", "rb") as f:
    file_bytes = f.read()

# No language is passed here; the returned text is in English
result = client.audio.translate(
    TranscriptionTranslateParams(
        model="sarvam/saaras:v3",
        file=file_bytes,
        file_name="audio.wav",
    )
)

print(result.text)
```

List Voices

client.audio.list_voices sends GET /v1/audio/voices.

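```python
from meshapi import ListVoicesParams, MeshAPI

client = MeshAPI(base_url="https://api.meshapi.ai", token="rsk_...")

# Request the first page of voices, at most 10 per page
voices = client.audio.list_voices(ListVoicesParams(page_size=10))
print(voices)
```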

ListVoicesParams fields

| Field | Type | Notes |
| --- | --- | --- |
| page_size | int \| None | Results per page |
| next_page_token | str \| None | Pagination cursor |
| search | str \| None | Filter by name |
| voice_type | str \| None | "standard", "cloned", etc. |
| category | str \| None | Voice category filter |
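
page_size and next_page_token point to cursor-style pagination: request a page, then pass the returned token back until no token comes back. The response shape isn't documented on this page, so this minimal sketch takes a fetch_page function that returns (items, next_token). With the SDK, fetch_page would call client.audio.list_voices(ListVoicesParams(page_size=..., next_page_token=token)) and extract those two values from the response, whatever they are actually named. iter_all and fetch_page are hypothetical, and treating an empty token as "no more pages" is an assumption.

```python
from __future__ import annotations

from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def iter_all(
    fetch_page: Callable[[Optional[str]], tuple[list[T], Optional[str]]],
) -> Iterator[T]:
    """Yield every item across pages, following the cursor until it runs out."""
    token: Optional[str] = None
    while True:
        # The first call passes None, meaning "start from the beginning"
        items, token = fetch_page(token)
        yield from items
        if not token:  # assumption: None or "" means there are no more pages
            break


# Example with an in-memory stand-in for the API
PAGES = {None: (["voice-a", "voice-b"], "p2"), "p2": (["voice-c"], None)}
print(list(iter_all(lambda tok: PAGES[tok])))
```

A generator keeps memory flat for large catalogs and lets the caller stop early, for example after finding a matching voice.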

Get Voice

client.audio.get_voice sends GET /v1/audio/voices/{voice_id}.

```python
# `client` is a MeshAPI instance as created above; "voice-id" is a placeholder
voice = client.audio.get_voice("voice-id")
print(voice)
```