Audio (TTS & STT)

Audio

Text-to-Speech

client.Audio.Synthesize sends POST /v1/audio/speech and returns []byte of raw audio.

```go
import (
	"log"
	"os"

	meshapi "github.com/aifiesta/meshapi-go-sdk"
)

// client and ctx are assumed to be initialized elsewhere.
model := "sarvam/bulbul:v2"
voice := "meera"

audioBytes, err := client.Audio.Synthesize(ctx, meshapi.SpeechParams{
	Input: "Hello from MeshAPI.",
	Model: &model,
	Voice: &voice,
})
if err != nil {
	log.Fatal(err)
}

// Write the raw audio bytes to disk and check the write error.
if err := os.WriteFile("output.wav", audioBytes, 0644); err != nil {
	log.Fatal(err)
}
```

SpeechParams fields

| Field | Type | Notes |
| --- | --- | --- |
| `Input` | `string` | Required. Text to synthesize. |
| `Model` | `*string` | Required. e.g. `"sarvam/bulbul:v2"` |
| `Voice` | `*string` | Voice ID or name |
| `ResponseFormat` | `*string` | Audio format, e.g. `"wav"`, `"mp3"` |
| `Speed` | `*float64` | Playback speed multiplier |
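
Most optional fields above are pointers, and Go does not allow taking the address of a literal (`&"meera"` does not compile), which is why the example assigns `model` and `voice` to variables first. A minimal sketch of a generic helper that avoids those temporaries follows. It assumes Go 1.18 or later; `ptr` and `speechOptions` are hypothetical names introduced here for illustration and are not part of the SDK.

```go
package main

import "fmt"

// ptr returns a pointer to a copy of v. It is a local convenience helper,
// not part of the MeshAPI SDK.
func ptr[T any](v T) *T {
	return &v
}

// speechOptions is a hypothetical stand-in that mirrors the optional
// SpeechParams fields, so this sketch compiles without the SDK.
type speechOptions struct {
	Voice          *string
	ResponseFormat *string
	Speed          *float64
}

func main() {
	opts := speechOptions{
		Voice:          ptr("meera"),
		ResponseFormat: ptr("wav"),
		Speed:          ptr(1.0),
	}
	fmt.Println(*opts.Voice, *opts.ResponseFormat, *opts.Speed)
}
```

With such a helper in scope, the same pattern should work for any pointer-typed field, for example `Voice: ptr("meera")` inside a `meshapi.SpeechParams` literal.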

Speech-to-Text (Transcription)

client.Audio.Transcribe sends POST /v1/audio/transcriptions as a multipart upload and returns *TranscriptionResponse.

```go
import (
	"fmt"
	"log"
	"os"

	meshapi "github.com/aifiesta/meshapi-go-sdk"
)

// client and ctx are assumed to be initialized elsewhere.
fileData, err := os.ReadFile("audio.wav")
if err != nil {
	log.Fatal(err)
}

lang := "en"
result, err := client.Audio.Transcribe(ctx, fileData, "audio.wav", meshapi.TranscriptionParams{
	Model:    "sarvam/saaras:v3",
	Language: &lang,
})
if err != nil {
	log.Fatal(err)
}

fmt.Println(result.Text)
```

With keyterms (sent as repeated form fields):

```go
result, err := client.Audio.Transcribe(ctx, fileData, "audio.wav", meshapi.TranscriptionParams{
	Model:    "sarvam/saaras:v3",
	Keyterms: []string{"MeshAPI", "transcription"},
})
```

TranscriptionParams key fields

| Field | Type | Notes |
| --- | --- | --- |
| `Model` | `string` | Required. e.g. `"sarvam/saaras:v3"` |
| `Language` | `*string` | Language code, e.g. `"en"` |
| `Keyterms` | `[]string` | Domain-specific terms to boost recognition |
| `Diarize` | `*bool` | Enable speaker diarization |
| `NumSpeakers` | `*int` | Expected number of speakers |
| `WithTimestamps` | `*bool` | Include word-level timestamps |

Translation

client.Audio.Translate sends POST /v1/audio/transcriptions/translate and returns an English transcription of the audio, translating from the source language where needed.

```go
// fileData is the audio content read earlier, e.g. with os.ReadFile.
model := "sarvam/saaras:v3"
result, err := client.Audio.Translate(ctx, fileData, "audio.wav", &meshapi.TranscriptionTranslateParams{
	Model: &model,
})
if err != nil {
	log.Fatal(err)
}

fmt.Println(result.Text)
```

List Voices

client.Audio.ListVoices sends GET /v1/audio/voices.

```go
pageSize := 10
voices, err := client.Audio.ListVoices(ctx, &meshapi.ListVoicesParams{
	PageSize: &pageSize,
})
if err != nil {
	log.Fatal(err)
}

fmt.Printf("%v\n", voices)
```

ListVoicesParams fields

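| Field | Type | Notes |
| --- | --- | --- |
| `PageSize` | `*int` | Results per page |
| `NextPageToken` | `*string` | Pagination cursor |
| `Search` | `*string` | Filter by name |
| `VoiceType` | `*string` | `"standard"`, `"cloned"`, etc. |
| `Category` | `*string` | Voice category filter |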
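
`PageSize` and `NextPageToken` suggest cursor-style pagination: request a page, then pass the returned cursor back until none is returned. The loop below is a rough sketch of that pattern. The response shape is not documented in this section, so the `page` type, its field names, and the placeholder voice names are all assumptions, and `fakeFetch` stands in for a real `ListVoices` call so the example runs offline.

```go
package main

import "fmt"

// page is a hypothetical response shape; the real ListVoices response
// type is not shown in this section.
type page struct {
	Voices        []string
	NextPageToken *string
}

// fetchFunc abstracts one list call; token is nil for the first page.
type fetchFunc func(token *string) (page, error)

// collectAll follows NextPageToken until the server stops returning one.
func collectAll(fetch fetchFunc) ([]string, error) {
	var all []string
	var token *string
	for {
		p, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Voices...)
		if p.NextPageToken == nil || *p.NextPageToken == "" {
			return all, nil
		}
		token = p.NextPageToken
	}
}

// fakeFetch simulates a two-page listing with placeholder voice names.
func fakeFetch(token *string) (page, error) {
	if token == nil {
		next := "page-2"
		return page{Voices: []string{"meera", "voice-b"}, NextPageToken: &next}, nil
	}
	return page{Voices: []string{"voice-c"}}, nil
}

func main() {
	voices, err := collectAll(fakeFetch)
	if err != nil {
		panic(err)
	}
	fmt.Println(voices)
}
```

In real use, the function passed to `collectAll` would wrap `client.Audio.ListVoices`, setting `NextPageToken` in `ListVoicesParams` from the `token` argument.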

Get Voice

client.Audio.GetVoice sends GET /v1/audio/voices/{voice_id}.

```go
voice, err := client.Audio.GetVoice(ctx, "voice-id")
if err != nil {
	log.Fatal(err)
}
fmt.Printf("%v\n", voice)
```
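
The `{voice_id}` segment is substituted into the request path. Whether the SDK escapes the ID itself is not stated here, so the sketch below only illustrates how such a path could be built safely when calling the endpoint by hand; `voicePath` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// voicePath fills the {voice_id} placeholder, escaping characters such as
// "/" or spaces that would otherwise change the shape of the path.
func voicePath(voiceID string) string {
	return "/v1/audio/voices/" + url.PathEscape(voiceID)
}

func main() {
	fmt.Println(voicePath("voice-id"))
	fmt.Println(voicePath("team a/voice 1"))
}
```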