Realtime Audio

client.Realtime opens a bidirectional WebSocket session to wss://api.meshapi.ai/v1/realtime. The wire format is identical to OpenAI's Realtime API: every event you send or receive has exactly the shape the upstream documentation describes.

Connect and close

session, err := client.Realtime.Connect(ctx, meshapi.RealtimeConnectParams{
	Model: "openai/gpt-realtime-mini",
})
if err != nil {
	log.Fatal(err)
}
defer session.Close()

Configure the session

err = session.Send(ctx, map[string]any{
	"type": "session.update",
	"session": map[string]any{
		"type":         "realtime",
		"modalities":   []string{"audio", "text"},
		"voice":        "alloy",
		"instructions": "You are a helpful assistant.",
	},
})
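To see what such an event looks like on the wire, the following standalone sketch builds the same map and encodes it with encoding/json. It assumes that session.Send serializes the map into a JSON text frame, which this page does not state explicitly; sessionUpdate and encodeEvent are hypothetical helpers, not SDK functions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sessionUpdate builds the session.update event shown above.
// Hypothetical helper for illustration; it is not part of the SDK.
func sessionUpdate(voice, instructions string) map[string]any {
	return map[string]any{
		"type": "session.update",
		"session": map[string]any{
			"type":         "realtime",
			"modalities":   []string{"audio", "text"},
			"voice":        voice,
			"instructions": instructions,
		},
	}
}

// encodeEvent renders an event as JSON text.
// Assumption: this matches what the client would put in a text frame.
func encodeEvent(event map[string]any) (string, error) {
	b, err := json.Marshal(event)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	frame, err := encodeEvent(sessionUpdate("alloy", "You are a helpful assistant."))
	if err != nil {
		panic(err)
	}
	fmt.Println(frame)
}
```

Because json.Marshal writes map keys in sorted order, the printed JSON is stable across runs, which makes it convenient to compare against the upstream event reference.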

Send audio

// pcmBytes is raw 16-bit PCM at 24 kHz mono
err = session.SendAudio(ctx, pcmBytes)
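If your audio source produces int16 samples rather than bytes, a conversion along these lines may help. This is a minimal sketch: the 24 kHz, 16-bit, mono figures come from the comment above, while little-endian byte order is an assumption this page does not confirm. pcm16Bytes and pcmDuration are hypothetical helpers.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

const (
	sampleRate     = 24000 // samples per second (24 kHz)
	bytesPerSample = 2     // 16-bit samples, one channel
)

// pcm16Bytes packs samples as 16-bit PCM.
// Assumption: the server expects little-endian byte order.
func pcm16Bytes(samples []int16) []byte {
	out := make([]byte, len(samples)*bytesPerSample)
	for i, s := range samples {
		binary.LittleEndian.PutUint16(out[i*bytesPerSample:], uint16(s))
	}
	return out
}

// pcmDuration returns how much audio a buffer of raw PCM bytes holds.
func pcmDuration(pcm []byte) time.Duration {
	samples := len(pcm) / bytesPerSample
	return time.Duration(samples) * time.Second / sampleRate
}

func main() {
	silence := make([]int16, 480) // 480 zero-valued samples
	pcm := pcm16Bytes(silence)
	fmt.Println(len(pcm), pcmDuration(pcm))
}
```

Knowing the duration of a buffer is useful when choosing a chunk size for streaming: at this format, every 48,000 bytes correspond to one second of audio.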

RealtimeMessage

Every frame from the server arrives as a meshapi.RealtimeMessage. Which fields are set depends on whether the frame is binary or text:

| Field | Type | When set |
| --- | --- | --- |
| Audio | []byte | Binary frame: raw PCM audio |
| Text | string | Text frame: the raw JSON string |
| Event | map[string]any | Text frame: the parsed JSON map (check Event["type"]) |

Audio and text frames are mutually exclusive. A text frame populates both Text and Event, unless it is not valid JSON, in which case only Text is set.
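The rules above can be sketched with a local stand-in type. The message struct below only mirrors the three documented fields and is not the SDK type; fromTextFrame and kind are hypothetical helpers that illustrate how a text frame might be populated and how a consumer could tell the cases apart.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message mirrors the documented fields of meshapi.RealtimeMessage.
// It is a local stand-in so the sketch compiles without the SDK.
type message struct {
	Audio []byte
	Text  string
	Event map[string]any
}

// fromTextFrame applies the text-frame rule: Text is always set,
// and Event is set only when the frame parses as a JSON object.
func fromTextFrame(raw string) message {
	msg := message{Text: raw}
	var event map[string]any
	if err := json.Unmarshal([]byte(raw), &event); err == nil {
		msg.Event = event
	}
	return msg
}

// kind reports which case a message falls into.
func kind(msg message) string {
	switch {
	case msg.Audio != nil:
		return "audio"
	case msg.Event != nil:
		return "event"
	case msg.Text != "":
		return "text"
	default:
		return "empty"
	}
}

func main() {
	fmt.Println(kind(message{Audio: []byte{0, 1}}))
	fmt.Println(kind(fromTextFrame(`{"type":"response.done"}`)))
	fmt.Println(kind(fromTextFrame("not json")))
}
```

Checking Audio first, then Event, then Text means a consumer handles the invalid-JSON case explicitly instead of silently dropping the frame.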

Receive frames

Receive blocks until the next frame arrives. Context cancellation unblocks it immediately:

msg, err := session.Receive(ctx)
if errors.Is(err, io.EOF) {
	fmt.Println("session closed")
	return // no frame to inspect
}
if err != nil {
	log.Fatal(err)
}

if msg.Audio != nil {
	// binary audio frame: play or buffer
	playAudio(msg.Audio)
} else if msg.Event != nil {
	fmt.Println("event type:", msg.Event["type"])
}
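One common way to bound a blocking call like this is to derive a context with a timeout. The sketch below does not use the SDK: receive is a hypothetical stand-in built on a channel, and it assumes that a cancelled receive surfaces the context's own error, which this page does not specify for session.Receive.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// receive blocks until a frame arrives or ctx is done, whichever comes first.
// Hypothetical stand-in for a blocking receive call.
func receive(ctx context.Context, frames <-chan string) (string, error) {
	select {
	case f := <-frames:
		return f, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// timesOut reports whether a receive on a silent source ends with
// context.DeadlineExceeded once the given timeout elapses.
func timesOut(timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	silent := make(chan string) // hypothetical source that never delivers
	_, err := receive(ctx, silent)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(timesOut(50 * time.Millisecond))
}
```

With the real session, the same pattern would be to pass a context from context.WithTimeout or context.WithCancel to Receive and treat the resulting error separately from io.EOF.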

Events channel (concurrent pump)

For concurrent send/receive, use Events to pump frames into a channel:

msgCh, errCh := session.Events(ctx)

for msg := range msgCh {
	switch {
	case msg.Audio != nil:
		playAudio(msg.Audio)
	case msg.Event != nil:
		switch msg.Event["type"] {
		case "response.text.delta":
			fmt.Print(msg.Event["delta"])
		case "response.done":
			fmt.Println("\n[done]")
		case "error":
			log.Printf("server error: %v", msg.Event)
		}
	}
}

if err := <-errCh; err != nil {
	log.Fatal(err)
}
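For readers curious how a pump of this shape can be layered on a blocking receive call, here is a minimal sketch. It is not the SDK's implementation: pump, receiveFunc, scripted, and collect are hypothetical names, frames are simplified to strings, and treating io.EOF as a clean close that yields a nil error is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
)

// receiveFunc has the shape of a blocking receive call,
// simplified here to return a string frame.
type receiveFunc func(ctx context.Context) (string, error)

// pump calls recv in a goroutine and forwards frames to a channel.
// The message channel is closed when recv fails. io.EOF is treated as
// a clean close (an assumption), so the error channel then yields nil.
func pump(ctx context.Context, recv receiveFunc) (<-chan string, <-chan error) {
	msgCh := make(chan string)
	errCh := make(chan error, 1) // buffered so the goroutine never blocks on it
	go func() {
		defer close(msgCh)
		for {
			frame, err := recv(ctx)
			if err != nil {
				if errors.Is(err, io.EOF) {
					err = nil
				}
				errCh <- err
				return
			}
			select {
			case msgCh <- frame:
			case <-ctx.Done():
				errCh <- ctx.Err()
				return
			}
		}
	}()
	return msgCh, errCh
}

// scripted returns a receiveFunc that yields the frames in order, then io.EOF.
func scripted(frames ...string) receiveFunc {
	i := 0
	return func(ctx context.Context) (string, error) {
		if i >= len(frames) {
			return "", io.EOF
		}
		f := frames[i]
		i++
		return f, nil
	}
}

// collect drains a pump over scripted frames and returns what it saw.
func collect(frames ...string) ([]string, error) {
	msgCh, errCh := pump(context.Background(), scripted(frames...))
	var got []string
	for frame := range msgCh {
		got = append(got, frame)
	}
	return got, <-errCh
}

func main() {
	got, err := collect("a", "b")
	fmt.Println(got, err)
}
```

The error is written to the buffered channel before the message channel closes, so a consumer that ranges over the messages and then reads the error channel, as in the snippet above, never blocks.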

Full voice agent example

session, err := client.Realtime.Connect(ctx, meshapi.RealtimeConnectParams{
	Model: "openai/gpt-realtime-mini",
})
if err != nil {
	log.Fatal(err)
}
defer session.Close()

// Configure
err = session.Send(ctx, map[string]any{
	"type": "session.update",
	"session": map[string]any{
		"type":       "realtime",
		"modalities": []string{"audio", "text"},
		"voice":      "alloy",
	},
})
if err != nil {
	log.Fatal(err)
}

// Stream microphone audio, then commit the buffer and request a response
go func() {
	for chunk := range micCh {
		if err := session.SendAudio(ctx, chunk); err != nil {
			log.Printf("send audio: %v", err)
			return
		}
	}
	session.Send(ctx, map[string]any{"type": "input_audio_buffer.commit"})
	session.Send(ctx, map[string]any{"type": "response.create"})
}()

// Play response audio
msgCh, errCh := session.Events(ctx)
for msg := range msgCh {
	if msg.Audio != nil {
		speaker.Write(msg.Audio)
	}
}
if err := <-errCh; err != nil {
	log.Fatal(err)
}

Error handling

Server errors arrive as a *meshapi.RealtimeError from Receive or on the errCh returned by Events:

msg, err := session.Receive(ctx)
if err != nil {
	var re *meshapi.RealtimeError
	if errors.As(err, &re) {
		fmt.Println("code:", re.Code) // "invalid_api_key", "insufficient_quota", …
		fmt.Println("message:", re.Message)
	}
}
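A caller will often want to branch on the error code, for example to avoid retrying when the problem is with the account rather than the connection. The sketch below uses a local realtimeError type that only mirrors the Code and Message fields shown above. The needsOperator helper is hypothetical, and classifying these two codes as non-retryable is an assumption rather than documented SDK behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// realtimeError mirrors the Code and Message fields shown above.
// It is a local stand-in for *meshapi.RealtimeError.
type realtimeError struct {
	Code    string
	Message string
}

func (e *realtimeError) Error() string { return e.Code + ": " + e.Message }

// needsOperator reports whether err carries a code that retrying is
// unlikely to fix. Assumption: the two codes named on this page fall
// into that group; other codes are left to the caller's retry policy.
func needsOperator(err error) bool {
	var re *realtimeError
	if !errors.As(err, &re) {
		return false
	}
	switch re.Code {
	case "invalid_api_key", "insufficient_quota":
		return true
	}
	return false
}

func main() {
	wrapped := fmt.Errorf("receive: %w", &realtimeError{Code: "invalid_api_key", Message: "bad key"})
	fmt.Println(needsOperator(wrapped))
	fmt.Println(needsOperator(errors.New("network reset")))
}
```

Because errors.As unwraps the chain, the check still works when the error has been wrapped with fmt.Errorf and %w on its way up the call stack.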

Supported models

Model ID
openai/gpt-realtime-mini
openai/gpt-realtime
openai/gpt-realtime-2