Realtime Audio

Overview

wss://api.meshapi.ai/v1/realtime is a WebSocket gateway that proxies OpenAI’s Realtime API for low-latency, bidirectional speech-to-speech sessions.

  • Same auth surface as the rest of Mesh — your rsk_... data-plane token.
  • Wire format is identical to OpenAI’s Realtime API. Mesh passes JSON event bodies through verbatim in both directions, so any client written against the upstream spec works against Mesh by switching the WebSocket URL.
  • Billed on usage, metered at session close.

It’s intended for voice agents, live transcription with response, and any half-duplex / full-duplex audio UX where round-trip latency matters.

Quickstart

Open a WebSocket to wss://api.meshapi.ai/v1/realtime with a model query parameter, send a session.update, then stream audio chunks via input_audio_buffer.append events.

Browser (raw WebSocket)

    // Browsers cannot set custom headers on a WebSocket, so the key travels
    // as a subprotocol string. MESH_API_KEY holds your rsk_... key.
    const ws = new WebSocket(
      "wss://api.meshapi.ai/v1/realtime?model=gpt-realtime",
      ["openai-realtime", `Bearer.${MESH_API_KEY}`],
    );

    // Configure the session as soon as the socket opens.
    ws.addEventListener("open", () => {
      ws.send(JSON.stringify({
        type: "session.update",
        session: {
          modalities: ["audio", "text"],
          voice: "alloy",
        },
      }));
    });

    // Hand each streamed audio chunk to your own playback function.
    ws.addEventListener("message", (e) => {
      const event = JSON.parse(e.data);
      if (event.type === "response.audio.delta") {
        playAudio(event.delta);
      }
    });
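
The quickstart calls playAudio without defining it. As a minimal sketch of the decoding half of such a function, assuming the session's output format is base64-encoded 16-bit little-endian PCM (the upstream default), the hypothetical helper below converts one delta into float samples. The name decodePcm16Delta is illustrative only, and playback scheduling through Web Audio is left out.

```javascript
// Hypothetical helper (not part of any SDK): turn the base64 `delta` of a
// response.audio.delta event into Float32 samples in the range [-1, 1).
// Assumption: the session outputs 16-bit little-endian PCM.
function decodePcm16Delta(base64Delta) {
  const binary = atob(base64Delta);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // DataView reads little-endian explicitly, independent of host byte order.
  const view = new DataView(bytes.buffer);
  const samples = new Float32Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}
```

The returned Float32Array can be copied into an AudioBuffer channel for playback.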

Authentication

Two methods are supported. Both require TLS — ws:// is rejected.

1. Subprotocol header (preferred). Send your key in the Sec-WebSocket-Protocol header alongside the literal protocol marker:

Sec-WebSocket-Protocol: openai-realtime, Bearer <YOUR_RSK_KEY>

This matches OpenAI’s upstream auth and keeps the key out of URL logs.

2. Query string fallback. For browsers and tools that can’t customize the subprotocol cleanly, append ?api_key=<YOUR_RSK_KEY> to the URL. The key is redacted from server access logs but will appear in client-side history, so prefer the subprotocol method anywhere you control the runtime.

wss://api.meshapi.ai/v1/realtime?model=gpt-realtime&api_key=rsk_...

If both are present, the subprotocol wins.
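
The two methods can be wrapped in one small helper. This is a sketch, not an official client: buildRealtimeConnection and its option names are hypothetical, and it uses the dot-separated Bearer.<key> form because browser WebSocket constructors reject subprotocol strings that contain spaces.

```javascript
// Hypothetical helper: build the URL and subprotocol list for either
// authentication method described above.
function buildRealtimeConnection({ model, apiKey, useQueryFallback = false }) {
  const url = new URL("wss://api.meshapi.ai/v1/realtime");
  url.searchParams.set("model", model);

  if (useQueryFallback) {
    // Query-string fallback: the key is visible in client-side history.
    url.searchParams.set("api_key", apiKey);
    return { url: url.toString(), protocols: [] };
  }

  // Preferred: key in the subprotocol slot, kept out of the URL.
  return {
    url: url.toString(),
    protocols: ["openai-realtime", `Bearer.${apiKey}`],
  };
}
```

A caller would then open the socket with `new WebSocket(url, protocols)` using the two returned values.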

Message protocol

Mesh is a transparent proxy for OpenAI’s Realtime API. Every event you send and receive is shaped exactly as upstream documents it — Mesh does not rewrite event types, field names, or payloads.

See the OpenAI Realtime API reference for the full event catalog. The most common types you’ll exchange:

| Direction | Event | Purpose |
| --- | --- | --- |
| Client → | session.update | Configure modalities, voice, instructions, tools |
| Client → | input_audio_buffer.append | Append base64 PCM16 audio chunk |
| Client → | input_audio_buffer.commit | Mark end of an utterance |
| Client → | response.create | Ask the model to respond |
| Server → | response.audio.delta | Streamed audio chunk back |
| Server → | response.text.delta | Streamed text token |
| Server → | response.done | Response complete; usage included |
| Server → | error | OpenAI-shaped error envelope |
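
To make the client-side events concrete, here is a rough sketch of building one utterance by hand. The helper names are hypothetical, and it assumes manual turn handling, where the client commits the buffer and requests a response itself. With server-side voice activity detection enabled upstream, the last two events may be unnecessary.

```javascript
// Hypothetical helper: wrap a chunk of 16-bit PCM samples (Int16Array) in an
// input_audio_buffer.append event whose `audio` field is base64.
function pcm16ToAppendEvent(samples) {
  // Write little-endian bytes explicitly, independent of host byte order.
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(i * 2, samples[i], true);
  }
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return { type: "input_audio_buffer.append", audio: btoa(binary) };
}

// One utterance: append every chunk, mark the end, then ask for a response.
function utteranceEvents(chunks) {
  return [
    ...chunks.map(pcm16ToAppendEvent),
    { type: "input_audio_buffer.commit" },
    { type: "response.create" },
  ];
}
```

Each returned object would be sent with `ws.send(JSON.stringify(event))`.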

Supported models

The model query parameter is required.

Model ID
openai/gpt-realtime-2
openai/gpt-realtime-1.5
openai/gpt-realtime-mini
openai/gpt-realtime

Pricing

MeshAPI charges OpenAI’s exact rates for realtime models, with no markup. Rates change whenever OpenAI changes its pricing; consult /v1/models for the current canonical values.

Per-session usage and cost lands in your existing /v1/usage history and the dashboard. There is no in-stream cost message on the WebSocket — query /v1/usage after the session completes.

Pre-flight pricing

The /v1/models response indicates which models support realtime and includes per-million-token rates for text, audio, and cached audio inputs and outputs. Use these to display estimates in your UI before opening a session — no need to hard-code rates in the client.
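
A pre-flight estimate reduces to multiplying token counts by per-million-token rates. The sketch below shows only that arithmetic. The field names (textInput, audioInput, and so on) are hypothetical, since this guide does not show the /v1/models response shape; map the real fields into this form yourself.

```javascript
// Hypothetical rate and token shapes; both objects use the same keys.
// Rates are USD per million tokens, as described for /v1/models.
const TOKEN_KINDS = [
  "textInput",
  "textOutput",
  "audioInput",
  "cachedAudioInput",
  "audioOutput",
];

function estimateCostUsd(rates, tokens) {
  let total = 0;
  for (const kind of TOKEN_KINDS) {
    // Missing kinds count as zero tokens or a zero rate.
    total += ((tokens[kind] ?? 0) * (rates[kind] ?? 0)) / 1e6;
  }
  return total;
}
```

Treat the result as a display estimate only; /v1/usage remains the source of truth for what was actually billed.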

Billing

  • Account balance required. Opening a realtime session requires an account balance of at least 10 USD. If the balance is exhausted during a session, the connection is closed with an insufficient_quota error. Top up to reconnect. Partial responses up to the point of disconnect are still billed.
  • Session token caps. Sessions configured with a max-token cap close with a session_token_cap_exceeded error once the cap is reached.

Usage events are written to your account’s usage log at session close and are accessible via GET /v1/usage and the dashboard. Sessions that are cut short (network drop, browser tab close) are still billed for the tokens processed up to that point. Treat /v1/usage as the canonical source for these numbers.

Limits and known caveats

  • Session length. Sessions are capped at 30 minutes by upstream; Mesh doesn’t extend this. Long-running agents should reconnect and resume application-level state.
  • Ingress timeout. Idle sockets (no client→server frames for 60s) are closed by the L7 ingress. Send a session.update ping or keep the audio buffer flowing.
  • Safari subprotocol quirk. Safari historically mangles the second subprotocol token when it contains a space (Bearer <key>). If you’re targeting Safari, use the ?api_key= query fallback or send the key as Bearer.<key> (dot separator).
  • No HTTP fallback. This endpoint only exists as a WebSocket upgrade. GET /v1/realtime over plain HTTP returns 426 Upgrade Required.
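
One way to stay under the 60s ingress timeout is a small idle tracker that sends a session.update when no client frame has gone out recently. This is a sketch under assumptions: createKeepAlive and its options are hypothetical, and it assumes an empty session object is accepted upstream as a no-op update.

```javascript
// Hypothetical keep-alive. `socket` only needs a send() method; `now` is
// injectable so the logic can be exercised without real timers.
function createKeepAlive(
  socket,
  { idleLimitMs = 60000, marginMs = 15000, now = Date.now } = {},
) {
  let lastSentAt = now();
  return {
    // Call after every client frame you send yourself.
    noteSent() {
      lastSentAt = now();
    },
    // Call periodically; sends a ping once the idle window is nearly used up.
    tick() {
      if (now() - lastSentAt < idleLimitMs - marginMs) return false;
      // Assumption: an empty session update changes nothing upstream.
      socket.send(JSON.stringify({ type: "session.update", session: {} }));
      lastSentAt = now();
      return true;
    },
  };
}
```

In a browser this could be driven with `setInterval(() => keepAlive.tick(), 5000)`, cleared when the socket closes.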

Errors

Session-terminating errors are delivered as a JSON frame ({ "type": "error", "error": { "code", "message" }, "request_id": "..." }) immediately before the WebSocket is closed. Use error.code for programmatic handling; it is the stable, semantic identifier. The WebSocket close code that follows is incidental.

error.code reference

| error.code | Meaning |
| --- | --- |
| invalid_api_key | Missing, malformed, or revoked rsk_... key. |
| insufficient_quota | Account balance is exhausted. Top up to reconnect. |
| session_token_cap_exceeded | Session exceeded its configured token cap. |
| model_not_found | The requested model is not available to your account or does not support realtime. |
| provider_error | The upstream provider returned an error. Inspect error.message for detail. |

In-band errors during a live session arrive as a regular error event with the OpenAI-shaped envelope shown above and are not accompanied by a socket close.
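
Programmatic handling can be as simple as a lookup keyed on error.code. The sketch below is illustrative: the action names on the right-hand side are hypothetical labels for what a client might do, not values defined by Mesh.

```javascript
// Hypothetical mapping from the documented error.code values to a client
// action. A Map avoids accidental hits on inherited object properties.
const ACTION_BY_CODE = new Map([
  ["invalid_api_key", "reauthenticate"],
  ["insufficient_quota", "top_up"],
  ["session_token_cap_exceeded", "start_new_session"],
  ["model_not_found", "pick_another_model"],
  ["provider_error", "retry_later"],
]);

// Returns null for non-error events and "report" for unrecognized codes.
function actionForErrorEvent(event) {
  if (event?.type !== "error") return null;
  return ACTION_BY_CODE.get(event.error?.code) ?? "report";
}
```

Unknown codes fall through to a generic "report" action so that new upstream codes do not break the client.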

WebSocket close codes (incidental)

These close codes accompany the JSON error frame on session-terminating errors. Check error.code for the semantic reason; the close code below is informational only.

| Close code | When you’ll see it |
| --- | --- |
| 1008 | Policy violation: most auth, quota, and access errors. |
| 1011 | Server-side or upstream condition the gateway couldn’t recover from. |
| 4402 | Quota or session-cap related termination (paired with insufficient_quota or session_token_cap_exceeded). |
| 4000–4999 | OpenAI-originated close codes are forwarded verbatim. |
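
Because the close code is informational only, a client may want to remember the last error frame and consult the close code only when no such frame arrived. A minimal sketch, with hypothetical names throughout:

```javascript
// Hypothetical tracker: report error.code on close when the final frame was
// an error, otherwise fall back to the numeric close code.
function createCloseReasonTracker() {
  let lastErrorCode = null;
  return {
    // Feed every raw message frame (a JSON string) through here.
    onMessage(rawData) {
      const event = JSON.parse(rawData);
      // A terminating error is the last frame before close, so any later
      // non-error event means the remembered error was in-band.
      lastErrorCode =
        event.type === "error" ? (event.error?.code ?? null) : null;
    },
    // Call from the socket's close handler with the close code.
    onClose(closeCode) {
      if (lastErrorCode) {
        return { source: "error.code", reason: lastErrorCode };
      }
      return { source: "close_code", reason: String(closeCode) };
    },
  };
}
```

Wire onMessage to the socket's message events (passing `e.data`) and onClose to its close event (passing `e.code`).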

Next steps

  • Review the Authentication guide for key rotation and scoping.
  • See the API reference entry for the OpenAPI stub.
  • Watch the upstream OpenAI Realtime API reference for new event types — Mesh forwards them without code changes.