Auto Routing


Mesh API’s Auto Router lets you set model: "auto" on any supported inference request. The gateway classifies the request with an internal LLM, selects the most appropriate model from the live registry, and forwards the request transparently.

This requires no client-side logic beyond setting the model field to "auto".

Supported endpoints

Auto Routing is supported across the following inference endpoints:

Endpoint                     Streaming supported
POST /v1/chat/completions    Yes
POST /v1/responses           Yes
POST /v1/embeddings          No

Basic request

Just replace your specific model ID with "auto":

curl
curl https://api.meshapi.ai/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_RSK_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Write a Python quicksort implementation"}]
  }'
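
For clients that do not use an SDK, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not the official Python SDK: the helper names build_auto_request and send are hypothetical, and the key is a placeholder. The URL, headers, and body mirror the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.meshapi.ai/v1/chat/completions"  # endpoint from the curl example


def build_auto_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with model set to "auto"."""
    body = {
        "model": "auto",  # the gateway selects the concrete model
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON body (performs a network call)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling send(build_auto_request("<YOUR_RSK_KEY>", "Write a Python quicksort implementation")) would issue the request. Building and sending are kept separate so the payload can be inspected without network access.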

Response metadata

When a request is automatically routed, Mesh API injects metadata into the response so you know which model was actually used.

Non-streaming requests

The metadata is included directly in the response body.

{
  "id": "chatcmpl-...",
  "model": "openai/gpt-4o",
  "choices": [...],
  "x_auto_routed": true,
  "x_resolved_model_id": "openai/gpt-4o"
}

If the internal classifier failed or timed out and a fallback model was used, additional fields are present:

{
  "x_auto_routed": true,
  "x_resolved_model_id": "openai/gpt-4o-mini",
  "x_auto_routed_fallback": true,
  "x_auto_routed_fallback_reason": "classifier_timeout"
}
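
A client may want to collect these fields in one place, for example to log which model served a request or to flag fallbacks. The following is a minimal sketch that reads the fields shown above from an already-parsed response body. The RoutingInfo class and routing_info_from_body function are hypothetical names, not part of any Mesh API SDK, and the sketch assumes that absent fields mean "not routed" or "no fallback".

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class RoutingInfo:
    auto_routed: bool
    resolved_model: Optional[str]
    fallback: bool
    fallback_reason: Optional[str]


def routing_info_from_body(body: Mapping[str, Any]) -> RoutingInfo:
    """Read the x_auto_routed* fields from a parsed non-streaming response body."""
    return RoutingInfo(
        auto_routed=bool(body.get("x_auto_routed", False)),
        resolved_model=body.get("x_resolved_model_id"),
        # Assumption: the fallback fields are present only when a fallback was used.
        fallback=bool(body.get("x_auto_routed_fallback", False)),
        fallback_reason=body.get("x_auto_routed_fallback_reason"),
    )


# Example with the fallback response shown above.
info = routing_info_from_body({
    "x_auto_routed": True,
    "x_resolved_model_id": "openai/gpt-4o-mini",
    "x_auto_routed_fallback": True,
    "x_auto_routed_fallback_reason": "classifier_timeout",
})
if info.fallback:
    print(f"fallback to {info.resolved_model}: {info.fallback_reason}")
```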

Streaming requests

For streaming requests (stream: true), the metadata is included as HTTP response headers before the SSE stream begins:

X-Auto-Routed: true
X-Resolved-Model-Id: openai/gpt-4o
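
Because the metadata arrives as headers in the streaming case, it can be read before any SSE event is consumed. The sketch below assumes the headers are available as a plain name-to-value mapping (most HTTP clients expose something equivalent). The function name routing_info_from_headers is hypothetical.

```python
from typing import Mapping, Optional, Tuple


def routing_info_from_headers(headers: Mapping[str, str]) -> Tuple[bool, Optional[str]]:
    """Return (auto_routed, resolved_model_id) from streaming response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    auto_routed = lowered.get("x-auto-routed", "").strip().lower() == "true"
    resolved = lowered.get("x-resolved-model-id")
    return auto_routed, resolved


# Example with the headers shown above.
routed, model_id = routing_info_from_headers({
    "X-Auto-Routed": "true",
    "X-Resolved-Model-Id": "openai/gpt-4o",
})
```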

Fallback behavior

The Auto Router is designed to never block a request due to its own failure. If the internal classification model fails to respond in time or returns an unknown model, the gateway will automatically fall back to a reliable default model (e.g., openai/gpt-4o-mini).

Billing

When using the Auto Router, you are billed for the tokens consumed by the resolved model that actually served the request, as well as the tokens consumed by the internal classifier model. Both will appear in your usage dashboard.
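
The two charges simply add up. As a rough sketch of the arithmetic, assuming a single per-million-token price for each model (an assumption made here for illustration; see the Pricing page for the actual rate structure), the cost of one auto-routed request could be estimated as follows. The function name and all numeric values are hypothetical.

```python
def estimate_auto_routed_cost(
    resolved_tokens: int,
    resolved_price_per_million: float,
    classifier_tokens: int,
    classifier_price_per_million: float,
) -> float:
    """Sum the resolved-model charge and the classifier charge for one request."""
    resolved_cost = resolved_tokens * resolved_price_per_million / 1_000_000
    classifier_cost = classifier_tokens * classifier_price_per_million / 1_000_000
    return resolved_cost + classifier_cost


# Hypothetical token counts and prices, for illustration only.
total = estimate_auto_routed_cost(
    resolved_tokens=1_000,
    resolved_price_per_million=5.0,
    classifier_tokens=200,
    classifier_price_per_million=0.5,
)
```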