Compare

Use POST /v1/chat/compare when you want to run the same conversation across multiple models and inspect the results side by side.

How it works

  1. Fan-out: All requested models are called concurrently. The total wall-clock time is roughly that of the slowest model, not the sum of all models.
  2. Error isolation: If a single model fails or times out (hard timeout of 120s), the others continue unaffected. Partial results are returned with a partial: true flag.
  3. Synthesis (default): After all models respond, a separate comparison LLM analyzes the responses and produces a structured evaluation covering accuracy, completeness, clarity, and a recommendation.
  4. Skip synthesis (optional): By setting skip_comparison: true, you can skip the synthesis step and receive only the raw model outputs. This is useful for parallel streaming UIs that perform their own comparison.
  5. Rate limiting and billing: The entire comparison counts as a single request against your rate limits (RPM/RPD). Billing, however, records each model call plus the comparison call as a separate usage event, so a synthesized comparison of N models produces N+1 events.
  6. Streaming: Setting stream: true selects one of two streaming modes. With synthesis enabled, the fan-out calls are not streamed, but the final comparison text is streamed token by token. With skip_comparison: true, every fan-out model streams its tokens concurrently in real time, and each chunk is tagged with the model name.
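
The fan-out and error-isolation steps above can be illustrated with a small local simulation. This is a sketch of the general pattern only, not the service's implementation: `call_model` is a hypothetical stand-in that sleeps instead of calling a provider, the `content` key is illustrative, and the rule that any failed model sets `partial` is an assumption based on the description above.

```python
import asyncio
import time


async def call_model(name: str, delay: float, fail: bool = False) -> str:
    """Hypothetical stand-in for one upstream model call (no network)."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"answer from {name}"


async def fan_out(specs: dict, timeout: float) -> dict:
    """Call every model concurrently; isolate failures and timeouts per model."""

    async def guarded(name: str, delay: float, fail: bool) -> dict:
        started = time.perf_counter()
        try:
            content = await asyncio.wait_for(call_model(name, delay, fail), timeout)
            error = None
        except asyncio.TimeoutError:
            content, error = None, "timeout"
        except Exception as exc:  # one model's failure must not cancel the others
            content, error = None, str(exc)
        latency_ms = int((time.perf_counter() - started) * 1000)
        return {"model": name, "content": content, "error": error, "latency_ms": latency_ms}

    # gather() runs all guarded calls at once, so total time tracks the slowest one.
    results = await asyncio.gather(*(guarded(n, d, f) for n, (d, f) in specs.items()))
    # Assumption: the result is "partial" as soon as any model reported an error.
    return {"results": list(results), "partial": any(r["error"] for r in results)}


if __name__ == "__main__":
    demo = {"fast": (0.01, False), "slow": (0.05, False), "broken": (0.01, True)}
    print(asyncio.run(fan_out(demo, timeout=1.0)))
```

Because each call is wrapped individually, a timeout or exception becomes a per-model error entry instead of aborting the whole batch.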

Basic request

    curl https://api.meshapi.ai/v1/chat/compare \
      -H "Authorization: Bearer <YOUR_RSK_KEY>" \
      -H "Content-Type: application/json" \
      -d '{
        "models": [
          "openai/gpt-4o-mini",
          "anthropic/claude-3.5-haiku"
        ],
        "messages": [
          { "role": "user", "content": "Explain vector search in two sentences." }
        ]
      }'
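
For environments without an SDK, the same request can be sketched with Python's standard library. This mirrors the curl call above and does not use the official Python SDK. The helper names `build_compare_request` and `send` are hypothetical, and the 130-second client timeout is an assumption chosen to sit just above the documented 120-second hard timeout.

```python
import json
import urllib.request

API_URL = "https://api.meshapi.ai/v1/chat/compare"  # endpoint from the curl example


def build_compare_request(api_key: str, models: list, messages: list) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"models": models, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request, timeout: float = 130.0) -> dict:
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    The default timeout is an assumption, not a documented value.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping construction separate from sending makes the request easy to inspect or log before any network call is made.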

Request fields

| Field | Type | Notes |
| --- | --- | --- |
| models | string[] | Models to compare. |
| messages | chat message[] | Conversation sent to each model. |
| comparison_model | string | Optional model used for synthesized comparison output. |
| comparison_instructions | string | Optional comparison rubric or guidance. |
| skip_comparison | boolean | Return per-model outputs without synthesized comparison text. |
| stream | boolean | Optional streaming mode. |
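
As a rough illustration of how the optional fields combine, the sketch below builds a request body and leaves out anything the caller did not set, so the server-side defaults apply. The function name `build_compare_payload` is hypothetical, and the non-empty checks are local sanity checks rather than documented server rules.

```python
from typing import Optional


def build_compare_payload(
    models: list,
    messages: list,
    comparison_model: Optional[str] = None,
    comparison_instructions: Optional[str] = None,
    skip_comparison: bool = False,
    stream: bool = False,
) -> dict:
    """Assemble a compare request body, omitting optional fields left unset."""
    # Local sanity checks only; the documentation does not state server-side limits.
    if not models:
        raise ValueError("models must contain at least one model name")
    if not messages:
        raise ValueError("messages must contain at least one chat message")

    payload = {"models": list(models), "messages": list(messages)}
    if comparison_model is not None:
        payload["comparison_model"] = comparison_model
    if comparison_instructions is not None:
        payload["comparison_instructions"] = comparison_instructions
    if skip_comparison:
        payload["skip_comparison"] = True
    if stream:
        payload["stream"] = True
    return payload
```

For example, `build_compare_payload(models, messages, skip_comparison=True, stream=True)` would produce the body for the parallel streaming mode described under Streaming (SSE).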

Response shape

The response includes:

  • the compared model list
  • one result per model
  • optional synthesized comparison text
  • latency and request metadata

Streaming (SSE)

Set "stream": true to receive a text/event-stream with typed events. There are two streaming modes:

Mode 1: With comparison (skip_comparison: false, default)

Fan-out model calls are not streamed: the full response is collected per model, and then the comparison LLM streams its output token by token.

| Event | When | Payload |
| --- | --- | --- |
| meta | Immediately after auth | `{"comparison_id", "models", "comparison_model", "skip_comparison": false}` |
| model_chunk | As each fan-out model finishes | `{"model", "delta", "latency_ms", "error", "error_code", "usage"}` |
| model_done | All fan-out results collected | `{"results": [...]}` |
| comparison_chunk | During comparison LLM streaming | `{"delta": "<token>", "finish_reason": null \| "stop"}` |
| done | All complete | `{"comparison_id", "total_latency_ms", "partial", "comparison_model", "comparison_fallback_used"}` |
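
A client for this mode might be sketched as below. It assumes the stream uses standard server-sent event framing (an `event:` line, one or more `data:` lines holding JSON, then a blank line), which this page does not spell out. The names `iter_sse_events` and `collect_comparison` are hypothetical, and only the event names and payload keys come from the table above.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from decoded text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line terminates one event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE framing allows one optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


def collect_comparison(events: Iterable[Tuple[str, dict]]) -> dict:
    """Gather per-model results, then join the streamed comparison text."""
    results, text, summary = [], [], {}
    for name, payload in events:
        if name == "model_done":
            results = payload["results"]
        elif name == "comparison_chunk":
            text.append(payload.get("delta") or "")
        elif name == "done":
            summary = payload
    return {"results": results, "comparison": "".join(text), "done": summary}
```

A UI would typically render each `comparison_chunk` delta as it arrives instead of joining at the end; the collector here only shows which event carries which piece.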

Mode 2: Skip comparison (skip_comparison: true)

Each fan-out model streams tokens in real time concurrently, tagged by model name. No comparison LLM is called.

| Event | When | Payload |
| --- | --- | --- |
| meta | Immediately after auth | `{"comparison_id", "models", "comparison_model": null, "skip_comparison": true}` |
| model_chunk | Each token from any model | `{"model": "...", "delta": "Hello", "finish_reason": null}` |
| model_stream_done | One model’s stream ends | `{"model": "...", "finish_reason": "stop", "usage": {...}, "error": null \| "..."}` |
| done | All models finished | `{"comparison_id", "total_latency_ms", "partial", "skip_comparison": true}` |
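
Because chunks from different models arrive interleaved in this mode, a client has to route each one by its model tag. The sketch below takes already-parsed (event name, payload) pairs and rebuilds one text per model. The function name `demux_model_streams` and the shape of its return value are hypothetical; only the event names and payload keys come from the table above.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def demux_model_streams(events: Iterable[Tuple[str, dict]]) -> dict:
    """Rebuild each model's output from interleaved model_chunk events."""
    chunks = defaultdict(list)  # model name -> list of token deltas
    finished = {}               # model name -> finish info
    partial = None
    for name, payload in events:
        if name == "model_chunk":
            chunks[payload["model"]].append(payload.get("delta") or "")
        elif name == "model_stream_done":
            finished[payload["model"]] = {
                "finish_reason": payload.get("finish_reason"),
                "error": payload.get("error"),
            }
        elif name == "done":
            partial = payload.get("partial")
    # Include models that ended (for example with an error) without any chunk.
    models = set(chunks) | set(finished)
    return {
        "outputs": {model: "".join(chunks[model]) for model in models},
        "finished": finished,
        "partial": partial,
    }
```

In a side-by-side UI, the same routing would append each delta to the panel for `payload["model"]` as it arrives, and mark the panel complete on `model_stream_done`.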

SDK coverage

  • Node: client.compare.create(...)
  • Python: client.compare.create(...)
  • Go: client.Compare.Create(...)
  • Java: client.compare().create(...)

Streaming compare is also available in the SDKs through their compare stream methods when you want incremental events instead of a single final JSON response.

When to use compare

  • Evaluate multiple candidate models for a task
  • Compare cost/quality trade-offs before choosing a default
  • Build internal prompt evaluations with a stable request shape