Create a chat completion

Generates a model response from a list of chat messages. This endpoint is intentionally shaped like OpenAI's `POST /v1/chat/completions`, so existing SDK integrations can switch base URLs with minimal change.

Mesh-specific request extensions:

- `template`: resolve a stored prompt template by name or UUID
- `variables`: values used to render `{{slot}}` placeholders
- `session_id`: caller-defined grouping key for usage reporting

Streaming responses are returned as Server-Sent Events when `stream=true`.
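A minimal sketch of assembling a request to this endpoint, including the Mesh-specific extensions. The base URL, token, and template name below are placeholder assumptions, not part of the documented contract; only the field names and the `/v1/chat/completions` path come from this page.

```python
import json

BASE_URL = "https://mesh.example.com"  # hypothetical base URL -- substitute your own
API_TOKEN = "YOUR_AUTH_TOKEN"


def build_chat_request(messages, template=None, variables=None,
                       session_id=None, stream=False, **sampling):
    """Assemble the URL, headers, and JSON body for a chat completion call."""
    body = {"messages": messages, "stream": stream, **sampling}
    # The Mesh extensions are plain top-level body fields:
    if template is not None:
        body["template"] = template
    if variables is not None:
        body["variables"] = variables
    if session_id is not None:
        body["session_id"] = session_id
    headers = {
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    }
    return f"{BASE_URL}/v1/chat/completions", headers, json.dumps(body)


url, headers, payload = build_chat_request(
    [{"role": "user", "content": "Hello"}],
    template="support-greeting",      # hypothetical template name
    variables={"customer": "Ada"},
    session_id="demo-session",
    temperature=0.2,
)
```

The returned tuple can be handed to any HTTP client (`requests.post(url, headers=headers, data=payload)` or similar); it is kept separate here so the payload shape is easy to inspect.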

Authentication

Authorization · Bearer

Bearer authentication of the form `Bearer <token>`, where `<token>` is your auth token.

Request

This endpoint expects an object.
`messages` · list of objects · Required
Conversation history in OpenAI chat format.

`model` · string or null · Optional · Defaults to `openai/gpt-4o`
Model identifier. If omitted, the backend resolves it from the API key's default model or the selected template.

`template` · string or null · Optional
Template name or UUID to expand before inference.

`variables` · map from strings to strings, or null · Optional
Values used when rendering `{{slot}}` placeholders.

`session_id` · string or null · Optional
Caller-defined grouping key for usage reporting.

`stream` · boolean · Optional · Defaults to `false`
When `true`, returns SSE chunks instead of a JSON object.

`temperature` · double or null · Optional
Sampling temperature used for token selection. Lower values make output more deterministic; higher values increase randomness and variation.

`max_tokens` · integer or null · Optional · `>= 1`
Maximum number of completion tokens to generate.

`top_p` · double or null · Optional
Nucleus sampling threshold. The model samples from the smallest set of tokens whose cumulative probability reaches `top_p`.

`frequency_penalty` · double or null · Optional
Penalizes tokens that have already appeared in the generated output, reducing repeated phrasing.

`presence_penalty` · double or null · Optional
Penalizes tokens that have appeared at least once, nudging the model toward introducing new topics.

`stop` · string or list of strings or null · Optional
One or more stop sequences that end generation when encountered.

`seed` · integer or null · Optional
Seed for best-effort deterministic sampling across repeated requests with the same parameters.

`tools` · list of objects or null · Optional
Tool definitions the model may call during the completion.

`tool_choice` · enum or object or null · Optional
Controls whether the model may call tools automatically, must avoid them, must call one, or must call a specific tool.

`transforms` · list of strings or null · Optional
Ordered list of OpenRouter transforms applied before inference.

`models` · list of strings or null · Optional
Ordered list of OpenRouter fallback models.

`user` · string or null · Optional · `<= 256` characters
End-user identifier forwarded for abuse monitoring.
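Template expansion happens server-side, but the `variables` contract is easy to picture as simple placeholder substitution. A rough client-side equivalent, assuming slot names are word characters (the exact server behavior for unknown or missing slots is not documented here):

```python
import re


def render_template(template_text: str, variables: dict) -> str:
    """Replace each {{slot}} with its value from `variables`.

    Unknown slots are left intact -- an assumption, since this page does
    not specify how the server treats missing variables.
    """
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        template_text,
    )


rendered = render_template(
    "Hello {{name}}, your order {{order_id}} shipped.",
    {"name": "Ada", "order_id": "A-42"},
)
# rendered == "Hello Ada, your order A-42 shipped."
```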

Response headers

`X-Request-Id` · string
Unique request identifier for tracing and support.

Response

Successful response.

Non-streaming requests return a `ChatCompletionResponse` object. Streaming requests return `text/event-stream` chunks matching `ChatCompletionChunk`, followed by a final `data: [DONE]` event.
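Consuming the stream means reading SSE lines, parsing each `data:` payload as a chunk, and stopping at `data: [DONE]`. A sketch against hand-written lines (the `delta.content` shape follows the OpenAI chunk format this endpoint mirrors, which is an assumption for fields not spelled out on this page):

```python
import json


def iter_stream_chunks(lines):
    """Yield parsed chunk dicts from SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        yield json.loads(data)


# Offline example with hand-written chunks:
sse = [
    'data: {"id": "c1", "choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"id": "c1", "choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in iter_stream_chunks(sse))
# text == "Hello"
```

With a real HTTP client the same generator can wrap `response.iter_lines()` (decoded to `str`) instead of a list.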

`id` · string
`object` · `"chat.completion"`
`created` · integer
`model` · string
`choices` · list of objects
`usage` · object or null
`system_fingerprint` · string or null
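Minimal handling of a non-streaming response body with the fields listed above. The inner `message`/`finish_reason` structure of a choice follows the OpenAI format this endpoint mirrors, which is an assumption here; note that `usage` may be `null` and needs a guard:

```python
import json

# Stand-in response body in the documented shape:
raw = json.dumps({
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "openai/gpt-4o",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hi there!"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 5, "completion_tokens": 3, "total_tokens": 8},
    "system_fingerprint": None,
})

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]
tokens = (resp.get("usage") or {}).get("total_tokens")  # usage is nullable
```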

Errors

401 · Unauthorized Error
402 · Payment Required Error
403 · Forbidden Error
404 · Not Found Error
422 · Unprocessable Entity Error
429 · Too Many Requests Error
502 · Bad Gateway Error
503 · Service Unavailable Error
504 · Gateway Timeout Error