Generates a model response from a list of chat messages.
This endpoint is intentionally shaped like OpenAI's
`POST /v1/chat/completions` so existing SDK integrations can switch
base URLs with minimal changes.
Mesh-specific request extensions:
- `template`: resolve a stored prompt template by name or UUID
- `variables`: values used to render `{{slot}}` placeholders
- `session_id`: caller-defined grouping key for usage reporting
Streaming responses are returned as Server-Sent Events when
`stream=true`.
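When `stream=true`, each event arrives on a `data:` line of the SSE stream. The sketch below shows one way a client might consume those lines in Python, assuming the chunk payloads mirror OpenAI's ChatCompletionChunk shape (the `choices`/`delta` field names are assumptions based on that shape, not confirmed by this page):

```python
import json

def iter_sse_chunks(lines):
    """Yield parsed chunk payloads from raw SSE lines until the
    terminal `data: [DONE]` sentinel is seen."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

# Example: stitch the streamed text deltas back into one string.
raw = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    '',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: [DONE]',
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in iter_sse_chunks(raw)
)
# text == "Hello"
```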
Request
This endpoint expects an object.
messages (list of objects, required)
Conversation history in OpenAI chat format.
model (string or null, optional; defaults to openai/gpt-4o)
Model identifier. If omitted, the backend resolves it from the API
key's default model or the selected template.
template (string or null, optional)
Template name or UUID to expand before inference.
variables (map from strings to strings, or null; optional)
Values used when rendering `{{slot}}` placeholders.
session_id (string or null, optional)
Caller-defined grouping key for usage reporting.
stream (boolean, optional; defaults to false)
When true, returns SSE chunks instead of a JSON object.
temperature (double or null, optional)
Sampling temperature used for token selection. Lower values make
output more deterministic; higher values increase randomness and
variation.
max_tokens (integer or null, optional; >= 1)
Maximum number of completion tokens to generate.
top_p (double or null, optional)
Nucleus sampling threshold. The model samples from the smallest set
of tokens whose cumulative probability reaches top_p.
frequency_penalty (double or null, optional)
Penalizes tokens that have already appeared in the generated output,
reducing repeated phrasing.
presence_penalty (double or null, optional)
Penalizes tokens that have already appeared at least once, nudging
the model toward introducing new topics.
stop (string, list of strings, or null; optional)
One or more stop sequences that end generation when encountered.
seed (integer or null, optional)
Seed for best-effort deterministic sampling across repeated requests
with the same parameters.
tools (list of objects or null, optional)
Tool definitions the model may call during the completion.
tool_choice (enum, object, or null; optional)
Controls whether the model may call tools automatically, must avoid
them, must call one, or must call a specific tool.
transforms (list of strings or null, optional)
Ordered list of OpenRouter transforms applied before inference.
models (list of strings or null, optional)
Ordered list of OpenRouter fallback models.
user (string or null, optional; <= 256 characters)
End-user identifier forwarded for abuse monitoring.
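Putting the fields above together, a request body might look like the following sketch. Only `messages` is required; the `template`, `variables`, and `session_id` keys are the mesh-specific extensions, and the model, template name, and variable values shown here are illustrative placeholders, not values defined by this page:

```python
import json

payload = {
    "model": "openai/gpt-4o",
    "messages": [
        {"role": "user", "content": "Summarize the report in one line."}
    ],
    # Mesh-specific extensions (names from this page; values hypothetical):
    "template": "daily-summary",           # stored template name or UUID
    "variables": {"topic": "quarterly revenue"},
    "session_id": "team-alpha",            # grouping key for usage reporting
    # Standard OpenAI-style knobs:
    "stream": False,
    "temperature": 0.2,
    "max_tokens": 256,
}
body = json.dumps(payload)  # JSON string to send as the request body
```

Fields left out of the payload fall back to their defaults, so a minimal request can carry `messages` alone.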
Response
Successful response.
Non-streaming requests return a ChatCompletionResponse object.
Streaming requests return `text/event-stream` chunks matching
ChatCompletionChunk, followed by a terminal `data: [DONE]` event.
system_fingerprint (string or null)
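Assuming ChatCompletionResponse mirrors OpenAI's response shape (this page names the object but not its nested fields, so the structure below is an assumption), extracting the assistant's text from a non-streaming response could look like:

```python
# Hypothetical non-streaming response body, shaped like OpenAI's
# chat completion object; field names are assumptions.
response = {
    "id": "chatcmpl-123",
    "model": "openai/gpt-4o",
    "system_fingerprint": None,
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
}

def first_message(resp):
    """Return the assistant text of the first choice."""
    return resp["choices"][0]["message"]["content"]
```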