Available Models

This page is auto-generated from GET /v1/models so the catalog stays aligned with the live Mesh API inventory.

Name	Model ID	Provider	Tier	Context	Input (USD)	Output (USD)	Description
AI21: Jamba 1.5 Large	`ai21/jamba-1-5-large-v1`	Ai21	Paid	262 K	0.002	0.008	No description available.
Full description No description available.
AI21: Jamba 1.5 Mini	`ai21/jamba-1-5-mini-v1`	Ai21	Paid	262 K	0.0002	0.0004	No description available.
Full description No description available.
AI21: Jamba Large 1.7	`ai21/jamba-large-1.7`	Ai21	Paid	256 K	0.002	0.008	Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding…
Full description Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context window, it delivers more accurate, contextually grounded responses and better steerability than previous versions.
AionLabs: Aion-1.0	`aion-labs/aion-1.0`	Aion Labs	Paid	131 K	0.004	0.008	Aion-1.0 is a multi-model system designed for high performance across various tasks, including r…
Full description Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree…
AionLabs: Aion-1.0-Mini	`aion-labs/aion-1.0-mini`	Aion Labs	Paid	131 K	0.0007	0.0014	Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, designed for…
Full description Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant…
AionLabs: Aion-2.0	`aion-labs/aion-2.0`	Aion Labs	Paid	131 K	0.0008	0.0016	Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It…
Full description Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and conflict into stories, making narratives feel more engaging. It also handles mature and darker themes with more nuance and depth.
AionLabs: Aion-RP 1.0 (8B)	`aion-labs/aion-rp-llama-3.1-8b`	Aion Labs	Paid	32.8 K	0.0008	0.0016	Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto b…
Full description Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto, where LLMs evaluate each other’s responses. It is a fine-tuned base model…
Amazon: Nova 2 Lite	`amazon/nova-2-lite-v1`	Amazon	Paid	1 M	0.00006	0.00024	Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process te…
Full description Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrates standout capabilities in processing documents, extracting information from videos, generating code, providing accurate grounded answers, and automating multi-step agentic workflows.
Amazon: Nova Lite	`amazon/nova-lite-v1`	Amazon	Paid	3 K	0.00006	0.00024	No description available.
Full description No description available.
Amazon: Nova Micro	`amazon/nova-micro-v1`	Amazon	Paid	128 K	0.00004	0.00014	No description available.
Full description No description available.
Amazon: Nova Premier 1.0	`amazon/nova-premier-v1`	Amazon	Paid	1 M	0.0025	0.0125	Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning task…
Full description Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models.
Amazon: Nova Pro	`amazon/nova-pro-v1`	Amazon	Paid	3 K	0.0008	0.0032	No description available.
Full description No description available.
Anthropic: Claude 3.5 Haiku	`anthropic/claude-3.5-haiku`	Anthropic	Paid	2 K	0.0008	0.004	Claude 3.5 Haiku features offers enhanced capabilities in speed, coding accuracy, and tool use…
Full description Claude 3.5 Haiku features offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic…
Anthropic: Claude Haiku 4.5	`anthropic/claude-haiku-4.5`	Anthropic	Paid	2 K	0.0008	0.004	Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intel…
Full description Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line; enabling controllable reasoning depth, summarized or interleaved thought output, and tool-assisted workflows with full support for coding, bash, web search, and computer-use tools. Scoring >73% on SWE-bench Verified, Haiku 4.5 ranks among the world’s best coding models while maintaining exceptional responsiveness for sub-agents, parallelized execution, and scaled deployment.
Anthropic: Claude Opus 4	`anthropic/claude-opus-4`	Anthropic	Paid	2 K	0.015	0.075	Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bringing sust…
Full description Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in…
Anthropic: Claude Opus 4.1	`anthropic/claude-opus-4.1`	Anthropic	Paid	2 K	0.015	0.075	Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performan…
Full description Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains…
Anthropic: Claude Opus 4.5	`anthropic/claude-opus-4.5`	Anthropic	Paid	2 K	0.015	0.075	Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineeri…
Full description Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and reasoning benchmarks, and improved robustness to prompt injection. The model is designed to operate efficiently across varied effort levels, enabling developers to trade off speed, depth, and token usage depending on task requirements. It comes with a new parameter to control token efficiency, which can be accessed using the OpenRouter Verbosity parameter with low, medium, or high. Opus 4.5 supports advanced tool use, extended context management, and coordinated multi-agent setups, making it well-suited for autonomous research, debugging, multi-step planning, and spreadsheet/browser manipulation. It delivers substantial gains in structured reasoning, execution reliability, and alignment compared to prior Opus generations, while reducing token overhead and improving performance on long-running tasks.
Anthropic: Claude Opus 4.6	`anthropic/claude-opus-4.6`	Anthropic	Paid	1 M	0.015	0.075	Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is bu…
Full description Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective for large codebases, complex refactors, and multi-step debugging that unfolds over time. The model shows deeper contextual understanding, stronger problem decomposition, and greater reliability on hard engineering tasks than prior generations. Beyond coding, Opus 4.6 excels at sustained knowledge work. It produces near-production-ready documents, plans, and analyses in a single pass, and maintains coherence across very long outputs and extended sessions. This makes it a strong default for tasks that require persistence, judgment, and follow-through, such as technical design, migration planning, and end-to-end project execution. For users upgrading from earlier Opus versions, see our official migration guide here
Anthropic: Claude Opus 4.7	`anthropic/claude-opus-4.7`	Anthropic	Paid	1 M	0.005	0.025	Anthropic’s latest flagship, now live on Mesh API. Opus 4.7 is a big step up on hard coding work…
Full description Anthropic’s latest flagship, now live on Mesh API. Opus 4.7 is a big step up on hard coding work, handling long-running agentic tasks with way more rigor and verifying its own outputs before handing them back. Better vision too, it reads high-res images at 3x the fidelity of older Claude models, so computer-use agents and diagram parsing actually work. Improved instruction following, stronger memory across sessions, and a new xhigh effort level for the really gnarly problems.
Anthropic: Claude Opus 4.8	`anthropic/claude-opus-4.8`	Anthropic	Paid	1 M	0.005	0.025	Anthropic’s latest flagship, now live on Mesh API. Opus 4.7 is a big step up on hard coding work…
Full description Anthropic’s latest flagship, now live on Mesh API. Opus 4.7 is a big step up on hard coding work, handling long-running agentic tasks with way more rigor and verifying its own outputs before handing them back. Better vision too, it reads high-res images at 3x the fidelity of older Claude models, so computer-use agents and diagram parsing actually work. Improved instruction following, stronger memory across sessions, and a new xhigh effort level for the really gnarly problems.
Anthropic: Claude Sonnet 4.5	`anthropic/claude-sonnet-4.5`	Anthropic	Paid	1 M	0.003	0.015	Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world ag…
Full description Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence. The model is designed for extended autonomous operation, maintaining task continuity across sessions and providing fact-based progress tracking. Sonnet 4.5 also introduces stronger agentic capabilities, including improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With enhanced context tracking and awareness of token usage across tool calls, it is particularly well-suited for multi-context and long-running workflows. Use cases span software engineering, cybersecurity, financial analysis, research agents, and other domains requiring sustained reasoning and tool use.
Anthropic: Claude Sonnet 4.6	`anthropic/claude-sonnet-4.6`	Anthropic	Paid	1 M	0.003	0.015	Sonnet 4.6 is Anthropic’s most capable Sonnet-class model yet, with frontier performance across…
Full description Sonnet 4.6 is Anthropic’s most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with memory, polished document creation, and confident computer use for web QA and workflow automation.
Baidu: ERNIE 4.5 300B A47B	`baidu/ernie-4.5-300b-a47b`	Baidu	Paid	123 K	0.00028	0.0011	ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Bai…
Full description ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It activates 47B parameters per token and supports text generation in…
Baidu: ERNIE 4.5 VL 424B A47B	`baidu/ernie-4.5-vl-424b-a47b`	Baidu	Paid	123 K	0.00042	0.00125	ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 ser…
Full description ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data…
bedrock/amazon.titan-embed-g1-text-02	`bedrock/amazon.titan-embed-g1-text-02`	Bedrock	Paid	N/A	0.00002	0	No description available.
Full description No description available.
bedrock/amazon.titan-embed-text-v2:0	`bedrock/amazon.titan-embed-text-v2:0`	Bedrock	Paid	N/A	0.00002	0	No description available.
Full description No description available.
bedrock/cohere.embed-english-v3	`bedrock/cohere.embed-english-v3`	Bedrock	Paid	N/A	0.0001	0	No description available.
Full description No description available.
bedrock/cohere.embed-multilingual-v3	`bedrock/cohere.embed-multilingual-v3`	Bedrock	Paid	N/A	0.0001	0	No description available.
Full description No description available.
bedrock/cohere.embed-v4:0	`bedrock/cohere.embed-v4:0`	Bedrock	Paid	N/A	0.00012	0	No description available.
Full description No description available.
ByteDance Seed: Seed 1.6	`bytedance-seed/seed-1.6`	Bytedance Seed	Paid	262 K	0.00025	0.002	Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimo…
Full description Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window.
ByteDance Seed: Seed 1.6 Flash	`bytedance-seed/seed-1.6-flash`	Bytedance Seed	Paid	262 K	0.000075	0.0003	Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting bot…
Full description Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of up to 16k tokens.
ByteDance Seed: Seed-2.0-Lite	`bytedance-seed/seed-2.0-lite`	Bytedance Seed	Paid	262 K	0.00025	0.002	Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimoda…
Full description Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency, making it a practical default choice for most production workloads across text, vision, and tools. Engineered for high-frequency visual understanding and agentic workflows, it’s an ideal choice for deployment at scale with minimal latency.
ByteDance Seed: Seed-2.0-Mini	`bytedance-seed/seed-2.0-mini`	Bytedance Seed	Paid	262 K	0.0001	0.0004	Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasi…
Full description Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding, and is optimized for lightweight tasks where cost and speed take priority.
ByteDance: UI-TARS 7B	`bytedance/ui-tars-1.5-7b`	Bytedance	Paid	128 K	0.0001	0.0002	UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, includin…
Full description UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement…
Claude 3 Haiku	`anthropic/claude-3-haiku`	Anthropic	Paid	2 K	0.00025	0.00125	No description available.
Full description No description available.
Claude Sonnet 4	`anthropic/claude-sonnet-4`	Anthropic	Paid	2 K	0.003	0.015	No description available.
Full description No description available.
Cohere: Command A	`cohere/command-a`	Cohere	Paid	256 K	0.0025	0.01	Command A is an open-weights 111B parameter model with a 256k context window focused on deliveri…
Full description Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading proprietary…
Cohere: Command R (08-2024)	`cohere/command-r-08-2024`	Cohere	Paid	128 K	0.00015	0.0006	command-r-08-2024 is an update of the Command R with improved perfor…
Full description command-r-08-2024 is an update of the Command R with improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, it is better at math, code and reasoning and…
Cohere: Command R+ (08-2024)	`cohere/command-r-plus-08-2024`	Cohere	Paid	128 K	0.0025	0.01	command-r-plus-08-2024 is an update of the Command R+ with roug…
Full description command-r-plus-08-2024 is an update of the Command R+ with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint…
Cohere: Command R7B (12-2024)	`cohere/command-r7b-12-2024`	Cohere	Paid	128 K	0.0000375	0.00015	Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 202…
Full description Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning…
Deep Cogito: Cogito v2.1 671B	`deepcogito/cogito-v2.1-671b`	Deepcogito	Paid	128 K	0.00125	0.00125	Cogito v2.1 671B MoE represents one of the strongest open models globally, matching performance…
Full description Cogito v2.1 671B MoE represents one of the strongest open models globally, matching performance of frontier closed and open models. This model is trained using self play with reinforcement learning to reach state-of-the-art performance on multiple categories (instruction following, coding, longer queries and creative writing). This advanced system demonstrates significant progress toward scalable superintelligence through policy improvement.
DeepSeek: DeepSeek V3 0324	`deepseek/deepseek-chat-v3-0324`	Deepseek	Paid	164 K	0.0002	0.00077	DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship…
Full description DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the DeepSeek V3 model and performs really well…
DeepSeek: DeepSeek V3.1	`deepseek/deepseek-chat-v3.1`	Deepseek	Paid	32.8 K	0.00015	0.00075	DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both…
Full description DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. Learn more in our docs The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks.
DeepSeek: DeepSeek V3.1 Terminus	`deepseek/deepseek-v3.1-terminus`	Deepseek	Paid	164 K	0.00021	0.00079	DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that mainta…
Full description DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model’s original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model’s performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. Learn more in our docs The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.
DeepSeek: DeepSeek V3.2	`deepseek/deepseek-v3.2`	Deepseek	Paid	164 K	0.00062	0.00185	DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with…
Full description DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. Learn more in our docs
DeepSeek: DeepSeek V3.2 Exp	`deepseek/deepseek-v3.2-exp`	Deepseek	Paid	164 K	0.00027	0.00041	DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediat…
Full description DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. Learn more in our docs The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs.
DeepSeek: DeepSeek V4 Flash	`deepseek/deepseek-v4-flash`	Deepseek	Paid	1.05 M	0.00014	0.00028	DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B to…
Full description DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and high-throughput workloads, while maintaining strong reasoning and coding performance. The model includes hybrid attention for efficient long-context processing. Reasoning efforts high and xhigh are supported; xhigh maps to max reasoning. It is well suited for applications such as coding assistants, chat systems, and agent workflows where responsiveness and cost efficiency are important.
DeepSeek: DeepSeek V4 Pro	`deepseek/deepseek-v4-pro`	Deepseek	Paid	1.05 M	0.001392	0.002784	DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total paramete…
Full description DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning, coding, and long-horizon agent workflows, with strong performance across knowledge, math, and software engineering benchmarks. Built on the same architecture as DeepSeek V4 Flash, it introduces a hybrid attention system for efficient long-context processing. Reasoning efforts high and xhigh are supported; xhigh maps to max reasoning. It is well suited for complex workloads such as full-codebase analysis, multi-step automation, and large-scale information synthesis, where both capability and efficiency are critical.
DeepSeek: R1	`deepseek/deepseek-r1`	Deepseek	Paid	64 K	0.00135	0.0054	DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with…
Full description DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It’s 671B parameters in size, with 37B active in an inference pass…
EssentialAI: Rnj 1 Instruct	`essentialai/rnj-1-instruct`	Essentialai	Paid	32.8 K	0.00015	0.00015	Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and trained…
Full description Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and trained from scratch with a focus on programming, math, and scientific reasoning. The model demonstrates strong performance across multiple programming languages, tool-use workflows, and agentic execution environments (e.g., mini-SWE-agent).
Gemini 3 1 Flash Lite	`google/gemini-3.1-flash-lite`	Google	Paid	N/A	0.00025	0.0015	No description available.
Full description No description available.
Gemini 3 1 Flash Lite Preview	`google/gemini-3.1-flash-lite-preview`	Google	Paid	1.05 M	0.00025	0.0015	Gemini 3.1 Flash Lite Preview is Google’s high-efficiency model optimized for high-volume use ca…
Full description Gemini 3.1 Flash Lite Preview is Google’s high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across key capabilities. Improvements span audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. Supports full thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs. Priced at half the cost of Gemini 3 Flash.
Gemini 3 1 Pro Preview	`google/gemini-3.1-pro-preview`	Google	Paid	1.05 M	0.002	0.012	Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engine…
Full description Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation of the Gemini 3 series, it combines high-precision reasoning across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning. The 3.1 update introduces measurable gains in SWE benchmarks and real-world coding environments, along with stronger autonomous task execution in structured domains such as finance and spreadsheet-based workflows. Designed for advanced development and agentic systems, Gemini 3.1 Pro Preview improves long-horizon stability and tool orchestration while increasing token efficiency. It introduces a new medium thinking level to better balance cost, speed, and performance. The model excels in agentic coding, structured planning, multimodal analysis, and workflow automation, making it well-suited for autonomous agents, financial modeling, spreadsheet automation, and high-context enterprise tasks.
Gemini 3 Flash Preview	`google/gemini-3-flash-preview`	Google	Paid	1.05 M	0.0005	0.003	Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows…
Full description Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and coding assistance. It delivers near Pro level reasoning and tool use performance with substantially lower latency than larger Gemini variants, making it well suited for interactive development, long running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability. The model supports a 1M token context window and multimodal inputs including text, images, audio, video, and PDFs, with text output. It includes configurable reasoning via thinking levels (minimal, low, medium, high), structured output, tool use, and automatic context caching. Gemini 3 Flash Preview is optimized for users who want strong reasoning and agentic behavior without the cost or latency of full scale frontier models.
Glm 4 7	`zai/glm-4-7`	Zai	Paid	131 K	0.0006	0.0022	No description available.
Full description No description available.
Glm 4 7 Flash	`zai/glm-4-7-flash`	Zai	Paid	131 K	0.00007	0.0004	No description available.
Full description No description available.
Google: Gemini 2.0 Flash Lite	`google/gemini-2.0-flash-lite-001`	Google	Paid	1.05 M	0.000075	0.0003	Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemi…
Full description Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5,…
Google: Gemini 2.5 Flash Lite Preview 09-2025	`google/gemini-2.5-flash-lite-preview-09-2025`	Google	Paid	1.05 M	0.0001	0.0004	Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for u…
Full description Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, “thinking” (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the Reasoning API parameter to selectively trade off cost for intelligence.
Google: Gemini 2.5 Pro Preview 05-06	`google/gemini-2.5-pro-preview-05-06`	Google	Paid	1.05 M	0.00125	0.01	Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, ma…
Full description Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy…
Google: Gemini 2.5 Pro Preview 06-05	`google/gemini-2.5-pro-preview`	Google	Paid	1.05 M	0.00125	0.01	Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, ma…
Full description Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy…
Google: Gemma 2 27B	`google/gemma-2-27b-it`	Google	Paid	8.19 K	0.00065	0.00065	Gemma 2 27B by Google is an open model built from the same research and technology used to creat…
Full description Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of…
Google: Gemma 3 12B	`google/gemma-3-12b-it`	Google	Paid	131 K	0.00009	0.00029	Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles…
Full description Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,…
Google: Gemma 3 27B	`google/gemma-3-27b-it`	Google	Paid	131 K	0.00008	0.00016	Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles…
Full description Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,…
Google: Gemma 3 4B	`google/gemma-3-4b-it`	Google	Paid	131 K	0.00004	0.00008	Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles…
Full description Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,…
Google: Gemma 3n 4B	`google/gemma-3n-e4b-it`	Google	Paid	32.8 K	0.00002	0.00004	Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as…
Full description Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks…
Google: Gemma 4 26B A4B	`google/gemma-4-26b-a4b-it`	Google	Paid	262 K	0.00013	0.0004	Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind…
Full description Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at…
Google: Nano Banana (Gemini 2.5 Flash Image)	`google/gemini-2.5-flash-image`	Google	Paid	32.8 K	0.0003	0.03	Gemini 2.5 Flash Image, a.k.a. “Nano Banana,” is now generally available. It is a state of the a…
Full description Gemini 2.5 Flash Image, a.k.a. “Nano Banana,” is now generally available. It is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations. Aspect ratios can be controlled with the image_config API Parameter
Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)	`google/gemini-3.1-flash-image-preview`	Google	Paid	65.5 K	0.0005	0.049585	Gemini 3.1 Flash Image Preview, a.k.a. “Nano Banana 2,” is Google’s latest state of the art imag…
Full description Gemini 3.1 Flash Image Preview, a.k.a. “Nano Banana 2,” is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced contextual understanding with fast, cost-efficient inference, making complex image generation and iterative edits significantly more accessible. Aspect ratios can be controlled with the image_config API Parameter
Gpt 3 5 Turbo 0125	`openai/gpt-3.5-turbo-0125`	Openai	Paid	16.4 K	0.0005	0.0015	No description available.
Full description No description available.
Gpt 3 5 Turbo 1106	`openai/gpt-3.5-turbo-1106`	Openai	Paid	16.4 K	0.001	0.002	No description available.
Full description No description available.
Gpt 4 0613	`openai/gpt-4-0613`	Openai	Paid	8.19 K	0.03	0.06	No description available.
Full description No description available.
Gpt 4 Turbo 2024 04 09	`openai/gpt-4-turbo-2024-04-09`	Openai	Paid	131 K	0.01	0.03	No description available.
Full description No description available.
Gpt 5 4 Image	`openai/gpt-5.4-image`	Openai	Paid	N/A	0.008	0.015	No description available.
Full description No description available.
Gpt 5 4 Image 2	`openai/gpt-5.4-image-2`	Openai	Paid	N/A	0.008	0.015	No description available.
Full description No description available.
Gpt 5 4 Image Mini	`openai/gpt-5.4-image-mini`	Openai	Paid	N/A	0.008	0.015	No description available.
Full description No description available.
Gpt 5 Mini	`openai/gpt-5-mini`	Openai	Paid	131 K	0.00025	0.002	No description available.
Full description No description available.
Gpt 5 Nano	`openai/gpt-5-nano`	Openai	Paid	131 K	0.00005	0.0004	No description available.
Full description No description available.
Gpt Audio	`openai/gpt-audio`	Openai	Paid	N/A	0.0025	0.01	No description available.
Full description No description available.
Gpt Audio 1 5	`openai/gpt-audio-1.5`	Openai	Paid	N/A	0.0025	0.01	No description available.
Full description No description available.
Gpt Audio Mini	`openai/gpt-audio-mini`	Openai	Paid	N/A	0.0006	0.0024	No description available.
Full description No description available.
Gpt Image 1	`openai/gpt-image-1`	Openai	Paid	N/A	0.005	0	No description available.
Full description No description available.
Gpt Image 1 5	`openai/gpt-image-1.5`	Openai	Paid	N/A	0.005	0.01	No description available.
Full description No description available.
Gpt Image 1 Mini	`openai/gpt-image-1-mini`	Openai	Paid	N/A	0.002	0	No description available.
Full description No description available.
Gpt Image 2	`openai/gpt-image-2`	Openai	Paid	N/A	0.005	0	No description available.
Full description No description available.
Gpt Oss 120B	`openai/gpt-oss-120b`	Openai	Paid	131 K	0.00015	0.0006	No description available.
Full description No description available.
GPT Realtime 1.5	`openai/gpt-realtime-1.5`	Openai	Paid	N/A	0.004	0.016	OpenAI GPT Realtime 1.5 — speech-to-speech real-time model with text, audio, and image input.
Full description OpenAI GPT Realtime 1.5 — speech-to-speech real-time model with text, audio, and image input.
GPT Realtime 2	`openai/gpt-realtime-2`	Openai	Paid	N/A	0.004	0.024	OpenAI GPT Realtime 2 — speech-to-speech real-time model with text, audio, and image input.
Full description OpenAI GPT Realtime 2 — speech-to-speech real-time model with text, audio, and image input.
GPT Realtime Mini	`openai/gpt-realtime-mini`	Openai	Paid	N/A	0.0006	0.0024	OpenAI GPT Realtime Mini — cost-efficient speech-to-speech real-time model.
Full description OpenAI GPT Realtime Mini — cost-efficient speech-to-speech real-time model.
GPT Realtime Translate	`openai/gpt-realtime-translate`	Openai	Paid	N/A	Pricing unavailable	Pricing unavailable	OpenAI GPT Realtime Translate — real-time audio translation, billed per minute of output audio.
Full description OpenAI GPT Realtime Translate — real-time audio translation, billed per minute of output audio.
GPT Realtime Whisper	`openai/gpt-realtime-whisper`	Openai	Paid	N/A	Pricing unavailable	Pricing unavailable	OpenAI GPT Realtime Whisper — real-time audio transcription, billed per minute of input audio.
Full description OpenAI GPT Realtime Whisper — real-time audio transcription, billed per minute of input audio.
GPT-5-mini	`gpt-5-mini`	Unknown	Paid	4 K	0.00025	0.002	GPT-5 mini is a faster, more cost-efficient version of GPT-5. It’s great for well-defined tasks…
Full description GPT-5 mini is a faster, more cost-efficient version of GPT-5. It’s great for well-defined tasks and precise prompts.
GPT-5.5	`openai/gpt-5.5`	Openai	Paid	1.05 M	0.005	0.03	GPT-5.5 is OpenAI’s newest frontier model for the most complex professional work. Reasoning.effo…
Full description GPT-5.5 is OpenAI’s newest frontier model for the most complex professional work. Reasoning.effort supports: none, low, medium (default), high and xhigh.
Grok 4.3	`x-ai/grok-4.3`	X Ai	Paid	N/A	0.003	0.015	No description available.
Full description No description available.
IBM: Granite 4.0 Micro	`ibm-granite/granite-4.0-h-micro`	Ibm Granite	Paid	131 K	0.000017	0.00011	Granite-4.0-H-Micro is a 3B parameter from the Granite 4 family of models. These models are the…
Full description Granite-4.0-H-Micro is a 3B parameter from the Granite 4 family of models. These models are the latest in a series of models released by IBM. They are fine-tuned for long context tool calling.
Imagen 3	`google/imagen-3`	Google	Paid	N/A	Pricing unavailable	Pricing unavailable	No description available.
Full description No description available.
Imagen 3 Fast	`google/imagen-3-fast`	Google	Paid	N/A	Pricing unavailable	Pricing unavailable	No description available.
Full description No description available.
Imagen 3 V1	`google/imagen-3-v1`	Google	Paid	N/A	Pricing unavailable	Pricing unavailable	No description available.
Full description No description available.
Imagen 4	`google/imagen-4`	Google	Paid	N/A	Pricing unavailable	Pricing unavailable	No description available.
Full description No description available.
Imagen 4 Fast	`google/imagen-4-fast`	Google	Paid	N/A	Pricing unavailable	Pricing unavailable	No description available.
Full description No description available.
Imagen 4 Ultra	`google/imagen-4-ultra`	Google	Paid	N/A	Pricing unavailable	Pricing unavailable	No description available.
Full description No description available.
Inception: Mercury 2	`inception/mercury-2`	Inception	Paid	128 K	0.00025	0.00075	Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Inst…
Full description Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving…
Inflection: Inflection 3 Productivity	`inflection/inflection-3-productivity`	Inflection	Paid	8 K	0.0025	0.01	Inflection 3 Productivity is optimized for following instructions. It is better for tasks requir…
Full description Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. It has access to recent news. For emotional…
Kwaipilot: KAT-Coder-Pro V2	`kwaipilot/kat-coder-pro-v2`	Kwaipilot	Paid	256 K	0.0003	0.0012	KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed fo…
Full description KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed for complex enterprise-grade software engineering and SaaS integration. It builds on the agentic coding strengths of earlier versions, with a focus on large-scale production environments, multi-system coordination, and seamless integration across modern software stacks, while also supporting web aesthetics generation to produce production-grade landing pages and presentation decks.
Magnum v4 72B	`anthracite-org/magnum-v4-72b`	Anthracite Org	Paid	16.4 K	0.003	0.005	This is a series of models designed to replicate the prose quality of the Claude 3 models, speci…
Full description This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet(https://openrouter.ai/anthropic/claude-3.5-sonnet) and Opus(https://openrouter.ai/anthropic/claude-3-opus). The model is fine-tuned on top of Qwen2.5 72B.
Mancer: Weaver (alpha)	`mancer/weaver`	Mancer	Paid	8 K	0.00075	0.001	An attempt to recreate Claude-style verbosity, but don’t expect the same level of coherence or m…
Full description An attempt to recreate Claude-style verbosity, but don’t expect the same level of coherence or memory. Meant for use in roleplay/narrative situations.
Meta: Llama 3 70B Instruct	`meta-llama/llama-3-70b-instruct`	Meta Llama	Paid	8.19 K	0.00072	0.00072	Meta’s latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B inst…
Full description Meta’s latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong…
Meta: Llama 3 8B Instruct	`meta-llama/llama-3-8b-instruct`	Meta Llama	Paid	8.19 K	0.0003	0.0006	Meta’s latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instr…
Full description Meta’s latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong…
Meta: Llama 3.1 70B Instruct	`meta-llama/llama-3.1-70b-instruct`	Meta Llama	Paid	131 K	0.00072	0.00072	Meta’s latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B in…
Full description Meta’s latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases. It has demonstrated strong…
Meta: Llama 3.1 8B Instruct	`meta-llama/llama-3.1-8b-instruct`	Meta Llama	Paid	16.4 K	0.0002	0.0002	Meta’s latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B ins…
Full description Meta’s latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to…
Meta: Llama 3.2 11B Vision Instruct	`meta-llama/llama-3.2-11b-vision-instruct`	Meta Llama	Paid	131 K	0.000049	0.000049	Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks…
Full description Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and…
Meta: Llama 3.2 1B Instruct	`meta-llama/llama-3.2-1b-instruct`	Meta Llama	Paid	60 K	0.000027	0.0002	Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural l…
Full description Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate…
Meta: Llama 3.2 3B Instruct	`meta-llama/llama-3.2-3b-instruct`	Meta Llama	Paid	80 K	0.000051	0.00034	Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced…
Full description Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it…
Meta: Llama 3.3 70B Instruct	`meta-llama/llama-3.3-70b-instruct`	Meta Llama	Paid	131 K	0.00072	0.00072	The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned…
Full description The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model…
Meta: Llama 4 Maverick	`meta-llama/llama-4-maverick`	Meta Llama	Paid	1.05 M	0.00024	0.00097	Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, bui…
Full description Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward…
Meta: Llama 4 Scout	`meta-llama/llama-4-scout`	Meta Llama	Paid	328 K	0.00017	0.00017	Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta,…
Full description Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input…
Meta: Llama Guard 4 12B	`meta-llama/llama-guard-4-12b`	Meta Llama	Paid	164 K	0.00018	0.00018	Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content saf…
Full description Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM…
Microsoft: Phi 4	`microsoft/phi-4`	Microsoft	Paid	16.4 K	0.000065	0.00014	Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks an…
Full description Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion…
Minimax M2	`minimax/minimax-m2`	Minimax	Paid	197 K	0.0003	0.0012	No description available.
Full description No description available.
Minimax M2 1	`minimax/minimax-m2-1`	Minimax	Paid	197 K	0.0003	0.0012	No description available.
Full description No description available.
Minimax M2 5	`minimax/minimax-m2-5`	Minimax	Paid	197 K	0.0003	0.0012	No description available.
Full description No description available.
MiniMax: MiniMax M2-her	`minimax/minimax-m2-her`	Minimax	Paid	65.5 K	0.0003	0.0012	MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-…
Full description MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. Designed to stay consistent in tone and personality, it supports rich message roles (user_system, group, sample_message_user, sample_message_ai) and can learn from example dialogue to better match the style and pacing of your scenario, making it a strong choice for storytelling, companions, and conversational experiences where natural flow and vivid interaction matter most.
MiniMax: MiniMax-01	`minimax/minimax-01`	Minimax	Paid	1 M	0.0002	0.0011	MiniMax-01 is a combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image underst…
Full description MiniMax-01 is a combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context…
Mistral Large	`mistralai/mistral-large`	Mistralai	Paid	128 K	0.002	0.006	This is Mistral AI’s flagship model, Mistral Large 2 (version `mistral-large-2407`). It’s a prop…
Full description This is Mistral AI’s flagship model, Mistral Large 2 (version `mistral-large-2407`). It’s a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement here…
Mistral Large 2407	`mistralai/mistral-large-2407`	Mistralai	Paid	131 K	0.002	0.006	This is Mistral AI’s flagship model, Mistral Large 2 (version mistral-large-2407). It’s a propri…
Full description This is Mistral AI’s flagship model, Mistral Large 2 (version mistral-large-2407). It’s a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement here…
Mistral Large 2411	`mistralai/mistral-large-2411`	Mistralai	Paid	131 K	0.002	0.006	Mistral Large 2 2411 is an update of Mistral Large 2 released togeth…
Full description Mistral Large 2 2411 is an update of Mistral Large 2 released together with Pixtral Large 2411 It provides a significant upgrade on the previous Mistral Large 24.07, with notable…
Mistral: 7B Instruct (legacy)	`mistral/mistral-7b-instruct-v0`	Mistral	Paid	32.8 K	0.00015	0.0002	No description available.
Full description No description available.
Mistral: Codestral 2508	`mistralai/codestral-2508`	Mistralai	Paid	256 K	0.0003	0.0009	Mistral’s cutting-edge language model for coding released end of July 2025. Codestral specialize…
Full description Mistral’s cutting-edge language model for coding released end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation. Blog Post
Mistral: Devstral 2 123B	`mistral/devstral-2-123b`	Mistral	Paid	262 K	0.0004	0.002	No description available.
Full description No description available.
Mistral: Devstral 2 2512	`mistralai/devstral-2512`	Mistralai	Paid	262 K	0.0004	0.002	Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding…
Full description Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context. It tracks framework dependencies, detects failures, and retries with corrections—solving challenges like bug fixing and modernizing legacy systems. The model can be fine-tuned to prioritize specific languages or optimize for large enterprise codebases. It is available under a modified MIT license.
Mistral: Devstral Medium	`mistralai/devstral-medium`	Mistralai	Paid	131 K	0.0004	0.002	Devstral Medium is a high-performance code generation and agentic reasoning model developed join…
Full description Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and All Hands AI. Positioned as a step up from Devstral Small, it achieves…
Mistral: Devstral Small 1.1	`mistralai/devstral-small`	Mistralai	Paid	131 K	0.0001	0.0003	Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents…
Full description Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and…
Mistral: Large 2402 (legacy)	`mistral/mistral-large-2402-v1`	Mistral	Paid	32.8 K	0.004	0.012	No description available.
Full description No description available.
Mistral: Large 3 675B	`mistral/mistral-large-3-675b-instruct`	Mistral	Paid	262 K	0.0005	0.0015	No description available.
Full description No description available.
Mistral: Magistral Small 2509	`mistral/magistral-small-2509`	Mistral	Paid	131 K	0.0005	0.0015	No description available.
Full description No description available.
Mistral: Ministral 3 14B	`mistral/ministral-3-14b-instruct`	Mistral	Paid	131 K	0.0002	0.0002	No description available.
Full description No description available.
Mistral: Ministral 3 14B 2512	`mistralai/ministral-14b-2512`	Mistralai	Paid	262 K	0.0002	0.0002	The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and pe…
Full description The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities.
Mistral: Ministral 3 3B	`mistral/ministral-3-3b-instruct`	Mistral	Paid	131 K	0.0001	0.0001	No description available.
Full description No description available.
Mistral: Ministral 3 3B 2512	`mistralai/ministral-3b-2512`	Mistralai	Paid	131 K	0.0001	0.0001	The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny langu…
Full description The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.
Mistral: Ministral 3 8B	`mistral/ministral-3-8b-instruct`	Mistral	Paid	131 K	0.00015	0.00015	No description available.
Full description No description available.
Mistral: Ministral 3 8B 2512	`mistralai/ministral-8b-2512`	Mistralai	Paid	262 K	0.00015	0.00015	A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny languag…
Full description A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.
Mistral: Mistral 7B Instruct v0.1	`mistralai/mistral-7b-instruct-v0.1`	Mistralai	Paid	2.82 K	0.00011	0.00019	A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for sp…
Full description A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.
Mistral: Mistral Large 3 2512	`mistralai/mistral-large-2512`	Mistralai	Paid	262 K	0.0005	0.0015	Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-expe…
Full description Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active parameters (675B total), and released under the Apache 2.0 license.
Mistral: Mistral Medium 3	`mistralai/mistral-medium-3`	Mistralai	Paid	131 K	0.0004	0.002	Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver front…
Full description Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost…
Mistral: Mistral Medium 3.1	`mistralai/mistral-medium-3.1`	Mistralai	Paid	131 K	0.0004	0.002	Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterp…
Full description Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases. The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3.1 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments.
Mistral: Mistral Nemo	`mistralai/mistral-nemo`	Mistralai	Paid	131 K	0.00002	0.00004	A 12B parameter model with a 128k token context length built by Mistral in collaboration with NV…
Full description A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese,…
Mistral: Mistral Small 3	`mistralai/mistral-small-24b-instruct-2501`	Mistralai	Paid	32.8 K	0.00005	0.00008	Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across c…
Full description Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed…
Mistral: Mistral Small 3.1 24B	`mistralai/mistral-small-3.1-24b-instruct`	Mistralai	Paid	131 K	0.00003	0.00011	Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 bi…
Full description Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and…
Mistral: Mistral Small 3.2 24B	`mistralai/mistral-small-3.2-24b-instruct`	Mistralai	Paid	128 K	0.000075	0.0002	Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for…
Full description Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on…
Mistral: Mistral Small 4	`mistralai/mistral-small-2603`	Mistralai	Paid	262 K	0.00015	0.0006	Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities…
Full description Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from Magistral, multimodal understanding from Pixtral, and agentic coding capabilities from Devstral, enabling one model to handle complex analysis, software development, and visual tasks within the same workflow.
Mistral: Mixtral 8x22B Instruct	`mistralai/mixtral-8x22b-instruct`	Mistralai	Paid	65.5 K	0.002	0.006	Mistral’s official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22…
Full description Mistral’s official instruct fine-tuned version of Mixtral 8x22B. It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding,…
Mistral: Pixtral Large (2502)	`mistral/pixtral-large-2502-v1`	Mistral	Paid	131 K	0.002	0.006	No description available.
Full description No description available.
Mistral: Pixtral Large 2411	`mistralai/pixtral-large-2411`	Mistralai	Paid	131 K	0.002	0.006	Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large…
Full description Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of Mistral Large 2. The model is able to understand documents, charts and natural images. The model is…
Mistral: Saba	`mistralai/mistral-saba`	Mistralai	Paid	32.8 K	0.0002	0.0006	Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and Sou…
Full description Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional…
Mistral: Small 2402 (legacy)	`mistral/mistral-small-2402-v1`	Mistral	Paid	32.8 K	0.001	0.003	No description available.
Full description No description available.
Mistral: Voxtral Mini 3B	`mistral/voxtral-mini-3b-2507`	Mistral	Paid	32.8 K	0.00004	0.00004	No description available.
Full description No description available.
Mistral: Voxtral Small 24B 2507	`mistralai/voxtral-small-24b-2507`	Mistralai	Paid	32 K	0.0001	0.0003	Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input c…
Full description Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio is priced at $100 per million seconds.
MoonshotAI: Kimi K2 0711	`moonshotai/kimi-k2`	Moonshotai	Paid	131 K	0.00057	0.0023	Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot…
Full description Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for…
MoonshotAI: Kimi K2 0905	`moonshotai/kimi-k2-0905`	Moonshotai	Paid	131 K	0.0004	0.002	Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale…
Full description Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k. This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training.
MoonshotAI: Kimi K2 Thinking	`moonshotai/kimi-k2-thinking`	Moonshotai	Paid	131 K	0.0006	0.0025	Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 s…
Full description Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in…
MoonshotAI: Kimi K2.5	`moonshotai/kimi-k2.5`	Moonshotai	Paid	262 K	0.0006	0.003	Kimi K2.5 is Moonshot AI’s native multimodal model, delivering state-of-the-art visual coding ca…
Full description Kimi K2.5 is Moonshot AI’s native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed…
Morph: Morph V3 Fast	`morph/morph-v3-fast`	Morph	Paid	81.9 K	0.0008	0.0012	Morph’s fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code…
Full description Morph’s fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code> <update>{edit_snippet}</update>…
Morph: Morph V3 Large	`morph/morph-v3-large`	Morph	Paid	262 K	0.0009	0.0019	Morph’s high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy fo…
Full description Morph’s high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy for precise code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code>…
MythoMax 13B	`gryphe/mythomax-l2-13b`	Gryphe	Paid	4.1 K	0.00006	0.00006	One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions…
Full description One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge
Nous: Hermes 3 405B Instruct	`nousresearch/hermes-3-llama-3.1-405b`	Nousresearch	Paid	131 K	0.001	0.001	Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced…
Full description Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the…
Nous: Hermes 3 70B Instruct	`nousresearch/hermes-3-llama-3.1-70b`	Nousresearch	Paid	131 K	0.0003	0.0003	Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresea…
Full description Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the…
Nous: Hermes 4 405B	`nousresearch/hermes-4-405b`	Nousresearch	Paid	131 K	0.001	0.003	Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Rese…
Full description Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with <think>…</think> traces or respond directly, offering flexibility between speed and depth. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. Learn more in our docs The model is instruction-tuned with an expanded post-training corpus (~60B tokens) emphasizing reasoning traces, improving performance in math, code, STEM, and logical reasoning, while retaining broad assistant utility. It also supports structured outputs, including JSON mode, schema adherence, function calling, and tool use. Hermes 4 is trained for steerability, lower refusal rates, and alignment toward neutral, user-directed behavior.
Nous: Hermes 4 70B	`nousresearch/hermes-4-70b`	Nousresearch	Paid	131 K	0.00013	0.0004	Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It int…
Full description Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either respond directly or generate explicit <think>…</think> reasoning traces before answering. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. Learn more in our docs This 70B variant is trained with the expanded post-training corpus (~60B tokens) emphasizing verified reasoning data, leading to improvements in mathematics, coding, STEM, logic, and structured outputs while maintaining general assistant performance. It supports JSON mode, schema adherence, function calling, and tool use, and is designed for greater steerability with reduced refusal rates.
NVIDIA: Nemotron 3 Nano 30B A3B	`nvidia/nemotron-3-nano-30b-a3b`	Nvidia	Paid	262 K	0.00006	0.00024	NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and…
Full description NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully open with open-weights, datasets and recipes so developers can easily customize, optimize, and deploy the model on their infrastructure for maximum privacy and security.
NVIDIA: Nemotron 3 Super	`nvidia/nemotron-3-super-120b-a12b`	Nvidia	Paid	262 K	0.0001	0.0005	NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameter…
Full description NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere — from workstation to cloud.
O1	`openai/o1`	Openai	Paid	2 K	0.015	0.06	No description available.
Full description No description available.
O3	`openai/o3`	Openai	Paid	2 K	0.002	0.008	No description available.
Full description No description available.
O3 Mini	`openai/o3-mini`	Openai	Paid	2 K	0.0011	0.0044	No description available.
Full description No description available.
O4 Mini	`openai/o4-mini`	Openai	Paid	2 K	0.0011	0.0044	No description available.
Full description No description available.
openai/gpt-5-pro	`openai/gpt-5-pro`	Openai	Paid	4 K	0.015	0.12	High-compute version of GPT-5 for complex reasoning tasks
Full description High-compute version of GPT-5 for complex reasoning tasks
openai/gpt-5.5-pro	`openai/gpt-5.5-pro`	Openai	Paid	N/A	0.03	0.18	No description available.
Full description No description available.
openai/text-embedding-3-large	`openai/text-embedding-3-large`	Openai	Paid	N/A	0.00013	0	No description available.
Full description No description available.
openai/text-embedding-3-small	`openai/text-embedding-3-small`	Openai	Paid	N/A	0.00002	0	No description available.
Full description No description available.
openai/text-embedding-ada-002	`openai/text-embedding-ada-002`	Openai	Paid	N/A	0.0001	0	No description available.
Full description No description available.
OpenAI: GPT-3.5 Turbo	`openai/gpt-3.5-turbo`	Openai	Paid	16.4 K	0.0005	0.0015	GPT-3.5 Turbo is OpenAI’s fastest model. It can understand and generate natural language or code…
Full description GPT-3.5 Turbo is OpenAI’s fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.
OpenAI: GPT-3.5 Turbo (older v0613)	`openai/gpt-3.5-turbo-0613`	Openai	Paid	4.09 K	0.001	0.002	GPT-3.5 Turbo is OpenAI’s fastest model. It can understand and generate natural language or code…
Full description GPT-3.5 Turbo is OpenAI’s fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.
OpenAI: GPT-3.5 Turbo 16k	`openai/gpt-3.5-turbo-16k`	Openai	Paid	16.4 K	0.003	0.004	This model offers four times the context length of gpt-3.5-turbo, allowing it to support approxi…
Full description This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up…
OpenAI: GPT-3.5 Turbo Instruct	`openai/gpt-3.5-turbo-instruct`	Openai	Paid	4.09 K	0.0015	0.002	This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-relat…
Full description This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Sep 2021.
OpenAI: GPT-4	`openai/gpt-4`	Openai	Paid	8.19 K	0.03	0.06	OpenAI’s flagship model, GPT-4 is a large-scale multimodal language model capable of solving dif…
Full description OpenAI’s flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning…
OpenAI: GPT-4 Turbo	`openai/gpt-4-turbo`	Openai	Paid	128 K	0.01	0.03	The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and…
Full description The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023.
OpenAI: GPT-4.1	`openai/gpt-4.1`	Openai	Paid	1.05 M	0.002	0.008	GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-wo…
Full description GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and…
OpenAI: GPT-4.1 Mini	`openai/gpt-4.1-mini`	Openai	Paid	1.05 M	0.0004	0.0016	GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantiall…
Full description GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard…
OpenAI: GPT-4.1 Nano	`openai/gpt-4.1-nano`	Openai	Paid	1.05 M	0.0001	0.0004	For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1…
Full description For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million…
OpenAI: GPT-4o	`openai/gpt-4o`	Openai	Paid	128 K	0.0025	0.01	GPT-4o (“o” for “omni”) is OpenAI’s latest AI model, supporting both text and image inputs with…
Full description GPT-4o (“o” for “omni”) is OpenAI’s latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of GPT-4 Turbo while being twice as…
OpenAI: GPT-4o (2024-05-13)	`openai/gpt-4o-2024-05-13`	Openai	Paid	128 K	0.005	0.015	GPT-4o (“o” for “omni”) is OpenAI’s latest AI model, supporting both text and image inputs with…
Full description GPT-4o (“o” for “omni”) is OpenAI’s latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of GPT-4 Turbo while being twice as…
OpenAI: GPT-4o (2024-08-06)	`openai/gpt-4o-2024-08-06`	Openai	Paid	128 K	0.0025	0.01	The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the abi…
Full description The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the respone_format. Read more here. GPT-4o (“o” for “omni”) is…
OpenAI: GPT-4o (2024-11-20)	`openai/gpt-4o-2024-11-20`	Openai	Paid	128 K	0.0025	0.01	The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural,…
Full description The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded…
OpenAI: GPT-4o Search Preview	`openai/gpt-4o-search-preview`	Openai	Paid	128 K	0.0025	0.01	GPT-4o Search Previewis a specialized model for web search in Chat Completions. It is trained to…
Full description GPT-4o Search Previewis a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.
OpenAI: GPT-4o-mini	`openai/gpt-4o-mini`	Openai	Paid	128 K	0.00015	0.0006	GPT-4o mini is OpenAI’s newest model after GPT-4 Omni, supporting both…
Full description GPT-4o mini is OpenAI’s newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable…
OpenAI: GPT-4o-mini (2024-07-18)	`openai/gpt-4o-mini-2024-07-18`	Openai	Paid	128 K	0.00015	0.0006	GPT-4o mini is OpenAI’s newest model after GPT-4 Omni, supporting both…
Full description GPT-4o mini is OpenAI’s newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable…
OpenAI: GPT-4o-mini Search Preview	`openai/gpt-4o-mini-search-preview`	Openai	Paid	128 K	0.00015	0.0006	GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trai…
Full description GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.
OpenAI: GPT-5 Chat	`openai/gpt-5-chat`	Openai	Paid	128 K	0.00125	0.01	GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for en…
Full description GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.
OpenAI: GPT-5.1	`openai/gpt-5.1`	Openai	Paid	4 K	0.00125	0.01	GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpos…
Full description GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. The model produces clearer, more grounded explanations with reduced jargon, making it easier to follow even on technical or multi-step problems. Built for broad task coverage, GPT-5.1 delivers consistent gains across math, coding, and structured analysis workloads, with more coherent long-form answers and improved tool-use reliability. It also features refined conversational alignment, enabling warmer, more intuitive responses without compromising precision. GPT-5.1 serves as the primary full-capability successor to GPT-5
OpenAI: GPT-5.1 Chat	`openai/gpt-5.1-chat`	Openai	Paid	128 K	0.00125	0.01	GPT-5.1 Chat (AKA Instant is the fast, lightweight member of the 5.1 family, optimized for low-l…
Full description GPT-5.1 Chat (AKA Instant is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.1 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation.
OpenAI: GPT-5.1-Codex	`openai/gpt-5.1-codex`	Openai	Paid	4 K	0.00125	0.01	GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding…
Full description GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the docs here Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.
OpenAI: GPT-5.2	`openai/gpt-5.2`	Openai	Paid	4 K	0.00175	0.014	GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and lo…
Full description GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long context perfomance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. Built for broad task coverage, GPT-5.2 delivers consistent gains across math, coding, sciende, and tool calling workloads, with more coherent long-form answers and improved tool-use reliability.
OpenAI: GPT-5.2 Chat	`openai/gpt-5.2-chat`	Openai	Paid	128 K	0.00175	0.014	GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-…
Full description GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.2 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation.
OpenAI: GPT-5.2 Pro	`openai/gpt-5.2-pro`	Openai	Paid	4 K	0.021	0.168	GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and l…
Full description GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like “think hard about this.” Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks.
OpenAI: GPT-5.2-Codex	`openai/gpt-5.2-codex`	Openai	Paid	4 K	0.00175	0.014	GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and cod…
Full description GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1-Codex, 5.2-Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the docs here Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.
OpenAI: GPT-5.3 Chat	`openai/gpt-5.3-chat`	Openai	Paid	128 K	0.00175	0.014	GPT-5.3 Chat is an update to ChatGPT’s most-used model that makes everyday conversations smoothe…
Full description GPT-5.3 Chat is an update to ChatGPT’s most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly reduces unnecessary refusals, caveats, and overly cautious phrasing that can interrupt conversational flow.
OpenAI: GPT-5.3-Codex	`openai/gpt-5.3-codex`	Openai	Paid	4 K	0.00175	0.014	GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software en…
Full description GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results on SWE-Bench Pro and strong performance on Terminal-Bench 2.0 and OSWorld-Verified, reflecting improved multi-language coding, terminal proficiency, and real-world computer-use skills. The model is optimized for long-running, tool-using workflows and supports interactive steering during execution, making it suitable for complex development tasks, debugging, deployment, and iterative product work. Beyond coding, GPT-5.3-Codex performs strongly on structured knowledge-work benchmarks such as GDPval, supporting tasks like document drafting, spreadsheet analysis, slide creation, and operational research across domains. It is trained with enhanced cybersecurity awareness, including vulnerability identification capabilities, and deployed with additional safeguards for high-risk use cases. Compared to prior Codex models, it is more token-efficient and approximately 25% faster, targeting professional end-to-end workflows that span reasoning, execution, and computer interaction.
OpenAI: GPT-5.4	`openai/gpt-5.4`	Openai	Paid	1.05 M	0.0025	0.015	GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system…
Full description GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, enabling high-context reasoning, coding, and multimodal analysis within the same workflow. The model delivers improved performance in coding, document understanding, tool use, and instruction following. It is designed as a strong default for both general-purpose tasks and software engineering, capable of generating production-quality code, synthesizing information across multiple sources, and executing complex multi-step workflows with fewer iterations and greater token efficiency.
OpenAI: GPT-5.4 Mini	`openai/gpt-5.4-mini`	Openai	Paid	4 K	0.00075	0.0045	GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized…
Full description GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs with strong performance across reasoning, coding, and tool use, while reducing latency and cost for large-scale deployments. The model is designed for production environments that require a balance of capability and efficiency, making it well suited for chat applications, coding assistants, and agent workflows that operate at scale. GPT-5.4 mini delivers reliable instruction following, solid multi-step reasoning, and consistent performance across diverse tasks with improved cost efficiency.
OpenAI: GPT-5.4 Nano	`openai/gpt-5.4-nano`	Openai	Paid	4 K	0.0002	0.00125	GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized…
Full description GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks. It supports text and image inputs and is designed for low-latency use cases such as classification, data extraction, ranking, and sub-agent execution. The model prioritizes responsiveness and efficiency over deep reasoning, making it ideal for pipelines that require fast, reliable outputs at scale. GPT-5.4 nano is well suited for background tasks, real-time systems, and distributed agent architectures where minimizing cost and latency is essential.
OpenAI: GPT-5.4 Pro	`openai/gpt-5.4-pro`	Openai	Paid	1.05 M	0.03	0.18	GPT-5.4 Pro is OpenAI’s most advanced model, building on GPT-5.4’s unified architecture with enh…
Full description GPT-5.4 Pro is OpenAI’s most advanced model, building on GPT-5.4’s unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs. Optimized for step-by-step reasoning, instruction following, and accuracy, GPT-5.4 Pro excels at agentic coding, long-context workflows, and multi-step problem solving.
OpenAI: gpt-oss-safeguard-20b	`openai/gpt-oss-safeguard-20b`	Openai	Paid	131 K	0.00009	0.00039	gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-…
Full description gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mixture-of-Experts (MoE) model offers lower latency for safety tasks like content classification, LLM filtering, and trust & safety labeling. Learn more about this model in OpenAI’s gpt-oss-safeguard user guide.
OpenAI: o3 Pro	`openai/o3-pro`	Openai	Paid	2 K	0.02	0.08	The o-series of models are trained with reinforcement learning to think before they answer and p…
Full description The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently…
Perplexity: Sonar	`perplexity/sonar`	Perplexity	Paid	127 K	0.001	0.001	Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the abil…
Full description Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features…
Perplexity: Sonar Pro	`perplexity/sonar-pro`	Perplexity	Paid	2 K	0.003	0.015	Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perp…
Full description Note: Sonar Pro pricing includes Perplexity search pricing. See details here For enterprises seeking more advanced capabilities, the Sonar Pro API can handle in-depth, multi-step queries with added extensibility, like…
Perplexity: Sonar Pro Search	`perplexity/sonar-pro-search`	Perplexity	Paid	2 K	0.003	0.015	Exclusively available on the OpenRouter API, Sonar Pro’s new Pro Search mode is Perplexity’s mos…
Full description Exclusively available on the OpenRouter API, Sonar Pro’s new Pro Search mode is Perplexity’s most advanced agentic search system. It is designed for deeper reasoning and analysis. Pricing is based on tokens plus $18 per thousand requests. This model powers the Pro Search mode on the Perplexity platform. Sonar Pro Search adds autonomous, multi-step reasoning to Sonar Pro. So, instead of just one query + synthesis, it plans and executes entire research workflows using tools.
Qwen Flash	`qwen/qwen-flash`	Qwen	Paid	1 M	0.000022	0.000216	No description available.
Full description No description available.
Qwen Flash 2025 07 28	`qwen/qwen-flash-2025-07-28`	Qwen	Paid	131 K	0.000022	0.000216	No description available.
Full description No description available.
Qwen Mt Flash	`qwen/qwen-mt-flash`	Qwen	Paid	131 K	0.000101	0.00028	No description available.
Full description No description available.
Qwen Mt Lite	`qwen/qwen-mt-lite`	Qwen	Paid	131 K	0.000086	0.000229	No description available.
Full description No description available.
Qwen Mt Plus	`qwen/qwen-mt-plus`	Qwen	Paid	131 K	0.000259	0.000775	No description available.
Full description No description available.
Qwen Plus	`qwen/qwen-plus`	Qwen	Paid	1 M	0.000115	0.000287	No description available.
Full description No description available.
Qwen Plus 2025 07 28:Non Thinking	`qwen/qwen-plus-2025-07-28:non-thinking`	Qwen	Paid	131 K	0.000115	0.000287	No description available.
Full description No description available.
Qwen Plus 2025 09 11	`qwen/qwen-plus-2025-09-11`	Qwen	Paid	131 K	0.000345	0.002868	No description available.
Full description No description available.
Qwen Plus 2025 09 11:Non Thinking	`qwen/qwen-plus-2025-09-11:non-thinking`	Qwen	Paid	131 K	0.000115	0.000287	No description available.
Full description No description available.
Qwen Plus 2025 09 11:Thinking	`qwen/qwen-plus-2025-09-11:thinking`	Qwen	Paid	131 K	0.000115	0.001147	No description available.
Full description No description available.
Qwen Plus 2025 12 01	`qwen/qwen-plus-2025-12-01`	Qwen	Paid	131 K	0.000115	0.000287	No description available.
Full description No description available.
Qwen Plus 2025 12 01:Non Thinking	`qwen/qwen-plus-2025-12-01:non-thinking`	Qwen	Paid	131 K	0.000345	0.002868	No description available.
Full description No description available.
Qwen Plus 2025 12 01:Thinking	`qwen/qwen-plus-2025-12-01:thinking`	Qwen	Paid	131 K	0.000115	0.001147	No description available.
Full description No description available.
Qwen Plus:Non Thinking	`qwen/qwen-plus:non-thinking`	Qwen	Paid	131 K	0.000689	0.006881	No description available.
Full description No description available.
Qwen Plus:Thinking	`qwen/qwen-plus:thinking`	Qwen	Paid	131 K	0.000115	0.001147	No description available.
Full description No description available.
qwen/text-embedding-v3	`qwen/text-embedding-v3`	Qwen	Paid	N/A	0.00007	0	No description available.
Full description No description available.
qwen/text-embedding-v4	`qwen/text-embedding-v4`	Qwen	Paid	N/A	0.00007	0	No description available.
Full description No description available.
Qwen2.5 72B Instruct	`qwen/qwen-2.5-72b-instruct`	Qwen	Paid	32.8 K	0.00012	0.00039	Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following imp…
Full description Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and…
Qwen2.5 Coder 32B Instruct	`qwen/qwen-2.5-coder-32b-instruct`	Qwen	Paid	32.8 K	0.00066	0.001	Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known a…
Full description Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significantly improvements in code generation, code reasoning…
Qwen3 14B:Non Thinking	`qwen/qwen3-14b:non-thinking`	Qwen	Paid	131 K	0.000144	0.000574	No description available.
Full description No description available.
Qwen3 14B:Thinking	`qwen/qwen3-14b:thinking`	Qwen	Paid	131 K	0.000144	0.001434	No description available.
Full description No description available.
Qwen3 235B A22B Instruct 2507	`qwen/qwen3-235b-a22b-instruct-2507`	Qwen	Paid	131 K	0.00023	0.00092	No description available.
Full description No description available.
Qwen3 235B A22B:Non Thinking	`qwen/qwen3-235b-a22b:non-thinking`	Qwen	Paid	131 K	0.000287	0.001147	No description available.
Full description No description available.
Qwen3 235B A22B:Thinking	`qwen/qwen3-235b-a22b:thinking`	Qwen	Paid	131 K	0.000287	0.002868	No description available.
Full description No description available.
Qwen3 30B A3B:Non Thinking	`qwen/qwen3-30b-a3b:non-thinking`	Qwen	Paid	131 K	0.000108	0.000431	No description available.
Full description No description available.
Qwen3 30B A3B:Thinking	`qwen/qwen3-30b-a3b:thinking`	Qwen	Paid	131 K	0.000108	0.001076	No description available.
Full description No description available.
Qwen3 32B V1	`qwen/qwen3-32b-v1`	Qwen	Paid	131 K	0.0002	0.0006	No description available.
Full description No description available.
Qwen3 32B:Non Thinking	`qwen/qwen3-32b:non-thinking`	Qwen	Paid	131 K	0.00016	0.00064	No description available.
Full description No description available.
Qwen3 32B:Thinking	`qwen/qwen3-32b:thinking`	Qwen	Paid	131 K	0.00016	0.00064	No description available.
Full description No description available.
Qwen3 5 Flash	`qwen/qwen3.5-flash`	Qwen	Paid	1 M	0.000029	0.000287	No description available.
Full description No description available.
Qwen3 5 Flash 2026 02 23	`qwen/qwen3.5-flash-2026-02-23`	Qwen	Paid	131 K	0.000029	0.000287	No description available.
Full description No description available.
Qwen3 5 Plus	`qwen/qwen3.5-plus`	Qwen	Paid	1 M	0.000115	0.000688	No description available.
Full description No description available.
Qwen3 5 Plus 2026 02 15	`qwen/qwen3.5-plus-2026-02-15`	Qwen	Paid	131 K	0.000115	0.000688	No description available.
Full description No description available.
Qwen3 6 Plus	`qwen/qwen3.6-plus`	Qwen	Paid	131 K	0.000276	0.001651	No description available.
Full description No description available.
Qwen3 6 Plus 2026 04 02	`qwen/qwen3.6-plus-2026-04-02`	Qwen	Paid	131 K	0.000276	0.001651	No description available.
Full description No description available.
Qwen3 8B:Non Thinking	`qwen/qwen3-8b:non-thinking`	Qwen	Paid	131 K	0.000072	0.000287	No description available.
Full description No description available.
Qwen3 8B:Thinking	`qwen/qwen3-8b:thinking`	Qwen	Paid	131 K	0.000072	0.000717	No description available.
Full description No description available.
Qwen3 Coder 30B A3B V1	`qwen/qwen3-coder-30b-a3b-v1`	Qwen	Paid	131 K	0.00015	0.00062	No description available.
Full description No description available.
Qwen3 Coder 480B A35B Instruct	`qwen/qwen3-coder-480b-a35b-instruct`	Qwen	Paid	262 K	0.000861	0.003441	No description available.
Full description No description available.
Qwen3 Coder Flash 2025 07 28	`qwen/qwen3-coder-flash-2025-07-28`	Qwen	Paid	131 K	0.000144	0.000574	No description available.
Full description No description available.
Qwen3 Coder Plus 2025 07 22	`qwen/qwen3-coder-plus-2025-07-22`	Qwen	Paid	131 K	0.000574	0.002294	No description available.
Full description No description available.
Qwen3 Coder Plus 2025 09 23	`qwen/qwen3-coder-plus-2025-09-23`	Qwen	Paid	131 K	0.000574	0.002294	No description available.
Full description No description available.
Qwen3 Max 2025 09 23	`qwen/qwen3-max-2025-09-23`	Qwen	Paid	131 K	0.000861	0.003441	No description available.
Full description No description available.
Qwen3 Max 2026 01 23	`qwen/qwen3-max-2026-01-23`	Qwen	Paid	262 K	0.000359	0.001434	No description available.
Full description No description available.
Qwen3 Max Preview	`qwen/qwen3-max-preview`	Qwen	Paid	131 K	0.000861	0.003441	No description available.
Full description No description available.
Qwen3 Next 80B A3B	`qwen/qwen3-next-80b-a3b`	Qwen	Paid	131 K	0.00015	0.0012	No description available.
Full description No description available.
Qwen3 Vl 235B A22B Thinking:Thinking	`qwen/qwen3-vl-235b-a22b-thinking:thinking`	Qwen	Paid	131 K	0.000287	0.002868	No description available.
Full description No description available.
Qwen3 Vl 30B A3B Thinking:Thinking	`qwen/qwen3-vl-30b-a3b-thinking:thinking`	Qwen	Paid	131 K	0.000108	0.001076	No description available.
Full description No description available.
Qwen3 Vl 32B Thinking:Thinking	`qwen/qwen3-vl-32b-thinking:thinking`	Qwen	Paid	131 K	0.00016	0.00064	No description available.
Full description No description available.
Qwen3 Vl 8B Thinking:Thinking	`qwen/qwen3-vl-8b-thinking:thinking`	Qwen	Paid	131 K	0.000072	0.000717	No description available.
Full description No description available.
Qwen3 Vl Flash	`qwen/qwen3-vl-flash`	Qwen	Paid	131 K	0.000022	0.000215	No description available.
Full description No description available.
Qwen3 Vl Flash 2025 10 15	`qwen/qwen3-vl-flash-2025-10-15`	Qwen	Paid	131 K	0.000022	0.000215	No description available.
Full description No description available.
Qwen3 Vl Plus	`qwen/qwen3-vl-plus`	Qwen	Paid	131 K	0.000144	0.001434	No description available.
Full description No description available.
Qwen3 Vl Plus 2025 09 23	`qwen/qwen3-vl-plus-2025-09-23`	Qwen	Paid	131 K	0.000144	0.001434	No description available.
Full description No description available.
Qwen: Qwen Plus 0728	`qwen/qwen-plus-2025-07-28`	Qwen	Paid	1 M	0.000345	0.002868	Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning mod…
Full description Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced performance, speed, and cost combination.
Qwen: Qwen Plus 0728 (thinking)	`qwen/qwen-plus-2025-07-28:thinking`	Qwen	Paid	1 M	0.000115	0.001147	Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning mod…
Full description Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced performance, speed, and cost combination.
Qwen: Qwen2.5 7B Instruct	`qwen/qwen-2.5-7b-instruct`	Qwen	Paid	32.8 K	0.00004	0.0001	Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following impr…
Full description Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and…
Qwen: Qwen3 235B A22B	`qwen/qwen3-235b-a22b`	Qwen	Paid	131 K	0.000455	0.00182	Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating…
Full description Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a “thinking” mode for complex reasoning, math, and…
Qwen: Qwen3 235B A22B Instruct 2507	`qwen/qwen3-235b-a22b-2507`	Qwen	Paid	262 K	0.000071	0.0001	Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language m…
Full description Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following,…
Qwen: Qwen3 235B A22B Thinking 2507	`qwen/qwen3-235b-a22b-thinking-2507`	Qwen	Paid	131 K	0.0001495	0.001495	Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) langua…
Full description Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144…
Qwen: Qwen3 30B A3B	`qwen/qwen3-30b-a3b`	Qwen	Paid	41 K	0.00008	0.00028	Qwen3, the latest generation in the Qwen large language model series, features both dense and mi…
Full description Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique…
Qwen: Qwen3 30B A3B Instruct 2507	`qwen/qwen3-30b-a3b-instruct-2507`	Qwen	Paid	262 K	0.000108	0.000431	Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, wi…
Full description Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thinking mode and is designed for high-quality instruction following, multilingual understanding, and…
Qwen: Qwen3 32B	`qwen/qwen3-32b`	Qwen	Paid	41 K	0.00008	0.00024	Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for…
Full description Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a “thinking” mode for…
Qwen: Qwen3 8B	`qwen/qwen3-8b`	Qwen	Paid	41 K	0.00005	0.0004	Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for bot…
Full description Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between “thinking” mode for math,…
Qwen: Qwen3 Coder 30B A3B Instruct	`qwen/qwen3-coder-30b-a3b-instruct`	Qwen	Paid	16 K	0.000216	0.000861	Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 expert…
Full description Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the…
Qwen: Qwen3 Coder 480B A35B	`qwen/qwen3-coder`	Qwen	Paid	262 K	0.00022	0.001	Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by…
Full description Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over…
Qwen: Qwen3 Coder Flash	`qwen/qwen3-coder-flash`	Qwen	Paid	1 M	0.000144	0.000574	Qwen3 Coder Flash is Alibaba’s fast and cost efficient version of their proprietary Qwen3 Coder…
Full description Qwen3 Coder Flash is Alibaba’s fast and cost efficient version of their proprietary Qwen3 Coder Plus. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities.
Qwen: Qwen3 Coder Next	`qwen/qwen3-coder-next`	Qwen	Paid	262 K	0.00012	0.00075	Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local d…
Full description Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per token, delivering performance comparable to models with 10 to 20x higher active compute, which makes it well suited for cost-sensitive, always-on agent deployment. The model is trained with a strong agentic focus and performs reliably on long-horizon coding tasks, complex tool usage, and recovery from execution failures. With a native 256k context window, it integrates cleanly into real-world CLI and IDE environments and adapts well to common agent scaffolds used by modern coding tools. The model operates exclusively in non-thinking mode and does not emit <think> blocks, simplifying integration for production coding agents.
Qwen: Qwen3 Coder Plus	`qwen/qwen3-coder-plus`	Qwen	Paid	1 M	0.000574	0.002294	Qwen3 Coder Plus is Alibaba’s proprietary version of the Open Source Qwen3 Coder 480B A35B. It i…
Full description Qwen3 Coder Plus is Alibaba’s proprietary version of the Open Source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities.
Qwen: Qwen3 Max	`qwen/qwen3-max`	Qwen	Paid	262 K	0.000359	0.001434	Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reason…
Full description Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode.
Qwen: Qwen3 Max Thinking	`qwen/qwen3-max-thinking`	Qwen	Paid	262 K	0.00078	0.0039	Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes…
Full description Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it delivers major gains in factual accuracy, complex reasoning, instruction following, alignment with human preferences, and agentic behavior.
Qwen: Qwen3 Next 80B A3B Instruct	`qwen/qwen3-next-80b-a3b-instruct`	Qwen	Paid	262 K	0.000144	0.000574	Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimize…
Full description Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual use, while remaining robust on alignment and formatting. Compared with prior Qwen3 instruct variants, it focuses on higher throughput and stability on ultra-long inputs and multi-turn dialogues, making it well-suited for RAG, tool use, and agentic workflows that require consistent final answers rather than visible chain-of-thought. The model employs scaling-efficient training and decoding to improve parameter efficiency and inference speed, and has been validated on a broad set of public benchmarks where it reaches or approaches larger Qwen3 systems in several categories while outperforming earlier mid-sized baselines. It is best used as a general assistant, code helper, and long-context task solver in production settings where deterministic, instruction-following outputs are preferred.
Qwen: Qwen3 Next 80B A3B Thinking	`qwen/qwen3-next-80b-a3b-thinking`	Qwen	Paid	131 K	0.0000975	0.00078	Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs…
Full description Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. It’s designed for hard multi-step problems; math proofs, code synthesis/debugging, logic, and agentic planning, and reports strong results across knowledge, reasoning, coding, alignment, and multilingual evaluations. Compared with prior Qwen3 variants, it emphasizes stability under long chains of thought and efficient scaling during inference, and it is tuned to follow complex instructions while reducing repetitive or off-task behavior. The model is suitable for agent frameworks and tool use (function calling), retrieval-heavy workflows, and standardized benchmarking where step-by-step solutions are required. It supports long, detailed completions and leverages throughput-oriented techniques (e.g., multi-token prediction) for faster generation. Note that it operates in thinking-only mode.
Qwen: Qwen3 VL 235B A22B Instruct	`qwen/qwen3-vl-235b-a22b-instruct`	Qwen	Paid	262 K	0.000287	0.001147	Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generati…
Full description Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.
Qwen: Qwen3 VL 235B A22B Thinking	`qwen/qwen3-vl-235b-a22b-thinking`	Qwen	Paid	131 K	0.00026	0.0026	Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visua…
Full description Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.
Qwen: Qwen3 VL 30B A3B Instruct	`qwen/qwen3-vl-30b-a3b-instruct`	Qwen	Paid	131 K	0.000108	0.000431	Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual…
Full description Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research.
Qwen: Qwen3 VL 30B A3B Thinking	`qwen/qwen3-vl-30b-a3b-thinking`	Qwen	Paid	131 K	0.00013	0.00156	Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual…
Full description Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research.
Qwen: Qwen3 VL 32B Instruct	`qwen/qwen3-vl-32b-instruct`	Qwen	Paid	131 K	0.00016	0.00064	Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precis…
Full description Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding.Robust OCR in 32 languages, and enhanced multimodal fusion through Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance for complex real-world multimodal tasks.
Qwen: Qwen3 VL 8B Instruct	`qwen/qwen3-vl-8b-instruct`	Qwen	Paid	131 K	0.000072	0.000287	Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for h…
Full description Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon temporal reasoning, DeepStack for fine-grained visual-text alignment, and text-timestamp alignment for precise event localization. The model supports a native 256K-token context window, extensible to 1M tokens, and handles both static and dynamic media inputs for tasks like document parsing, visual question answering, spatial reasoning, and GUI control. It achieves text understanding comparable to leading LLMs while expanding OCR coverage to 32 languages and enhancing robustness under varied visual conditions.
Qwen: Qwen3 VL 8B Thinking	`qwen/qwen3-vl-8b-thinking`	Qwen	Paid	131 K	0.000117	0.001365	Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, des…
Full description Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and long-context processing (native 256K, expandable to 1M tokens) for tasks such as scientific visual analysis, causal inference, and mathematical reasoning over image or video inputs. Compared to the Instruct edition, the Thinking version introduces deeper visual-language fusion and deliberate reasoning pathways that improve performance on long-chain logic tasks, STEM problem-solving, and multi-step video understanding. It achieves stronger temporal grounding via Interleaved-MRoPE and timestamp-aware embeddings, while maintaining robust OCR, multilingual comprehension, and text generation on par with large text-only LLMs.
Qwen: Qwen3.5 397B A17B	`qwen/qwen3.5-397b-a17b`	Qwen	Paid	262 K	0.000172	0.001032	The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that…
Full description The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers…
Qwen: Qwen3.5 Plus 2026-02-15	`qwen/qwen3.5-plus-02-15`	Qwen	Paid	1 M	0.00026	0.00156	The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that in…
Full description The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of task evaluations, the 3.5 series consistently demonstrates performance on par with state-of-the-art leading models. Compared to the 3 series, these models show a leap forward in both pure-text and multimodal capabilities.
Qwen: Qwen3.5-122B-A10B	`qwen/qwen3.5-122b-a10b`	Qwen	Paid	262 K	0.000115	0.000917	The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integr…
Full description The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.
Qwen: Qwen3.5-27B	`qwen/qwen3.5-27b`	Qwen	Paid	262 K	0.000086	0.000688	The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, de…
Full description The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.
Qwen: Qwen3.5-35B-A3B	`qwen/qwen3.5-35b-a3b`	Qwen	Paid	262 K	0.000057	0.000459	The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture…
Full description The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall…
Qwen: Qwen3.5-Flash	`qwen/qwen3.5-flash-02-23`	Qwen	Paid	1 M	0.000065	0.00026	The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrat…
Full description The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the 3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.
Reka Edge	`rekaai/reka-edge`	Rekaai	Paid	16.4 K	0.0001	0.0001	Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video…
Full description Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding, video analysis, object detection, and agentic tool-use.
Relace: Relace Search	`relace/relace-search`	Relace	Paid	256 K	0.001	0.003	The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a codebase…
Full description The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a codebase and return relevant files to the user request. In contrast to RAG, relace-search performs agentic multi-step reasoning to produce highly precise results 4x faster than any frontier model. It’s designed to serve as a subagent that passes its findings to an “oracle” coding agent, who orchestrates/performs the rest of the coding task. To use relace-search you need to build an appropriate agent harness, and parse the response for relevant information to hand off to the oracle. Read more about it in the Relace documentation.
ReMM SLERP 13B	`undi95/remm-slerp-l2-13b`	Undi95	Paid	6.14 K	0.00045	0.00065	A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge
Full description A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge
Sao10K: Llama 3 8B Lunaris	`sao10k/l3-lunaris-8b`	Sao10K	Paid	8.19 K	0.00004	0.00005	Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It’s a strategic me…
Full description Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It’s a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge…
Sao10k: Llama 3 Euryale 70B v2.1	`sao10k/l3-euryale-70b`	Sao10K	Paid	8.19 K	0.00148	0.00148	Euryale 70B v2.1 is a model focused on creative roleplay from Sao10k…
Full description Euryale 70B v2.1 is a model focused on creative roleplay from Sao10k. - Better prompt adherence. - Better anatomy / spatial awareness. - Adapts much better to unique and custom…
Sao10K: Llama 3.1 70B Hanami x1	`sao10k/l3.1-70b-hanami-x1`	Sao10K	Paid	16 K	0.003	0.003	This is Sao10K’s experiment over Euryale v2.2.
Full description This is Sao10K’s experiment over Euryale v2.2.
Sao10K: Llama 3.1 Euryale 70B v2.2	`sao10k/l3.1-euryale-70b`	Sao10K	Paid	131 K	0.00085	0.00085	Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sa…
Full description Euryale L3.1 70B v2.2 is a model focused on creative roleplay from Sao10k. It is the successor of Euryale L3 70B v2.1.
StepFun: Step 3.5 Flash	`stepfun/step-3.5-flash`	Stepfun	Paid	262 K	0.0001	0.0003	Step 3.5 Flash is StepFun’s most capable open-source foundation model. Built on a sparse Mixture…
Full description Step 3.5 Flash is StepFun’s most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. It is a reasoning model that is incredibly speed efficient even at long contexts.
Tencent: Hunyuan A13B Instruct	`tencent/hunyuan-a13b-instruct`	Tencent	Paid	131 K	0.00014	0.00057	Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tenc…
Full description Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark…
TheDrummer: Cydonia 24B V4.1	`thedrummer/cydonia-24b-v4.1`	Thedrummer	Paid	131 K	0.0003	0.0005	Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt ad…
Full description Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligence.
TheDrummer: Rocinante 12B	`thedrummer/rocinante-12b`	Thedrummer	Paid	32.8 K	0.00017	0.00043	Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported:…
Full description Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: - Expanded vocabulary with unique and expressive word choices - Enhanced creativity for vivid narratives -…
TheDrummer: Skyfall 36B V2	`thedrummer/skyfall-36b-v2`	Thedrummer	Paid	32.8 K	0.00055	0.0008	Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for impro…
Full description Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-playing, and coherent storytelling.
TheDrummer: UnslopNemo 12B	`thedrummer/unslopnemo-12b`	Thedrummer	Paid	32.8 K	0.0004	0.0004	UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure wri…
Full description UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.
vertex/gemini-embedding-001	`vertex/gemini-embedding-001`	Vertex	Paid	N/A	0.00015	0	No description available.
Full description No description available.
vertex/text-embedding-005	`vertex/text-embedding-005`	Vertex	Paid	N/A	0.000025	0	No description available.
Full description No description available.
vertex/text-multilingual-embedding-002	`vertex/text-multilingual-embedding-002`	Vertex	Paid	N/A	0.000025	0	No description available.
Full description No description available.
WizardLM-2 8x22B	`microsoft/wizardlm-2-8x22b`	Microsoft	Paid	65.5 K	0.00062	0.00062	WizardLM-2 8x22B is Microsoft AI’s most advanced Wizard model. It demonstrates highly competitiv…
Full description WizardLM-2 8x22B is Microsoft AI’s most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art opensource models. It is…
Writer: Palmyra X4	`writer/palmyra-x4-v1`	Writer	Paid	131 K	0.005	0.015	No description available.
Full description No description available.
Writer: Palmyra X5	`writer/palmyra-x5`	Writer	Paid	1.04 M	0.0006	0.006	Palmyra X5 is Writer’s most advanced model, purpose-built for building and scaling AI agents acr…
Full description Palmyra X5 is Writer’s most advanced model, purpose-built for building and scaling AI agents across the enterprise. It delivers industry-leading speed and efficiency on context windows up to 1 million tokens, powered by a novel transformer architecture and hybrid attention mechanisms. This enables faster inference and expanded memory for processing large volumes of enterprise data, critical for scaling AI agents.
Writer: Palmyra X5	`writer/palmyra-x5-v1`	Writer	Paid	1 M	0.006	0.03	No description available.
Full description No description available.
xAI: Grok 4.20	`x-ai/grok-4.20`	X Ai	Paid	2 M	0.002	0.006	Grok 4.20 is xAI’s newest flagship model with industry-leading speed and agentic tool calling ca…
Full description Grok 4.20 is xAI’s newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherance, delivering consistently precise and truthful responses. Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. Learn more in our docs
xAI: Grok 4.20 Multi-Agent	`x-ai/grok-4.20-multi-agent`	X Ai	Paid	2 M	0.002	0.006	Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based wo…
Full description Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information…
Xiaomi: MiMo-V2-Flash	`xiaomi/mimo-v2-flash`	Xiaomi	Paid	262 K	0.00009	0.00029	MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-o…
Full description MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters, adopting hybrid attention architecture. MiMo-V2-Flash supports a hybrid-thinking toggle and a 256K context window, and excels at reasoning, coding, and agent scenarios. On SWE-bench Verified and SWE-bench Multilingual, MiMo-V2-Flash ranks as the top #1 open-source model globally, delivering performance comparable to Claude Sonnet 4.5 while costing only about 3.5% as much. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. Learn more in our docs.
Z.ai: GLM 4 32B	`z-ai/glm-4-32b`	Z Ai	Paid	128 K	0.0001	0.0001	GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex task…
Full description GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It…
Z.ai: GLM 4.5 Air	`z-ai/glm-4.5-air`	Z Ai	Paid	131 K	0.00013	0.00085	GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built f…
Full description GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter…

Page 1 of 1