Available Models

View as Markdown

Available Models

This page is auto-generated from GET /v1/models so the catalog stays aligned with the live Mesh API inventory.

<>

Total models: 354

Showing 1-25 of 354 models

<div className=“models-table-wrap”> <table className=“models-table”>

NameModel IDProviderTierContextInput (USD)Output (USD)Description

<tbody data-model-search-grid>

AI21: Jamba 1.5 Large
ai21/jamba-1-5-large-v1Ai21Paid262 K0.0020.008

No description available.

Full description

No description available.

AI21: Jamba 1.5 Mini
ai21/jamba-1-5-mini-v1Ai21Paid262 K0.00020.0004

No description available.

Full description

No description available.

AI21: Jamba Large 1.7
ai21/jamba-large-1.7Ai21Paid256 K0.0020.008

Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding…

Full description

Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context window, it delivers more accurate, contextually grounded responses and better steerability than previous versions.

AionLabs: Aion-1.0
aion-labs/aion-1.0Aion LabsPaid131 K0.0040.008

Aion-1.0 is a multi-model system designed for high performance across various tasks, including r…

Full description

Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree…

AionLabs: Aion-1.0-Mini
aion-labs/aion-1.0-miniAion LabsPaid131 K0.00070.0014

Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, designed for…

Full description

Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant…

AionLabs: Aion-2.0
aion-labs/aion-2.0Aion LabsPaid131 K0.00080.0016

Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It…

Full description

Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and conflict into stories, making narratives feel more engaging. It also handles mature and darker themes with more nuance and depth.

AionLabs: Aion-RP 1.0 (8B)
aion-labs/aion-rp-llama-3.1-8bAion LabsPaid32.8 K0.00080.0016

Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto b…

Full description

Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto, where LLMs evaluate each other’s responses. It is a fine-tuned base model…

Amazon: Nova 2 Lite
amazon/nova-2-lite-v1AmazonPaid1 M0.000060.00024

Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process te…

Full description

Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrates standout capabilities in processing documents, extracting information from videos, generating code, providing accurate grounded answers, and automating multi-step agentic workflows.

Amazon: Nova Lite
amazon/nova-lite-v1AmazonPaid3 K0.000060.00024

No description available.

Full description

No description available.

Amazon: Nova Micro
amazon/nova-micro-v1AmazonPaid128 K0.000040.00014

No description available.

Full description

No description available.

Amazon: Nova Premier 1.0
amazon/nova-premier-v1AmazonPaid1 M0.00250.0125

Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning task…

Full description

Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models.

Amazon: Nova Pro
amazon/nova-pro-v1AmazonPaid3 K0.00080.0032

No description available.

Full description

No description available.

Anthropic: Claude 3.5 Haiku
anthropic/claude-3.5-haikuAnthropicPaid2 K0.00080.004

Claude 3.5 Haiku features offers enhanced capabilities in speed, coding accuracy, and tool use…

Full description

Claude 3.5 Haiku features offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic…

Anthropic: Claude Fable 5
anthropic/claude-fable-5AnthropicPaid1 M0.010.05

Claude Fable 5 is a Mythos-class model from Anthropic, built for autonomous knowledge work and c…

Full description

Claude Fable 5 is a Mythos-class model from Anthropic, built for autonomous knowledge work and coding. It supports text, image, and file inputs with text output, with reasoning support and a 1M-token context window. It is suited for long-running, complex, and asynchronous tasks that previously required frequent human check-ins. It is particularly strong at end-to-end work that would otherwise take a person hours, days, or weeks - taking on problems that are long-running, ambiguous, or highly multi-step. It executes well-scoped tasks with few mistakes, automatically self-correcting through verification loops, and ships with robust safeguards.

Anthropic: Claude Haiku 4.5
anthropic/claude-haiku-4.5AnthropicPaid2 K0.00080.004

Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intel…

Full description

Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line; enabling controllable reasoning depth, summarized or interleaved thought output, and tool-assisted workflows with full support for coding, bash, web search, and computer-use tools. Scoring >73% on SWE-bench Verified, Haiku 4.5 ranks among the world’s best coding models while maintaining exceptional responsiveness for sub-agents, parallelized execution, and scaled deployment.

Anthropic: Claude Opus 4
anthropic/claude-opus-4AnthropicPaid2 K0.0150.075

Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bringing sust…

Full description

Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in…

Anthropic: Claude Opus 4.1
anthropic/claude-opus-4.1AnthropicPaid2 K0.0150.075

Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performan…

Full description

Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains…

Anthropic: Claude Opus 4.5
anthropic/claude-opus-4.5AnthropicPaid2 K0.0150.075

Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineeri…

Full description

Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and reasoning benchmarks, and improved robustness to prompt injection. The model is designed to operate efficiently across varied effort levels, enabling developers to trade off speed, depth, and token usage depending on task requirements. It comes with a new parameter to control token efficiency, which can be accessed using the OpenRouter Verbosity parameter with low, medium, or high. Opus 4.5 supports advanced tool use, extended context management, and coordinated multi-agent setups, making it well-suited for autonomous research, debugging, multi-step planning, and spreadsheet/browser manipulation. It delivers substantial gains in structured reasoning, execution reliability, and alignment compared to prior Opus generations, while reducing token overhead and improving performance on long-running tasks.

Anthropic: Claude Opus 4.6
anthropic/claude-opus-4.6AnthropicPaid1 M0.0150.075

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is bu…

Full description

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective for large codebases, complex refactors, and multi-step debugging that unfolds over time. The model shows deeper contextual understanding, stronger problem decomposition, and greater reliability on hard engineering tasks than prior generations. Beyond coding, Opus 4.6 excels at sustained knowledge work. It produces near-production-ready documents, plans, and analyses in a single pass, and maintains coherence across very long outputs and extended sessions. This makes it a strong default for tasks that require persistence, judgment, and follow-through, such as technical design, migration planning, and end-to-end project execution. For users upgrading from earlier Opus versions, see our official migration guide here

Anthropic: Claude Opus 4.7
anthropic/claude-opus-4.7AnthropicPaid1 M0.0050.025

Anthropic’s latest flagship, now live on Mesh API. Opus 4.7 is a big step up on hard coding work…

Full description

Anthropic’s latest flagship, now live on Mesh API. Opus 4.7 is a big step up on hard coding work, handling long-running agentic tasks with way more rigor and verifying its own outputs before handing them back. Better vision too, it reads high-res images at 3x the fidelity of older Claude models, so computer-use agents and diagram parsing actually work. Improved instruction following, stronger memory across sessions, and a new xhigh effort level for the really gnarly problems.

Anthropic: Claude Opus 4.8
anthropic/claude-opus-4.8AnthropicPaid1 M0.0050.025

Anthropic’s latest flagship, now live on Mesh API. Opus 4.7 is a big step up on hard coding work…

Full description

Anthropic’s latest flagship, now live on Mesh API. Opus 4.7 is a big step up on hard coding work, handling long-running agentic tasks with way more rigor and verifying its own outputs before handing them back. Better vision too, it reads high-res images at 3x the fidelity of older Claude models, so computer-use agents and diagram parsing actually work. Improved instruction following, stronger memory across sessions, and a new xhigh effort level for the really gnarly problems.

Anthropic: Claude Sonnet 4.5
anthropic/claude-sonnet-4.5AnthropicPaid1 M0.0030.015

Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world ag…

Full description

Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence. The model is designed for extended autonomous operation, maintaining task continuity across sessions and providing fact-based progress tracking. Sonnet 4.5 also introduces stronger agentic capabilities, including improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With enhanced context tracking and awareness of token usage across tool calls, it is particularly well-suited for multi-context and long-running workflows. Use cases span software engineering, cybersecurity, financial analysis, research agents, and other domains requiring sustained reasoning and tool use.

Anthropic: Claude Sonnet 4.6
anthropic/claude-sonnet-4.6AnthropicPaid1 M0.0030.015

Sonnet 4.6 is Anthropic’s most capable Sonnet-class model yet, with frontier performance across…

Full description

Sonnet 4.6 is Anthropic’s most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with memory, polished document creation, and confident computer use for web QA and workflow automation.

Baidu: ERNIE 4.5 300B A47B
baidu/ernie-4.5-300b-a47bBaiduPaid123 K0.000280.0011

ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Bai…

Full description

ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It activates 47B parameters per token and supports text generation in…

Baidu: ERNIE 4.5 VL 424B A47B
baidu/ernie-4.5-vl-424b-a47bBaiduPaid123 K0.000420.00125

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 ser…

Full description

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data…

bedrock/amazon.titan-embed-g1-text-02
bedrock/amazon.titan-embed-g1-text-02BedrockPaidN/A0.000020

No description available.

Full description

No description available.

bedrock/amazon.titan-embed-text-v2:0
bedrock/amazon.titan-embed-text-v2:0BedrockPaidN/A0.000020

No description available.

Full description

No description available.

bedrock/cohere.embed-english-v3
bedrock/cohere.embed-english-v3BedrockPaidN/A0.00010

No description available.

Full description

No description available.

bedrock/cohere.embed-multilingual-v3
bedrock/cohere.embed-multilingual-v3BedrockPaidN/A0.00010

No description available.

Full description

No description available.

bedrock/cohere.embed-v4:0
bedrock/cohere.embed-v4:0BedrockPaidN/A0.000120

No description available.

Full description

No description available.

ByteDance Seed: Seed 1.6
bytedance-seed/seed-1.6Bytedance SeedPaid262 K0.000250.002

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimo…

Full description

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window.

ByteDance Seed: Seed 1.6 Flash
bytedance-seed/seed-1.6-flashBytedance SeedPaid262 K0.0000750.0003

Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting bot…

Full description

Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of up to 16k tokens.

ByteDance Seed: Seed-2.0-Lite
bytedance-seed/seed-2.0-liteBytedance SeedPaid262 K0.000250.002

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimoda…

Full description

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency, making it a practical default choice for most production workloads across text, vision, and tools. Engineered for high-frequency visual understanding and agentic workflows, it’s an ideal choice for deployment at scale with minimal latency.

ByteDance Seed: Seed-2.0-Mini
bytedance-seed/seed-2.0-miniBytedance SeedPaid262 K0.00010.0004

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasi…

Full description

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding, and is optimized for lightweight tasks where cost and speed take priority.

ByteDance-Seed-1.6
byteplus/seed-1-6ByteplusPaidN/A0.000250.002

ByteDance Seed 1.6.

Full description

ByteDance Seed 1.6.

ByteDance-Seed-1.6-flash
byteplus/seed-1-6-flashByteplusPaidN/A0.0000750.0003

ByteDance Seed 1.6 Flash — lightweight fast model.

Full description

ByteDance Seed 1.6 Flash — lightweight fast model.

ByteDance-Seed-1.8
byteplus/seed-1-8ByteplusPaidN/A0.000250.002

ByteDance Seed 1.8.

Full description

ByteDance Seed 1.8.

ByteDance-Seedance-1.0-pro
byteplus/seedance-1-0-proByteplusPaidN/A0.00250.0025

ByteDance Seedance 1.0 Pro — BytePlus video generation model.

Full description

ByteDance Seedance 1.0 Pro — BytePlus video generation model.

ByteDance-Seedance-1.0-pro-fast
byteplus/seedance-1-0-pro-fastByteplusPaidN/A0.0010.001

ByteDance Seedance 1.0 Pro Fast — BytePlus video generation model.

Full description

ByteDance Seedance 1.0 Pro Fast — BytePlus video generation model.

ByteDance-Seedance-1.5-pro
byteplus/seedance-1-5-proByteplusPaidN/A0.00120.0012

ByteDance Seedance 1.5 Pro — BytePlus video generation model.

Full description

ByteDance Seedance 1.5 Pro — BytePlus video generation model.

ByteDance-Seedream-4.0
byteplus/seedream-4-0ByteplusPaidN/A0.030.03

ByteDance Seedream 4.0 — BytePlus image generation model.

Full description

ByteDance Seedream 4.0 — BytePlus image generation model.

ByteDance-Seedream-4.5
byteplus/seedream-4-5ByteplusPaidN/A0.040.04

ByteDance Seedream 4.5 — BytePlus image generation model.

Full description

ByteDance Seedream 4.5 — BytePlus image generation model.

ByteDance: UI-TARS 7B
bytedance/ui-tars-1.5-7bBytedancePaid128 K0.00010.0002

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, includin…

Full description

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement…

Claude 3 Haiku
anthropic/claude-3-haikuAnthropicPaid2 K0.000250.00125

No description available.

Full description

No description available.

Claude Sonnet 4
anthropic/claude-sonnet-4AnthropicPaid2 K0.0030.015

No description available.

Full description

No description available.

Cohere: Command A
cohere/command-aCoherePaid256 K0.00250.01

Command A is an open-weights 111B parameter model with a 256k context window focused on deliveri…

Full description

Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading proprietary…

Cohere: Command R (08-2024)
cohere/command-r-08-2024CoherePaid128 K0.000150.0006

command-r-08-2024 is an update of the Command R with improved perfor…

Full description

command-r-08-2024 is an update of the Command R with improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, it is better at math, code and reasoning and…

Cohere: Command R+ (08-2024)
cohere/command-r-plus-08-2024CoherePaid128 K0.00250.01

command-r-plus-08-2024 is an update of the Command R+ with roug…

Full description

command-r-plus-08-2024 is an update of the Command R+ with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint…

Cohere: Command R7B (12-2024)
cohere/command-r7b-12-2024CoherePaid128 K0.00003750.00015

Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 202…

Full description

Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning…

Deep Cogito: Cogito v2.1 671B
deepcogito/cogito-v2.1-671bDeepcogitoPaid128 K0.001250.00125

Cogito v2.1 671B MoE represents one of the strongest open models globally, matching performance…

Full description

Cogito v2.1 671B MoE represents one of the strongest open models globally, matching performance of frontier closed and open models. This model is trained using self play with reinforcement learning to reach state-of-the-art performance on multiple categories (instruction following, coding, longer queries and creative writing). This advanced system demonstrates significant progress toward scalable superintelligence through policy improvement.

DeepSeek-V3.2
deepseek/deepseek-v3-2DeepseekPaidN/A0.000280.00042

DeepSeek V3.2 — improved V3 with long-context support.

Full description

DeepSeek V3.2 — improved V3 with long-context support.

DeepSeek-V4-flash
deepseek/deepseek-v4-flashDeepseekPaid1.05 M0.000140.00028

DeepSeek V4 Flash — fast, cost-efficient LLM.

Full description

DeepSeek V4 Flash — fast, cost-efficient LLM.

DeepSeek-V4-pro
deepseek/deepseek-v4-proDeepseekPaid1.05 M0.001740.00348

DeepSeek V4 Pro — high-capability LLM.

Full description

DeepSeek V4 Pro — high-capability LLM.

DeepSeek: DeepSeek V3 0324
deepseek/deepseek-chat-v3-0324DeepseekPaid164 K0.00020.00077

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship…

Full description

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the DeepSeek V3 model and performs really well…

DeepSeek: DeepSeek V3.1
deepseek/deepseek-chat-v3.1DeepseekPaid32.8 K0.000150.00075

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both…

Full description

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks.

DeepSeek: DeepSeek V3.1 Terminus
deepseek/deepseek-v3.1-terminusDeepseekPaid164 K0.000210.00079

DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that mainta…

Full description

DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model’s original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model’s performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.

DeepSeek: DeepSeek V3.2
deepseek/deepseek-v3.2DeepseekPaid164 K0.000620.00185

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with…

Full description

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs

DeepSeek: DeepSeek V3.2 Exp
deepseek/deepseek-v3.2-expDeepseekPaid164 K0.000270.00041

DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediat…

Full description

DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs.

DeepSeek: R1
deepseek/deepseek-r1DeepseekPaid64 K0.001350.0054

DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with…

Full description

DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It’s 671B parameters in size, with 37B active in an inference pass…

Dola-Seed-2.0-Code
byteplus/seed-2-0-codeByteplusPaidN/A0.00050.003

ByteDance Seed 2.0 Code — code-focused LLM.

Full description

ByteDance Seed 2.0 Code — code-focused LLM.

Dola-Seed-2.0-lite
byteplus/seed-2-0-liteByteplusPaidN/A0.000250.002

ByteDance Seed 2.0 Lite — multimodal with audio support.

Full description

ByteDance Seed 2.0 Lite — multimodal with audio support.

Dola-Seed-2.0-mini
byteplus/seed-2-0-miniByteplusPaidN/A0.00010.0004

ByteDance Seed 2.0 Mini — compact multimodal with audio.

Full description

ByteDance Seed 2.0 Mini — compact multimodal with audio.

Dola-Seed-2.0-pro
byteplus/seed-2-0-proByteplusPaidN/A0.00050.003

ByteDance Seed 2.0 Pro.

Full description

ByteDance Seed 2.0 Pro.

Dola-Seedream-5.0
byteplus/seedream-5-0ByteplusPaidN/A0.0350.035

Dola Seedream 5.0 Lite — BytePlus multimodal image generation model.

Full description

Dola Seedream 5.0 Lite — BytePlus multimodal image generation model.

Dreamina-Seedance-2.0
byteplus/dreamina-seedance-2-0ByteplusPaidN/A0.070.0077

Dreamina Seedance 2.0 — BytePlus video generation model.

Full description

Dreamina Seedance 2.0 — BytePlus video generation model.

Dreamina-Seedance-2.0-fast
byteplus/dreamina-seedance-2-0-fastByteplusPaidN/A0.00560.0056

Dreamina Seedance 2.0 Fast — BytePlus video generation model.

Full description

Dreamina Seedance 2.0 Fast — BytePlus video generation model.

ElevenLabs Flash v2.5
elevenlabs/eleven_flash_v2_5ElevenlabsPaidN/A0.05N/A

Ultra-low latency (~75ms) TTS. 32 languages, 40K character limit.

Full description

Ultra-low latency (~75ms) TTS. 32 languages, 40K character limit.

ElevenLabs Multilingual v2
elevenlabs/eleven_multilingual_v2ElevenlabsPaidN/A0.1N/A

High-quality voice generation. 32 languages, 40K character limit.

Full description

High-quality voice generation. 32 languages, 40K character limit.

ElevenLabs Multilingual v3
elevenlabs/eleven_multilingual_v3ElevenlabsPaidN/A0.1N/A

Latest high-quality multilingual TTS. 32 languages, 40K character limit.

Full description

Latest high-quality multilingual TTS. 32 languages, 40K character limit.

ElevenLabs Scribe v1
elevenlabs/scribe_v1ElevenlabsPaidN/A0.22N/A

Speech-to-text with 98%+ accuracy. 90+ languages, keyterm prompting.

Full description

Speech-to-text with 98%+ accuracy. 90+ languages, keyterm prompting.

ElevenLabs Scribe v2
elevenlabs/scribe_v2ElevenlabsPaidN/A0.22N/A

Speech-to-text with 98%+ accuracy. 90+ languages, dynamic audio tagging.

Full description

Speech-to-text with 98%+ accuracy. 90+ languages, dynamic audio tagging.

ElevenLabs Scribe v2 Realtime
elevenlabs/scribe_v2_realtimeElevenlabsPaidN/A0.39N/A

Low-latency realtime transcription (~150ms). 90+ languages, word-level timestamps.

Full description

Low-latency realtime transcription (~150ms). 90+ languages, word-level timestamps.

ElevenLabs Turbo v2.5
elevenlabs/eleven_turbo_v2_5ElevenlabsPaidN/A0.05N/A

Low-latency TTS optimised for streaming. 32 languages, 40K character limit.

Full description

Low-latency TTS optimised for streaming. 32 languages, 40K character limit.

EssentialAI: Rnj 1 Instruct
essentialai/rnj-1-instructEssentialaiPaid32.8 K0.000150.00015

Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and trained…

Full description

Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and trained from scratch with a focus on programming, math, and scientific reasoning. The model demonstrates strong performance across multiple programming languages, tool-use workflows, and agentic execution environments (e.g., mini-SWE-agent).

Gemini 3 1 Flash Lite
google/gemini-3.1-flash-liteGooglePaidN/A0.000250.0015

No description available.

Full description

No description available.

Gemini 3 1 Flash Lite Preview
google/gemini-3.1-flash-lite-previewGooglePaid1.05 M0.000250.0015

Gemini 3.1 Flash Lite Preview is Google’s high-efficiency model optimized for high-volume use ca…

Full description

Gemini 3.1 Flash Lite Preview is Google’s high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across key capabilities. Improvements span audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. Supports full thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs. Priced at half the cost of Gemini 3 Flash.

Gemini 3 1 Pro Preview
google/gemini-3.1-pro-previewGooglePaid1.05 M0.0020.012

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engine…

Full description

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation of the Gemini 3 series, it combines high-precision reasoning across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning. The 3.1 update introduces measurable gains in SWE benchmarks and real-world coding environments, along with stronger autonomous task execution in structured domains such as finance and spreadsheet-based workflows. Designed for advanced development and agentic systems, Gemini 3.1 Pro Preview improves long-horizon stability and tool orchestration while increasing token efficiency. It introduces a new medium thinking level to better balance cost, speed, and performance. The model excels in agentic coding, structured planning, multimodal analysis, and workflow automation, making it well-suited for autonomous agents, financial modeling, spreadsheet automation, and high-context enterprise tasks.

Gemini 3 Flash Preview
google/gemini-3-flash-previewGooglePaid1.05 M0.00050.003

Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows…

Full description

Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and coding assistance. It delivers near Pro level reasoning and tool use performance with substantially lower latency than larger Gemini variants, making it well suited for interactive development, long running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability. The model supports a 1M token context window and multimodal inputs including text, images, audio, video, and PDFs, with text output. It includes configurable reasoning via thinking levels (minimal, low, medium, high), structured output, tool use, and automatic context caching. Gemini 3 Flash Preview is optimized for users who want strong reasoning and agentic behavior without the cost or latency of full scale frontier models.

Glm 4 7
zai/glm-4-7ZaiPaid131 K0.00060.0022

No description available.

Full description

No description available.

Glm 4 7 Flash
zai/glm-4-7-flashZaiPaid131 K0.000070.0004

No description available.

Full description

No description available.

GLM-4.7
byteplus/glm-4-7ByteplusPaidN/A0.00060.0022

GLM-4.7 by Z.AI.

Full description

GLM-4.7 by Z.AI.

Google: Gemini 2.0 Flash Lite
google/gemini-2.0-flash-lite-001GooglePaid1.05 M0.0000750.0003

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemi…

Full description

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5,…

Google: Gemini 2.5 Flash Lite Preview 09-2025
google/gemini-2.5-flash-lite-preview-09-2025GooglePaid1.05 M0.00010.0004

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for u…

Full description

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, “thinking” (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the Reasoning API parameter to selectively trade off cost for intelligence.

Google: Gemini 2.5 Pro Preview 05-06
google/gemini-2.5-pro-preview-05-06GooglePaid1.05 M0.001250.01

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, ma…

Full description

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy…

Google: Gemini 2.5 Pro Preview 06-05
google/gemini-2.5-pro-previewGooglePaid1.05 M0.001250.01

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, ma…

Full description

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy…

Google: Gemma 2 27B
google/gemma-2-27b-itGooglePaid8.19 K0.000650.00065

Gemma 2 27B by Google is an open model built from the same research and technology used to creat…

Full description

Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of…

Google: Gemma 3 12B
google/gemma-3-12b-itGooglePaid131 K0.000090.00029

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles…

Full description

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,…

Google: Gemma 3 27B
google/gemma-3-27b-itGooglePaid131 K0.000080.00016

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles…

Full description

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,…

Google: Gemma 3 4B
google/gemma-3-4b-itGooglePaid131 K0.000040.00008

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles…

Full description

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,…

Google: Gemma 3n 4B
google/gemma-3n-e4b-itGooglePaid32.8 K0.000020.00004

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as…

Full description

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks…

Google: Gemma 4 26B A4B
google/gemma-4-26b-a4b-itGooglePaid262 K0.000130.0004

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind…

Full description

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at…

Google: Nano Banana (Gemini 2.5 Flash Image)
google/gemini-2.5-flash-imageGooglePaid32.8 K0.00030.03

Gemini 2.5 Flash Image, a.k.a. “Nano Banana,” is now generally available. It is a state of the a…

Full description

Gemini 2.5 Flash Image, a.k.a. “Nano Banana,” is now generally available. It is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations. Aspect ratios can be controlled with the image_config API Parameter

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)
google/gemini-3.1-flash-image-previewGooglePaid65.5 K0.00050.049585

Gemini 3.1 Flash Image Preview, a.k.a. “Nano Banana 2,” is Google’s latest state of the art imag…

Full description

Gemini 3.1 Flash Image Preview, a.k.a. “Nano Banana 2,” is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced contextual understanding with fast, cost-efficient inference, making complex image generation and iterative edits significantly more accessible. Aspect ratios can be controlled with the image_config API Parameter

Gpt 3 5 Turbo 0125
openai/gpt-3.5-turbo-0125OpenaiPaid16.4 K0.00050.0015

No description available.

Full description

No description available.

Gpt 3 5 Turbo 1106
openai/gpt-3.5-turbo-1106OpenaiPaid16.4 K0.0010.002

No description available.

Full description

No description available.

Gpt 4 0613
openai/gpt-4-0613OpenaiPaid8.19 K0.030.06

No description available.

Full description

No description available.

Gpt 4 Turbo 2024 04 09
openai/gpt-4-turbo-2024-04-09OpenaiPaid131 K0.010.03

No description available.

Full description

No description available.

Gpt 5 4 Image
openai/gpt-5.4-imageOpenaiPaidN/A0.0080.015

No description available.

Full description

No description available.

Gpt 5 4 Image 2
openai/gpt-5.4-image-2OpenaiPaidN/A0.0080.015

No description available.

Full description

No description available.

Gpt 5 4 Image Mini
openai/gpt-5.4-image-miniOpenaiPaidN/A0.0080.015

No description available.

Full description

No description available.

Gpt 5 Mini
openai/gpt-5-miniOpenaiPaid131 K0.000250.002

No description available.

Full description

No description available.

Gpt 5 Nano
openai/gpt-5-nanoOpenaiPaid131 K0.000050.0004

No description available.

Full description

No description available.

Gpt Audio
openai/gpt-audioOpenaiPaidN/A0.00250.01

No description available.

Full description

No description available.

Gpt Audio 1 5
openai/gpt-audio-1.5OpenaiPaidN/A0.00250.01

No description available.

Full description

No description available.

Gpt Audio Mini
openai/gpt-audio-miniOpenaiPaidN/A0.00060.0024

No description available.

Full description

No description available.

Gpt Image 1
openai/gpt-image-1OpenaiPaidN/A0.0050

No description available.

Full description

No description available.

Gpt Image 1 5
openai/gpt-image-1.5OpenaiPaidN/A0.0050.01

No description available.

Full description

No description available.

Gpt Image 1 Mini
openai/gpt-image-1-miniOpenaiPaidN/A0.0020

No description available.

Full description

No description available.

Gpt Image 2
openai/gpt-image-2OpenaiPaidN/A0.0050

No description available.

Full description

No description available.

GPT Realtime 1.5
openai/gpt-realtime-1.5OpenaiPaidN/A0.0040.016

OpenAI GPT Realtime 1.5 — speech-to-speech real-time model with text, audio, and image input.

Full description

OpenAI GPT Realtime 1.5 — speech-to-speech real-time model with text, audio, and image input.

GPT Realtime 2
openai/gpt-realtime-2OpenaiPaidN/A0.0040.024

OpenAI GPT Realtime 2 — speech-to-speech real-time model with text, audio, and image input.

Full description

OpenAI GPT Realtime 2 — speech-to-speech real-time model with text, audio, and image input.

GPT Realtime Mini
openai/gpt-realtime-miniOpenaiPaidN/A0.00060.0024

OpenAI GPT Realtime Mini — cost-efficient speech-to-speech real-time model.

Full description

OpenAI GPT Realtime Mini — cost-efficient speech-to-speech real-time model.

GPT Realtime Translate
openai/gpt-realtime-translateOpenaiPaidN/APricing unavailablePricing unavailable

OpenAI GPT Realtime Translate — real-time audio translation, billed per minute of output audio.

Full description

OpenAI GPT Realtime Translate — real-time audio translation, billed per minute of output audio.

GPT Realtime Whisper
openai/gpt-realtime-whisperOpenaiPaidN/APricing unavailablePricing unavailable

OpenAI GPT Realtime Whisper — real-time audio transcription, billed per minute of input audio.

Full description

OpenAI GPT Realtime Whisper — real-time audio transcription, billed per minute of input audio.

GPT-5-mini
gpt-5-miniUnknownPaid4 K0.000250.002

GPT-5 mini is a faster, more cost-efficient version of GPT-5. It’s great for well-defined tasks…

Full description

GPT-5 mini is a faster, more cost-efficient version of GPT-5. It’s great for well-defined tasks and precise prompts.

GPT-5.5
openai/gpt-5.5OpenaiPaid1.05 M0.0050.03

GPT-5.5 is OpenAI’s newest frontier model for the most complex professional work. Reasoning.effo…

Full description

GPT-5.5 is OpenAI’s newest frontier model for the most complex professional work. Reasoning.effort supports: none, low, medium (default), high and xhigh.

GPT-OSS-120B
openai/gpt-oss-120bOpenaiPaid131 K0.00010.0005

GPT-OSS-120B — OpenAI open-source 120B model on ModelArk.

Full description

GPT-OSS-120B — OpenAI open-source 120B model on ModelArk.

Grok 4.3
x-ai/grok-4.3X AiPaidN/A0.0030.015

No description available.

Full description

No description available.

IBM: Granite 4.0 Micro
ibm-granite/granite-4.0-h-microIbm GranitePaid131 K0.0000170.00011

Granite-4.0-H-Micro is a 3B parameter from the Granite 4 family of models. These models are the…

Full description

Granite-4.0-H-Micro is a 3B parameter from the Granite 4 family of models. These models are the latest in a series of models released by IBM. They are fine-tuned for long context tool calling.

Imagen 3
google/imagen-3GooglePaidN/APricing unavailablePricing unavailable

No description available.

Full description

No description available.

Imagen 3 Fast
google/imagen-3-fastGooglePaidN/APricing unavailablePricing unavailable

No description available.

Full description

No description available.

Imagen 3 V1
google/imagen-3-v1GooglePaidN/APricing unavailablePricing unavailable

No description available.

Full description

No description available.

Imagen 4
google/imagen-4GooglePaidN/APricing unavailablePricing unavailable

No description available.

Full description

No description available.

Imagen 4 Fast
google/imagen-4-fastGooglePaidN/APricing unavailablePricing unavailable

No description available.

Full description

No description available.

Imagen 4 Ultra
google/imagen-4-ultraGooglePaidN/APricing unavailablePricing unavailable

No description available.

Full description

No description available.

Inception: Mercury 2
inception/mercury-2InceptionPaid128 K0.000250.00075

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Inst…

Full description

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving…

Inflection: Inflection 3 Productivity
inflection/inflection-3-productivityInflectionPaid8 K0.00250.01

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requir…

Full description

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. It has access to recent news. For emotional…

Kwaipilot: KAT-Coder-Pro V2
kwaipilot/kat-coder-pro-v2KwaipilotPaid256 K0.00030.0012

KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed fo…

Full description

KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed for complex enterprise-grade software engineering and SaaS integration. It builds on the agentic coding strengths of earlier versions, with a focus on large-scale production environments, multi-system coordination, and seamless integration across modern software stacks, while also supporting web aesthetics generation to produce production-grade landing pages and presentation decks.

Magnum v4 72B
anthracite-org/magnum-v4-72bAnthracite OrgPaid16.4 K0.0030.005

This is a series of models designed to replicate the prose quality of the Claude 3 models, speci…

Full description

This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet(https://openrouter.ai/anthropic/claude-3.5-sonnet) and Opus(https://openrouter.ai/anthropic/claude-3-opus). The model is fine-tuned on top of Qwen2.5 72B.

Mancer: Weaver (alpha)
mancer/weaverMancerPaid8 K0.000750.001

An attempt to recreate Claude-style verbosity, but don’t expect the same level of coherence or m…

Full description

An attempt to recreate Claude-style verbosity, but don’t expect the same level of coherence or memory. Meant for use in roleplay/narrative situations.

Meta: Llama 3 70B Instruct
meta-llama/llama-3-70b-instructMeta LlamaPaid8.19 K0.000720.00072

Meta’s latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B inst…

Full description

Meta’s latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong…

Meta: Llama 3 8B Instruct
meta-llama/llama-3-8b-instructMeta LlamaPaid8.19 K0.00030.0006

Meta’s latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instr…

Full description

Meta’s latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong…

Meta: Llama 3.1 70B Instruct
meta-llama/llama-3.1-70b-instructMeta LlamaPaid131 K0.000720.00072

Meta’s latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B in…

Full description

Meta’s latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases. It has demonstrated strong…

Meta: Llama 3.1 8B Instruct
meta-llama/llama-3.1-8b-instructMeta LlamaPaid16.4 K0.00020.0002

Meta’s latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B ins…

Full description

Meta’s latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to…

Meta: Llama 3.2 11B Vision Instruct
meta-llama/llama-3.2-11b-vision-instructMeta LlamaPaid131 K0.0000490.000049

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks…

Full description

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and…

Meta: Llama 3.2 1B Instruct
meta-llama/llama-3.2-1b-instructMeta LlamaPaid60 K0.0000270.0002

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural l…

Full description

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate…

Meta: Llama 3.2 3B Instruct
meta-llama/llama-3.2-3b-instructMeta LlamaPaid80 K0.0000510.00034

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced…

Full description

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it…

Meta: Llama 3.3 70B Instruct
meta-llama/llama-3.3-70b-instructMeta LlamaPaid131 K0.000720.00072

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned…

Full description

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model…

Meta: Llama 4 Maverick
meta-llama/llama-4-maverickMeta LlamaPaid1.05 M0.000240.00097

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, bui…

Full description

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward…

Meta: Llama 4 Scout
meta-llama/llama-4-scoutMeta LlamaPaid328 K0.000170.00017

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta,…

Full description

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input…

Meta: Llama Guard 4 12B
meta-llama/llama-guard-4-12bMeta LlamaPaid164 K0.000180.00018

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content saf…

Full description

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM…

Microsoft: Phi 4
microsoft/phi-4MicrosoftPaid16.4 K0.0000650.00014

Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks an…

Full description

Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion…

Minimax M2
minimax/minimax-m2MinimaxPaid197 K0.00030.0012

No description available.

Full description

No description available.

Minimax M2 1
minimax/minimax-m2-1MinimaxPaid197 K0.00030.0012

No description available.

Full description

No description available.

Minimax M2 5
minimax/minimax-m2-5MinimaxPaid197 K0.00030.0012

No description available.

Full description

No description available.

MiniMax: MiniMax M2-her
minimax/minimax-m2-herMinimaxPaid65.5 K0.00030.0012

MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-…

Full description

MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. Designed to stay consistent in tone and personality, it supports rich message roles (user_system, group, sample_message_user, sample_message_ai) and can learn from example dialogue to better match the style and pacing of your scenario, making it a strong choice for storytelling, companions, and conversational experiences where natural flow and vivid interaction matter most.

MiniMax: MiniMax-01
minimax/minimax-01MinimaxPaid1 M0.00020.0011

MiniMax-01 is a combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image underst…

Full description

MiniMax-01 is a combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context…

Mistral Large
mistralai/mistral-largeMistralaiPaid128 K0.0020.006

This is Mistral AI’s flagship model, Mistral Large 2 (version mistral-large-2407). It’s a prop…

Full description

This is Mistral AI’s flagship model, Mistral Large 2 (version mistral-large-2407). It’s a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement here

Mistral Large 2407
mistralai/mistral-large-2407MistralaiPaid131 K0.0020.006

This is Mistral AI’s flagship model, Mistral Large 2 (version mistral-large-2407). It’s a propri…

Full description

This is Mistral AI’s flagship model, Mistral Large 2 (version mistral-large-2407). It’s a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement here

Mistral Large 2411
mistralai/mistral-large-2411MistralaiPaid131 K0.0020.006

Mistral Large 2 2411 is an update of Mistral Large 2 released togeth…

Full description

Mistral Large 2 2411 is an update of Mistral Large 2 released together with Pixtral Large 2411 It provides a significant upgrade on the previous Mistral Large 24.07, with notable…

Mistral: 7B Instruct (legacy)
mistral/mistral-7b-instruct-v0MistralPaid32.8 K0.000150.0002

No description available.

Full description

No description available.

Mistral: Codestral 2508
mistralai/codestral-2508MistralaiPaid256 K0.00030.0009

Mistral’s cutting-edge language model for coding released end of July 2025. Codestral specialize…

Full description

Mistral’s cutting-edge language model for coding released end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation. Blog Post

Mistral: Devstral 2 123B
mistral/devstral-2-123bMistralPaid262 K0.00040.002

No description available.

Full description

No description available.

Mistral: Devstral 2 2512
mistralai/devstral-2512MistralaiPaid262 K0.00040.002

Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding…

Full description

Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context. It tracks framework dependencies, detects failures, and retries with corrections—solving challenges like bug fixing and modernizing legacy systems. The model can be fine-tuned to prioritize specific languages or optimize for large enterprise codebases. It is available under a modified MIT license.

Mistral: Devstral Medium
mistralai/devstral-mediumMistralaiPaid131 K0.00040.002

Devstral Medium is a high-performance code generation and agentic reasoning model developed join…

Full description

Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and All Hands AI. Positioned as a step up from Devstral Small, it achieves…

Mistral: Devstral Small 1.1
mistralai/devstral-smallMistralaiPaid131 K0.00010.0003

Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents…

Full description

Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and…

Mistral: Large 2402 (legacy)
mistral/mistral-large-2402-v1MistralPaid32.8 K0.0040.012

No description available.

Full description

No description available.

Mistral: Large 3 675B
mistral/mistral-large-3-675b-instructMistralPaid262 K0.00050.0015

No description available.

Full description

No description available.

Mistral: Magistral Small 2509
mistral/magistral-small-2509MistralPaid131 K0.00050.0015

No description available.

Full description

No description available.

Mistral: Ministral 3 14B
mistral/ministral-3-14b-instructMistralPaid131 K0.00020.0002

No description available.

Full description

No description available.

Mistral: Ministral 3 14B 2512
mistralai/ministral-14b-2512MistralaiPaid262 K0.00020.0002

The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and pe…

Full description

The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities.

Mistral: Ministral 3 3B
mistral/ministral-3-3b-instructMistralPaid131 K0.00010.0001

No description available.

Full description

No description available.

Mistral: Ministral 3 3B 2512
mistralai/ministral-3b-2512MistralaiPaid131 K0.00010.0001

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny langu…

Full description

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

Mistral: Ministral 3 8B
mistral/ministral-3-8b-instructMistralPaid131 K0.000150.00015

No description available.

Full description

No description available.

Mistral: Ministral 3 8B 2512
mistralai/ministral-8b-2512MistralaiPaid262 K0.000150.00015

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny languag…

Full description

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

Mistral: Mistral 7B Instruct v0.1
mistralai/mistral-7b-instruct-v0.1MistralaiPaid2.82 K0.000110.00019

A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for sp…

Full description

A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.

Mistral: Mistral Large 3 2512
mistralai/mistral-large-2512MistralaiPaid262 K0.00050.0015

Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-expe…

Full description

Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active parameters (675B total), and released under the Apache 2.0 license.

Mistral: Mistral Medium 3
mistralai/mistral-medium-3MistralaiPaid131 K0.00040.002

Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver front…

Full description

Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost…

Mistral: Mistral Medium 3.1
mistralai/mistral-medium-3.1MistralaiPaid131 K0.00040.002

Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterp…

Full description

Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases. The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3.1 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments.

Mistral: Mistral Nemo
mistralai/mistral-nemoMistralaiPaid131 K0.000020.00004

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NV…

Full description

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese,…

Mistral: Mistral Small 3
mistralai/mistral-small-24b-instruct-2501MistralaiPaid32.8 K0.000050.00008

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across c…

Full description

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed…

Mistral: Mistral Small 3.1 24B
mistralai/mistral-small-3.1-24b-instructMistralaiPaid131 K0.000030.00011

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 bi…

Full description

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and…

Mistral: Mistral Small 3.2 24B
mistralai/mistral-small-3.2-24b-instructMistralaiPaid128 K0.0000750.0002

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for…

Full description

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on…

Mistral: Mistral Small 4
mistralai/mistral-small-2603MistralaiPaid262 K0.000150.0006

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities…

Full description

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from Magistral, multimodal understanding from Pixtral, and agentic coding capabilities from Devstral, enabling one model to handle complex analysis, software development, and visual tasks within the same workflow.

Mistral: Mixtral 8x22B Instruct
mistralai/mixtral-8x22b-instructMistralaiPaid65.5 K0.0020.006

Mistral’s official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22…

Full description

Mistral’s official instruct fine-tuned version of Mixtral 8x22B. It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding,…

Mistral: Pixtral Large (2502)
mistral/pixtral-large-2502-v1MistralPaid131 K0.0020.006

No description available.

Full description

No description available.

Mistral: Pixtral Large 2411
mistralai/pixtral-large-2411MistralaiPaid131 K0.0020.006

Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large…

Full description

Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of Mistral Large 2. The model is able to understand documents, charts and natural images. The model is…

Mistral: Saba
mistralai/mistral-sabaMistralaiPaid32.8 K0.00020.0006

Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and Sou…

Full description

Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional…

Mistral: Small 2402 (legacy)
mistral/mistral-small-2402-v1MistralPaid32.8 K0.0010.003

No description available.

Full description

No description available.

Mistral: Voxtral Mini 3B
mistral/voxtral-mini-3b-2507MistralPaid32.8 K0.000040.00004

No description available.

Full description

No description available.

Mistral: Voxtral Small 24B 2507
mistralai/voxtral-small-24b-2507MistralaiPaid32 K0.00010.0003

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input c…

Full description

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio is priced at $100 per million seconds.

MoonshotAI: Kimi K2 0711
moonshotai/kimi-k2MoonshotaiPaid131 K0.000570.0023

Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot…

Full description

Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for…

MoonshotAI: Kimi K2 0905
moonshotai/kimi-k2-0905MoonshotaiPaid131 K0.00040.002

Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale…

Full description

Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k. This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training.

MoonshotAI: Kimi K2 Thinking
moonshotai/kimi-k2-thinkingMoonshotaiPaid131 K0.00060.0025

Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 s…

Full description

Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in…

MoonshotAI: Kimi K2.5
moonshotai/kimi-k2.5MoonshotaiPaid262 K0.00060.003

Kimi K2.5 is Moonshot AI’s native multimodal model, delivering state-of-the-art visual coding ca…

Full description

Kimi K2.5 is Moonshot AI’s native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed…

MoonshotAI: Kimi K2.6
moonshotai/kimi-k2.6MoonshotaiPaid262 K0.0004130.00381

Kimi K2.6 is Moonshot AI’s native multimodal model, delivering state-of-the-art visual coding ca…

Full description

Kimi K2.6 is Moonshot AI’s native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed…

Morph: Morph V3 Fast
morph/morph-v3-fastMorphPaid81.9 K0.00080.0012

Morph’s fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code…

Full description

Morph’s fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code> <update>{edit_snippet}</update>…

Morph: Morph V3 Large
morph/morph-v3-largeMorphPaid262 K0.00090.0019

Morph’s high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy fo…

Full description

Morph’s high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy for precise code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code>…

MythoMax 13B
gryphe/mythomax-l2-13bGryphePaid4.1 K0.000060.00006

One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions…

Full description

One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge

Nous: Hermes 3 405B Instruct
nousresearch/hermes-3-llama-3.1-405bNousresearchPaid131 K0.0010.001

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced…

Full description

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the…

Nous: Hermes 3 70B Instruct
nousresearch/hermes-3-llama-3.1-70bNousresearchPaid131 K0.00030.0003

Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresea…

Full description

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the…

Nous: Hermes 4 405B
nousresearch/hermes-4-405bNousresearchPaid131 K0.0010.003

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Rese…

Full description

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with <think>…</think> traces or respond directly, offering flexibility between speed and depth. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs The model is instruction-tuned with an expanded post-training corpus (~60B tokens) emphasizing reasoning traces, improving performance in math, code, STEM, and logical reasoning, while retaining broad assistant utility. It also supports structured outputs, including JSON mode, schema adherence, function calling, and tool use. Hermes 4 is trained for steerability, lower refusal rates, and alignment toward neutral, user-directed behavior.

Nous: Hermes 4 70B
nousresearch/hermes-4-70bNousresearchPaid131 K0.000130.0004

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It int…

Full description

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either respond directly or generate explicit <think>…</think> reasoning traces before answering. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs This 70B variant is trained with the expanded post-training corpus (~60B tokens) emphasizing verified reasoning data, leading to improvements in mathematics, coding, STEM, logic, and structured outputs while maintaining general assistant performance. It supports JSON mode, schema adherence, function calling, and tool use, and is designed for greater steerability with reduced refusal rates.

NVIDIA: Nemotron 3 Nano 30B A3B
nvidia/nemotron-3-nano-30b-a3bNvidiaPaid262 K0.000060.00024

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and…

Full description

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully open with open-weights, datasets and recipes so developers can easily customize, optimize, and deploy the model on their infrastructure for maximum privacy and security.

NVIDIA: Nemotron 3 Super
nvidia/nemotron-3-super-120b-a12bNvidiaPaid262 K0.00010.0005

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameter…

Full description

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere — from workstation to cloud.

O1
openai/o1OpenaiPaid2 K0.0150.06

No description available.

Full description

No description available.

O3
openai/o3OpenaiPaid2 K0.0020.008

No description available.

Full description

No description available.

O3 Mini
openai/o3-miniOpenaiPaid2 K0.00110.0044

No description available.

Full description

No description available.

O4 Mini
openai/o4-miniOpenaiPaid2 K0.00110.0044

No description available.

Full description

No description available.

openai/gpt-5-pro
openai/gpt-5-proOpenaiPaid4 K0.0150.12

High-compute version of GPT-5 for complex reasoning tasks

Full description

High-compute version of GPT-5 for complex reasoning tasks

openai/gpt-5.5-pro
openai/gpt-5.5-proOpenaiPaidN/A0.030.18

No description available.

Full description

No description available.

openai/text-embedding-3-large
openai/text-embedding-3-largeOpenaiPaidN/A0.000130

No description available.

Full description

No description available.

openai/text-embedding-3-small
openai/text-embedding-3-smallOpenaiPaidN/A0.000020

No description available.

Full description

No description available.

openai/text-embedding-ada-002
openai/text-embedding-ada-002OpenaiPaidN/A0.00010

No description available.

Full description

No description available.

OpenAI: GPT-3.5 Turbo
openai/gpt-3.5-turboOpenaiPaid16.4 K0.00050.0015

GPT-3.5 Turbo is OpenAI’s fastest model. It can understand and generate natural language or code…

Full description

GPT-3.5 Turbo is OpenAI’s fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.

OpenAI: GPT-3.5 Turbo (older v0613)
openai/gpt-3.5-turbo-0613OpenaiPaid4.09 K0.0010.002

GPT-3.5 Turbo is OpenAI’s fastest model. It can understand and generate natural language or code…

Full description

GPT-3.5 Turbo is OpenAI’s fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.

OpenAI: GPT-3.5 Turbo 16k
openai/gpt-3.5-turbo-16kOpenaiPaid16.4 K0.0030.004

This model offers four times the context length of gpt-3.5-turbo, allowing it to support approxi…

Full description

This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up…

OpenAI: GPT-3.5 Turbo Instruct
openai/gpt-3.5-turbo-instructOpenaiPaid4.09 K0.00150.002

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-relat…

Full description

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Sep 2021.

OpenAI: GPT-4
openai/gpt-4OpenaiPaid8.19 K0.030.06

OpenAI’s flagship model, GPT-4 is a large-scale multimodal language model capable of solving dif…

Full description

OpenAI’s flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning…

OpenAI: GPT-4 Turbo
openai/gpt-4-turboOpenaiPaid128 K0.010.03

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and…

Full description

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023.

OpenAI: GPT-4.1
openai/gpt-4.1OpenaiPaid1.05 M0.0020.008

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-wo…

Full description

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and…

OpenAI: GPT-4.1 Mini
openai/gpt-4.1-miniOpenaiPaid1.05 M0.00040.0016

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantiall…

Full description

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard…

OpenAI: GPT-4.1 Nano
openai/gpt-4.1-nanoOpenaiPaid1.05 M0.00010.0004

For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1…

Full description

For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million…

OpenAI: GPT-4o
openai/gpt-4oOpenaiPaid128 K0.00250.01

GPT-4o (“o” for “omni”) is OpenAI’s latest AI model, supporting both text and image inputs with…

Full description

GPT-4o (“o” for “omni”) is OpenAI’s latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of GPT-4 Turbo while being twice as…

OpenAI: GPT-4o (2024-05-13)
openai/gpt-4o-2024-05-13OpenaiPaid128 K0.0050.015

GPT-4o (“o” for “omni”) is OpenAI’s latest AI model, supporting both text and image inputs with…

Full description

GPT-4o (“o” for “omni”) is OpenAI’s latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of GPT-4 Turbo while being twice as…

OpenAI: GPT-4o (2024-08-06)
openai/gpt-4o-2024-08-06OpenaiPaid128 K0.00250.01

The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the abi…

Full description

The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the respone_format. Read more here. GPT-4o (“o” for “omni”) is…

OpenAI: GPT-4o (2024-11-20)
openai/gpt-4o-2024-11-20OpenaiPaid128 K0.00250.01

The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural,…

Full description

The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded…

OpenAI: GPT-4o Search Preview
openai/gpt-4o-search-previewOpenaiPaid128 K0.00250.01

GPT-4o Search Previewis a specialized model for web search in Chat Completions. It is trained to…

Full description

GPT-4o Search Previewis a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.

OpenAI: GPT-4o-mini
openai/gpt-4o-miniOpenaiPaid128 K0.000150.0006

GPT-4o mini is OpenAI’s newest model after GPT-4 Omni, supporting both…

Full description

GPT-4o mini is OpenAI’s newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable…

OpenAI: GPT-4o-mini (2024-07-18)
openai/gpt-4o-mini-2024-07-18OpenaiPaid128 K0.000150.0006

GPT-4o mini is OpenAI’s newest model after GPT-4 Omni, supporting both…

Full description

GPT-4o mini is OpenAI’s newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable…

OpenAI: GPT-4o-mini Search Preview
openai/gpt-4o-mini-search-previewOpenaiPaid128 K0.000150.0006

GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trai…

Full description

GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.

OpenAI: GPT-5 Chat
openai/gpt-5-chatOpenaiPaid128 K0.001250.01

GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for en…

Full description

GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.

OpenAI: GPT-5.1
openai/gpt-5.1OpenaiPaid4 K0.001250.01

GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpos…

Full description

GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. The model produces clearer, more grounded explanations with reduced jargon, making it easier to follow even on technical or multi-step problems. Built for broad task coverage, GPT-5.1 delivers consistent gains across math, coding, and structured analysis workloads, with more coherent long-form answers and improved tool-use reliability. It also features refined conversational alignment, enabling warmer, more intuitive responses without compromising precision. GPT-5.1 serves as the primary full-capability successor to GPT-5

OpenAI: GPT-5.1 Chat
openai/gpt-5.1-chatOpenaiPaid128 K0.001250.01

GPT-5.1 Chat (AKA Instant is the fast, lightweight member of the 5.1 family, optimized for low-l…

Full description

GPT-5.1 Chat (AKA Instant is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.1 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation.

OpenAI: GPT-5.1-Codex
openai/gpt-5.1-codexOpenaiPaid4 K0.001250.01

GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding…

Full description

GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the reasoning.effort parameter. Read the docs here Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.

OpenAI: GPT-5.2
openai/gpt-5.2OpenaiPaid4 K0.001750.014

GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and lo…

Full description

GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long context perfomance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. Built for broad task coverage, GPT-5.2 delivers consistent gains across math, coding, sciende, and tool calling workloads, with more coherent long-form answers and improved tool-use reliability.

OpenAI: GPT-5.2 Chat
openai/gpt-5.2-chatOpenaiPaid128 K0.001750.014

GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-…

Full description

GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.2 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation.

OpenAI: GPT-5.2 Pro
openai/gpt-5.2-proOpenaiPaid4 K0.0210.168

GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and l…

Full description

GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like “think hard about this.” Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks.

OpenAI: GPT-5.2-Codex
openai/gpt-5.2-codexOpenaiPaid4 K0.001750.014

GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and cod…

Full description

GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1-Codex, 5.2-Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the reasoning.effort parameter. Read the docs here Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.

OpenAI: GPT-5.3 Chat
openai/gpt-5.3-chatOpenaiPaid128 K0.001750.014

GPT-5.3 Chat is an update to ChatGPT’s most-used model that makes everyday conversations smoothe…

Full description

GPT-5.3 Chat is an update to ChatGPT’s most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly reduces unnecessary refusals, caveats, and overly cautious phrasing that can interrupt conversational flow.

OpenAI: GPT-5.3-Codex
openai/gpt-5.3-codexOpenaiPaid4 K0.001750.014

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software en…

Full description

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results on SWE-Bench Pro and strong performance on Terminal-Bench 2.0 and OSWorld-Verified, reflecting improved multi-language coding, terminal proficiency, and real-world computer-use skills. The model is optimized for long-running, tool-using workflows and supports interactive steering during execution, making it suitable for complex development tasks, debugging, deployment, and iterative product work. Beyond coding, GPT-5.3-Codex performs strongly on structured knowledge-work benchmarks such as GDPval, supporting tasks like document drafting, spreadsheet analysis, slide creation, and operational research across domains. It is trained with enhanced cybersecurity awareness, including vulnerability identification capabilities, and deployed with additional safeguards for high-risk use cases. Compared to prior Codex models, it is more token-efficient and approximately 25% faster, targeting professional end-to-end workflows that span reasoning, execution, and computer interaction.

OpenAI: GPT-5.4
openai/gpt-5.4OpenaiPaid1.05 M0.00250.015

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system…

Full description

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, enabling high-context reasoning, coding, and multimodal analysis within the same workflow. The model delivers improved performance in coding, document understanding, tool use, and instruction following. It is designed as a strong default for both general-purpose tasks and software engineering, capable of generating production-quality code, synthesizing information across multiple sources, and executing complex multi-step workflows with fewer iterations and greater token efficiency.

OpenAI: GPT-5.4 Mini
openai/gpt-5.4-miniOpenaiPaid4 K0.000750.0045

GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized…

Full description

GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs with strong performance across reasoning, coding, and tool use, while reducing latency and cost for large-scale deployments. The model is designed for production environments that require a balance of capability and efficiency, making it well suited for chat applications, coding assistants, and agent workflows that operate at scale. GPT-5.4 mini delivers reliable instruction following, solid multi-step reasoning, and consistent performance across diverse tasks with improved cost efficiency.

OpenAI: GPT-5.4 Nano
openai/gpt-5.4-nanoOpenaiPaid4 K0.00020.00125

GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized…

Full description

GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks. It supports text and image inputs and is designed for low-latency use cases such as classification, data extraction, ranking, and sub-agent execution. The model prioritizes responsiveness and efficiency over deep reasoning, making it ideal for pipelines that require fast, reliable outputs at scale. GPT-5.4 nano is well suited for background tasks, real-time systems, and distributed agent architectures where minimizing cost and latency is essential.

OpenAI: GPT-5.4 Pro
openai/gpt-5.4-proOpenaiPaid1.05 M0.030.18

GPT-5.4 Pro is OpenAI’s most advanced model, building on GPT-5.4’s unified architecture with enh…

Full description

GPT-5.4 Pro is OpenAI’s most advanced model, building on GPT-5.4’s unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs. Optimized for step-by-step reasoning, instruction following, and accuracy, GPT-5.4 Pro excels at agentic coding, long-context workflows, and multi-step problem solving.

OpenAI: gpt-oss-safeguard-20b
openai/gpt-oss-safeguard-20bOpenaiPaid131 K0.000090.00039

gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-…

Full description

gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mixture-of-Experts (MoE) model offers lower latency for safety tasks like content classification, LLM filtering, and trust & safety labeling. Learn more about this model in OpenAI’s gpt-oss-safeguard user guide.

OpenAI: o3 Pro
openai/o3-proOpenaiPaid2 K0.020.08

The o-series of models are trained with reinforcement learning to think before they answer and p…

Full description

The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently…

Perplexity: Sonar
perplexity/sonarPerplexityPaid127 K0.0010.001

Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the abil…

Full description

Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features…

Perplexity: Sonar Pro
perplexity/sonar-proPerplexityPaid2 K0.0030.015

Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perp

Full description

Note: Sonar Pro pricing includes Perplexity search pricing. See details here For enterprises seeking more advanced capabilities, the Sonar Pro API can handle in-depth, multi-step queries with added extensibility, like…

Perplexity: Sonar Pro Search
perplexity/sonar-pro-searchPerplexityPaid2 K0.0030.015

Exclusively available on the OpenRouter API, Sonar Pro’s new Pro Search mode is Perplexity’s mos…

Full description

Exclusively available on the OpenRouter API, Sonar Pro’s new Pro Search mode is Perplexity’s most advanced agentic search system. It is designed for deeper reasoning and analysis. Pricing is based on tokens plus $18 per thousand requests. This model powers the Pro Search mode on the Perplexity platform. Sonar Pro Search adds autonomous, multi-step reasoning to Sonar Pro. So, instead of just one query + synthesis, it plans and executes entire research workflows using tools.

Qwen Flash
qwen/qwen-flashQwenPaid1 M0.0000220.000216

No description available.

Full description

No description available.

Qwen Flash 2025 07 28
qwen/qwen-flash-2025-07-28QwenPaid131 K0.0000220.000216

No description available.

Full description

No description available.

Qwen Mt Flash
qwen/qwen-mt-flashQwenPaid131 K0.0001010.00028

No description available.

Full description

No description available.

Qwen Mt Lite
qwen/qwen-mt-liteQwenPaid131 K0.0000860.000229

No description available.

Full description

No description available.

Qwen Mt Plus
qwen/qwen-mt-plusQwenPaid131 K0.0002590.000775

No description available.

Full description

No description available.

Qwen Plus
qwen/qwen-plusQwenPaid1 M0.0001150.000287

No description available.

Full description

No description available.

Qwen Plus 2025 07 28:Non Thinking
qwen/qwen-plus-2025-07-28:non-thinkingQwenPaid131 K0.0001150.000287

No description available.

Full description

No description available.

Qwen Plus 2025 09 11
qwen/qwen-plus-2025-09-11QwenPaid131 K0.0003450.002868

No description available.

Full description

No description available.

Qwen Plus 2025 09 11:Non Thinking
qwen/qwen-plus-2025-09-11:non-thinkingQwenPaid131 K0.0001150.000287

No description available.

Full description

No description available.

Qwen Plus 2025 09 11:Thinking
qwen/qwen-plus-2025-09-11:thinkingQwenPaid131 K0.0001150.001147

No description available.

Full description

No description available.

Qwen Plus 2025 12 01
qwen/qwen-plus-2025-12-01QwenPaid131 K0.0001150.000287

No description available.

Full description

No description available.

Qwen Plus 2025 12 01:Non Thinking
qwen/qwen-plus-2025-12-01:non-thinkingQwenPaid131 K0.0003450.002868

No description available.

Full description

No description available.

Qwen Plus 2025 12 01:Thinking
qwen/qwen-plus-2025-12-01:thinkingQwenPaid131 K0.0001150.001147

No description available.

Full description

No description available.

Qwen Plus:Non Thinking
qwen/qwen-plus:non-thinkingQwenPaid131 K0.0006890.006881

No description available.

Full description

No description available.

Qwen Plus:Thinking
qwen/qwen-plus:thinkingQwenPaid131 K0.0001150.001147

No description available.

Full description

No description available.

qwen/text-embedding-v3
qwen/text-embedding-v3QwenPaidN/A0.000070

No description available.

Full description

No description available.

qwen/text-embedding-v4
qwen/text-embedding-v4QwenPaidN/A0.000070

No description available.

Full description

No description available.

Qwen2.5 72B Instruct
qwen/qwen-2.5-72b-instructQwenPaid32.8 K0.000120.00039

Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following imp…

Full description

Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and…

Qwen2.5 Coder 32B Instruct
qwen/qwen-2.5-coder-32b-instructQwenPaid32.8 K0.000660.001

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known a…

Full description

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significantly improvements in code generation, code reasoning

Qwen3 14B:Non Thinking
qwen/qwen3-14b:non-thinkingQwenPaid131 K0.0001440.000574

No description available.

Full description

No description available.

Qwen3 14B:Thinking
qwen/qwen3-14b:thinkingQwenPaid131 K0.0001440.001434

No description available.

Full description

No description available.

Qwen3 235B A22B Instruct 2507
qwen/qwen3-235b-a22b-instruct-2507QwenPaid131 K0.000230.00092

No description available.

Full description

No description available.

Qwen3 235B A22B:Non Thinking
qwen/qwen3-235b-a22b:non-thinkingQwenPaid131 K0.0002870.001147

No description available.

Full description

No description available.

Qwen3 235B A22B:Thinking
qwen/qwen3-235b-a22b:thinkingQwenPaid131 K0.0002870.002868

No description available.

Full description

No description available.

Qwen3 30B A3B:Non Thinking
qwen/qwen3-30b-a3b:non-thinkingQwenPaid131 K0.0001080.000431

No description available.

Full description

No description available.

Qwen3 30B A3B:Thinking
qwen/qwen3-30b-a3b:thinkingQwenPaid131 K0.0001080.001076

No description available.

Full description

No description available.

Qwen3 32B V1
qwen/qwen3-32b-v1QwenPaid131 K0.00020.0006

No description available.

Full description

No description available.

Qwen3 32B:Non Thinking
qwen/qwen3-32b:non-thinkingQwenPaid131 K0.000160.00064

No description available.

Full description

No description available.

Qwen3 32B:Thinking
qwen/qwen3-32b:thinkingQwenPaid131 K0.000160.00064

No description available.

Full description

No description available.

Qwen3 5 Flash
qwen/qwen3.5-flashQwenPaid1 M0.0000290.000287

No description available.

Full description

No description available.

Qwen3 5 Flash 2026 02 23
qwen/qwen3.5-flash-2026-02-23QwenPaid131 KPricing unavailablePricing unavailable

No description available.

Full description

No description available.

Qwen3 5 Plus
qwen/qwen3.5-plusQwenPaid1 M0.0001150.000688

No description available.

Full description

No description available.

Qwen3 5 Plus 2026 02 15
qwen/qwen3.5-plus-2026-02-15QwenPaid131 K0.0001150.000688

No description available.

Full description

No description available.

Qwen3 6 Plus
qwen/qwen3.6-plusQwenPaid131 K0.0002760.001651

No description available.

Full description

No description available.

Qwen3 6 Plus 2026 04 02
qwen/qwen3.6-plus-2026-04-02QwenPaid131 K0.0002760.001651

No description available.

Full description

No description available.

Qwen3 8B:Non Thinking
qwen/qwen3-8b:non-thinkingQwenPaid131 K0.0000720.000287

No description available.

Full description

No description available.

Qwen3 8B:Thinking
qwen/qwen3-8b:thinkingQwenPaid131 K0.0000720.000717

No description available.

Full description

No description available.

Qwen3 Coder 30B A3B V1
qwen/qwen3-coder-30b-a3b-v1QwenPaid131 K0.000150.00062

No description available.

Full description

No description available.

Qwen3 Coder 480B A35B Instruct
qwen/qwen3-coder-480b-a35b-instructQwenPaid262 K0.0008610.003441

No description available.

Full description

No description available.

Qwen3 Coder Flash 2025 07 28
qwen/qwen3-coder-flash-2025-07-28QwenPaid131 K0.0001440.000574

No description available.

Full description

No description available.

Qwen3 Coder Plus 2025 07 22
qwen/qwen3-coder-plus-2025-07-22QwenPaid131 K0.0005740.002294

No description available.

Full description

No description available.

Qwen3 Coder Plus 2025 09 23
qwen/qwen3-coder-plus-2025-09-23QwenPaid131 K0.0005740.002294

No description available.

Full description

No description available.

Qwen3 Max 2025 09 23
qwen/qwen3-max-2025-09-23QwenPaid131 K0.0008610.003441

No description available.

Full description

No description available.

Qwen3 Max 2026 01 23
qwen/qwen3-max-2026-01-23QwenPaid262 K0.0003590.001434

No description available.

Full description

No description available.

Qwen3 Max Preview
qwen/qwen3-max-previewQwenPaid131 K0.0008610.003441

No description available.

Full description

No description available.

Qwen3 Next 80B A3B
qwen/qwen3-next-80b-a3bQwenPaid131 K0.000150.0012

No description available.

Full description

No description available.

Qwen3 Vl 235B A22B Thinking:Thinking
qwen/qwen3-vl-235b-a22b-thinking:thinkingQwenPaid131 K0.0002870.002868

No description available.

Full description

No description available.

Qwen3 Vl 30B A3B Thinking:Thinking
qwen/qwen3-vl-30b-a3b-thinking:thinkingQwenPaid131 K0.0001080.001076

No description available.

Full description

No description available.

Qwen3 Vl 32B Thinking:Thinking
qwen/qwen3-vl-32b-thinking:thinkingQwenPaid131 K0.000160.00064

No description available.

Full description

No description available.

Qwen3 Vl 8B Thinking:Thinking
qwen/qwen3-vl-8b-thinking:thinkingQwenPaid131 K0.0000720.000717

No description available.

Full description

No description available.

Qwen3 Vl Flash
qwen/qwen3-vl-flashQwenPaid131 K0.0000220.000215

No description available.

Full description

No description available.

Qwen3 Vl Flash 2025 10 15
qwen/qwen3-vl-flash-2025-10-15QwenPaid131 K0.0000220.000215

No description available.

Full description

No description available.

Qwen3 Vl Plus
qwen/qwen3-vl-plusQwenPaid131 K0.0001440.001434

No description available.

Full description

No description available.

Qwen3 Vl Plus 2025 09 23
qwen/qwen3-vl-plus-2025-09-23QwenPaid131 K0.0001440.001434

No description available.

Full description

No description available.

Qwen: Qwen Plus 0728
qwen/qwen-plus-2025-07-28QwenPaid1 M0.0003450.002868

Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning mod…

Full description

Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced performance, speed, and cost combination.

Qwen: Qwen Plus 0728 (thinking)
qwen/qwen-plus-2025-07-28:thinkingQwenPaid1 M0.0001150.001147

Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning mod…

Full description

Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced performance, speed, and cost combination.

Qwen: Qwen2.5 7B Instruct
qwen/qwen-2.5-7b-instructQwenPaid32.8 K0.000040.0001

Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following impr…

Full description

Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and…

Qwen: Qwen3 235B A22B
qwen/qwen3-235b-a22bQwenPaid131 K0.0004550.00182

Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating…

Full description

Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a “thinking” mode for complex reasoning, math, and…

Qwen: Qwen3 235B A22B Instruct 2507
qwen/qwen3-235b-a22b-2507QwenPaid262 K0.0000710.0001

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language m…

Full description

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following,…

Qwen: Qwen3 235B A22B Thinking 2507
qwen/qwen3-235b-a22b-thinking-2507QwenPaid131 K0.00014950.001495

Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) langua…

Full description

Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144…

Qwen: Qwen3 30B A3B
qwen/qwen3-30b-a3bQwenPaid41 K0.000080.00028

Qwen3, the latest generation in the Qwen large language model series, features both dense and mi…

Full description

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique…

Qwen: Qwen3 30B A3B Instruct 2507
qwen/qwen3-30b-a3b-instruct-2507QwenPaid262 K0.0001080.000431

Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, wi…

Full description

Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thinking mode and is designed for high-quality instruction following, multilingual understanding, and…

Qwen: Qwen3 32B
qwen/qwen3-32bQwenPaid41 K0.000080.00024

Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for…

Full description

Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a “thinking” mode for…

Qwen: Qwen3 8B
qwen/qwen3-8bQwenPaid41 K0.000050.0004

Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for bot…

Full description

Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between “thinking” mode for math,…

Qwen: Qwen3 Coder 30B A3B Instruct
qwen/qwen3-coder-30b-a3b-instructQwenPaid16 K0.0002160.000861

Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 expert…

Full description

Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the…

Qwen: Qwen3 Coder 480B A35B
qwen/qwen3-coderQwenPaid262 K0.000220.001

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by…

Full description

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over…

Qwen: Qwen3 Coder Flash
qwen/qwen3-coder-flashQwenPaid1 M0.0001440.000574

Qwen3 Coder Flash is Alibaba’s fast and cost efficient version of their proprietary Qwen3 Coder…

Full description

Qwen3 Coder Flash is Alibaba’s fast and cost efficient version of their proprietary Qwen3 Coder Plus. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities.

Qwen: Qwen3 Coder Next
qwen/qwen3-coder-nextQwenPaid262 K0.000120.00075

Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local d…

Full description

Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per token, delivering performance comparable to models with 10 to 20x higher active compute, which makes it well suited for cost-sensitive, always-on agent deployment. The model is trained with a strong agentic focus and performs reliably on long-horizon coding tasks, complex tool usage, and recovery from execution failures. With a native 256k context window, it integrates cleanly into real-world CLI and IDE environments and adapts well to common agent scaffolds used by modern coding tools. The model operates exclusively in non-thinking mode and does not emit <think> blocks, simplifying integration for production coding agents.

Qwen: Qwen3 Coder Plus
qwen/qwen3-coder-plusQwenPaid1 M0.0005740.002294

Qwen3 Coder Plus is Alibaba’s proprietary version of the Open Source Qwen3 Coder 480B A35B. It i…

Full description

Qwen3 Coder Plus is Alibaba’s proprietary version of the Open Source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities.

Qwen: Qwen3 Max
qwen/qwen3-maxQwenPaid262 K0.0003590.001434

Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reason…

Full description

Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode.

Qwen: Qwen3 Max Thinking
qwen/qwen3-max-thinkingQwenPaid262 K0.000780.0039

Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes…

Full description

Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it delivers major gains in factual accuracy, complex reasoning, instruction following, alignment with human preferences, and agentic behavior.

Qwen: Qwen3 Next 80B A3B Instruct
qwen/qwen3-next-80b-a3b-instructQwenPaid262 K0.0001440.000574

Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimize…

Full description

Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual use, while remaining robust on alignment and formatting. Compared with prior Qwen3 instruct variants, it focuses on higher throughput and stability on ultra-long inputs and multi-turn dialogues, making it well-suited for RAG, tool use, and agentic workflows that require consistent final answers rather than visible chain-of-thought. The model employs scaling-efficient training and decoding to improve parameter efficiency and inference speed, and has been validated on a broad set of public benchmarks where it reaches or approaches larger Qwen3 systems in several categories while outperforming earlier mid-sized baselines. It is best used as a general assistant, code helper, and long-context task solver in production settings where deterministic, instruction-following outputs are preferred.

Qwen: Qwen3 Next 80B A3B Thinking
qwen/qwen3-next-80b-a3b-thinkingQwenPaid131 K0.00009750.00078

Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs…

Full description

Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. It’s designed for hard multi-step problems; math proofs, code synthesis/debugging, logic, and agentic planning, and reports strong results across knowledge, reasoning, coding, alignment, and multilingual evaluations. Compared with prior Qwen3 variants, it emphasizes stability under long chains of thought and efficient scaling during inference, and it is tuned to follow complex instructions while reducing repetitive or off-task behavior. The model is suitable for agent frameworks and tool use (function calling), retrieval-heavy workflows, and standardized benchmarking where step-by-step solutions are required. It supports long, detailed completions and leverages throughput-oriented techniques (e.g., multi-token prediction) for faster generation. Note that it operates in thinking-only mode.

Qwen: Qwen3 VL 235B A22B Instruct
qwen/qwen3-vl-235b-a22b-instructQwenPaid262 K0.0002870.001147

Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generati…

Full description

Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.

Qwen: Qwen3 VL 235B A22B Thinking
qwen/qwen3-vl-235b-a22b-thinkingQwenPaid131 K0.000260.0026

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visua…

Full description

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.

Qwen: Qwen3 VL 30B A3B Instruct
qwen/qwen3-vl-30b-a3b-instructQwenPaid131 K0.0001080.000431

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual…

Full description

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research.

Qwen: Qwen3 VL 30B A3B Thinking
qwen/qwen3-vl-30b-a3b-thinkingQwenPaid131 K0.000130.00156

Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual…

Full description

Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research.

Qwen: Qwen3 VL 32B Instruct
qwen/qwen3-vl-32b-instructQwenPaid131 K0.000160.00064

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precis…

Full description

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding.Robust OCR in 32 languages, and enhanced multimodal fusion through Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance for complex real-world multimodal tasks.

Qwen: Qwen3 VL 8B Instruct
qwen/qwen3-vl-8b-instructQwenPaid131 K0.0000720.000287

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for h…

Full description

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon temporal reasoning, DeepStack for fine-grained visual-text alignment, and text-timestamp alignment for precise event localization. The model supports a native 256K-token context window, extensible to 1M tokens, and handles both static and dynamic media inputs for tasks like document parsing, visual question answering, spatial reasoning, and GUI control. It achieves text understanding comparable to leading LLMs while expanding OCR coverage to 32 languages and enhancing robustness under varied visual conditions.

Qwen: Qwen3 VL 8B Thinking
qwen/qwen3-vl-8b-thinkingQwenPaid131 K0.0001170.001365

Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, des…

Full description

Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and long-context processing (native 256K, expandable to 1M tokens) for tasks such as scientific visual analysis, causal inference, and mathematical reasoning over image or video inputs. Compared to the Instruct edition, the Thinking version introduces deeper visual-language fusion and deliberate reasoning pathways that improve performance on long-chain logic tasks, STEM problem-solving, and multi-step video understanding. It achieves stronger temporal grounding via Interleaved-MRoPE and timestamp-aware embeddings, while maintaining robust OCR, multilingual comprehension, and text generation on par with large text-only LLMs.

Qwen: Qwen3.5 397B A17B
qwen/qwen3.5-397b-a17bQwenPaid262 K0.0001720.001032

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that…

Full description

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers…

Qwen: Qwen3.5 Plus 2026-02-15
qwen/qwen3.5-plus-02-15QwenPaid1 M0.000260.00156

The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that in…

Full description

The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of task evaluations, the 3.5 series consistently demonstrates performance on par with state-of-the-art leading models. Compared to the 3 series, these models show a leap forward in both pure-text and multimodal capabilities.

Qwen: Qwen3.5-122B-A10B
qwen/qwen3.5-122b-a10bQwenPaid262 K0.0001150.000917

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integr…

Full description

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.

Qwen: Qwen3.5-27B
qwen/qwen3.5-27bQwenPaid262 K0.0001950.00156

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, de…

Full description

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.

Qwen: Qwen3.5-35B-A3B
qwen/qwen3.5-35b-a3bQwenPaid262 K0.0000570.000459

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture…

Full description

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall…

Qwen: Qwen3.5-Flash
qwen/qwen3.5-flash-02-23QwenPaid1 M0.0000290.000287

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrat…

Full description

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the 3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.

Reka Edge
rekaai/reka-edgeRekaaiPaid16.4 K0.00010.0001

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video…

Full description

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding, video analysis, object detection, and agentic tool-use.

Relace: Relace Search
relace/relace-searchRelacePaid256 K0.0010.003

The relace-search model uses 4-12 view_file and grep tools in parallel to explore a codebase…

Full description

The relace-search model uses 4-12 view_file and grep tools in parallel to explore a codebase and return relevant files to the user request. In contrast to RAG, relace-search performs agentic multi-step reasoning to produce highly precise results 4x faster than any frontier model. It’s designed to serve as a subagent that passes its findings to an “oracle” coding agent, who orchestrates/performs the rest of the coding task. To use relace-search you need to build an appropriate agent harness, and parse the response for relevant information to hand off to the oracle. Read more about it in the Relace documentation.

ReMM SLERP 13B
undi95/remm-slerp-l2-13bUndi95Paid6.14 K0.000450.00065

A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge

Full description

A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge

Sao10K: Llama 3 8B Lunaris
sao10k/l3-lunaris-8bSao10KPaid8.19 K0.000040.00005

Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It’s a strategic me…

Full description

Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It’s a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge…

Sao10k: Llama 3 Euryale 70B v2.1
sao10k/l3-euryale-70bSao10KPaid8.19 K0.001480.00148

Euryale 70B v2.1 is a model focused on creative roleplay from Sao10k

Full description

Euryale 70B v2.1 is a model focused on creative roleplay from Sao10k. - Better prompt adherence. - Better anatomy / spatial awareness. - Adapts much better to unique and custom…

Sao10K: Llama 3.1 70B Hanami x1
sao10k/l3.1-70b-hanami-x1Sao10KPaid16 K0.0030.003

This is Sao10K’s experiment over Euryale v2.2.

Full description

This is Sao10K’s experiment over Euryale v2.2.

Sao10K: Llama 3.1 Euryale 70B v2.2
sao10k/l3.1-euryale-70bSao10KPaid131 K0.000850.00085

Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sa

Full description

Euryale L3.1 70B v2.2 is a model focused on creative roleplay from Sao10k. It is the successor of Euryale L3 70B v2.1.

Sarvam Bulbul v2
sarvam/bulbul:v2SarvamPaidN/A0.0158N/A

Indian-language TTS, stable. 11 languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-I…

Full description

Indian-language TTS, stable. 11 languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN.

Sarvam Bulbul v3
sarvam/bulbul:v3SarvamPaidN/A0.0316N/A

Indian-language TTS, latest. 11 languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-I…

Full description

Indian-language TTS, latest. 11 languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN.

Sarvam Saaras v3
sarvam/saaras:v3SarvamPaidN/A0.32N/A

Indian-language speech-to-text and speech translation. 23 languages. Supports transcription (mod…

Full description

Indian-language speech-to-text and speech translation. 23 languages. Supports transcription (mode=transcribe) and translation to English (mode=translate).

StepFun: Step 3.5 Flash
stepfun/step-3.5-flashStepfunPaid262 K0.00010.0003

Step 3.5 Flash is StepFun’s most capable open-source foundation model. Built on a sparse Mixture…

Full description

Step 3.5 Flash is StepFun’s most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. It is a reasoning model that is incredibly speed efficient even at long contexts.

Tencent: Hunyuan A13B Instruct
tencent/hunyuan-a13b-instructTencentPaid131 K0.000140.00057

Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tenc…

Full description

Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark…

TheDrummer: Cydonia 24B V4.1
thedrummer/cydonia-24b-v4.1ThedrummerPaid131 K0.00030.0005

Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt ad…

Full description

Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligence.

TheDrummer: Rocinante 12B
thedrummer/rocinante-12bThedrummerPaid32.8 K0.000170.00043

Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported:…

Full description

Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: - Expanded vocabulary with unique and expressive word choices - Enhanced creativity for vivid narratives -…

TheDrummer: Skyfall 36B V2
thedrummer/skyfall-36b-v2ThedrummerPaid32.8 K0.000550.0008

Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for impro…

Full description

Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-playing, and coherent storytelling.

TheDrummer: UnslopNemo 12B
thedrummer/unslopnemo-12bThedrummerPaid32.8 K0.00040.0004

UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure wri…

Full description

UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.

vertex/gemini-embedding-001
vertex/gemini-embedding-001VertexPaidN/A0.000150

No description available.

Full description

No description available.

vertex/text-embedding-005
vertex/text-embedding-005VertexPaidN/A0.0000250

No description available.

Full description

No description available.

vertex/text-multilingual-embedding-002
vertex/text-multilingual-embedding-002VertexPaidN/A0.0000250

No description available.

Full description

No description available.

WizardLM-2 8x22B
microsoft/wizardlm-2-8x22bMicrosoftPaid65.5 K0.000620.00062

WizardLM-2 8x22B is Microsoft AI’s most advanced Wizard model. It demonstrates highly competitiv…

Full description

WizardLM-2 8x22B is Microsoft AI’s most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art opensource models. It is…

Writer: Palmyra X4
writer/palmyra-x4-v1WriterPaid131 K0.0050.015

No description available.

Full description

No description available.

Writer: Palmyra X5
writer/palmyra-x5WriterPaid1.04 M0.00060.006

Palmyra X5 is Writer’s most advanced model, purpose-built for building and scaling AI agents acr…

Full description

Palmyra X5 is Writer’s most advanced model, purpose-built for building and scaling AI agents across the enterprise. It delivers industry-leading speed and efficiency on context windows up to 1 million tokens, powered by a novel transformer architecture and hybrid attention mechanisms. This enables faster inference and expanded memory for processing large volumes of enterprise data, critical for scaling AI agents.

Writer: Palmyra X5
writer/palmyra-x5-v1WriterPaid1 M0.0060.03

No description available.

Full description

No description available.

xAI: Grok 4.20
x-ai/grok-4.20X AiPaid2 M0.0020.006

Grok 4.20 is xAI’s newest flagship model with industry-leading speed and agentic tool calling ca…

Full description

Grok 4.20 is xAI’s newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherance, delivering consistently precise and truthful responses. Reasoning can be enabled/disabled using the reasoning enabled parameter in the API. Learn more in our docs

xAI: Grok 4.20 Multi-Agent
x-ai/grok-4.20-multi-agentX AiPaid2 M0.0020.006

Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based wo…

Full description

Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information…

Xiaomi: MiMo-V2-Flash
xiaomi/mimo-v2-flashXiaomiPaid262 K0.000090.00029

MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-o…

Full description

MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters, adopting hybrid attention architecture. MiMo-V2-Flash supports a hybrid-thinking toggle and a 256K context window, and excels at reasoning, coding, and agent scenarios. On SWE-bench Verified and SWE-bench Multilingual, MiMo-V2-Flash ranks as the top #1 open-source model globally, delivering performance comparable to Claude Sonnet 4.5 while costing only about 3.5% as much. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs.

Z.ai: GLM 4 32B
z-ai/glm-4-32bZ AiPaid128 K0.00010.0001

GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex task…

Full description

GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It…

Z.ai: GLM 4.5 Air
z-ai/glm-4.5-airZ AiPaid131 K0.000130.00085

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built f…

Full description

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter…

</tbody> </table> </div>

Page 1 of 1

</>