Available Models
This page is auto-generated from `GET /v1/models`, so the catalog stays aligned with the live Mesh API inventory.
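The catalog on this page is built from the `GET /v1/models` response. The sketch below shows one way to query that endpoint yourself and group the returned IDs by their provider prefix (the part before `/`, such as `anthropic` or `google`). It is a minimal illustration under stated assumptions, not an official client: it assumes an OpenAI-compatible response body whose `data` array holds objects with an `id` field, and bearer-token authentication. The base URL and the helper names `fetch_models` and `group_by_provider` are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request
from collections import defaultdict

# Hypothetical base URL: replace with the real Mesh API endpoint.
BASE_URL = "https://api.example.com"


def fetch_models(base_url: str, api_key: str) -> dict:
    """Call GET {base_url}/v1/models and return the decoded JSON body.

    Assumes bearer-token auth; adjust the header if Mesh API differs.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def group_by_provider(payload: dict) -> dict[str, list[str]]:
    """Group model IDs by the prefix before the first '/'.

    Assumes an OpenAI-compatible body: {"data": [{"id": "..."}, ...]}.
    IDs without a '/' are grouped under the full ID.
    """
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for entry in payload.get("data", []):
        model_id = entry["id"]
        provider = model_id.partition("/")[0]
        groups[provider].append(model_id)
    # Sort providers and IDs so the output is stable between calls.
    return {provider: sorted(ids) for provider, ids in sorted(groups.items())}


if __name__ == "__main__":
    sample = {
        "data": [
            {"id": "google/gemma-3-4b-it"},
            {"id": "anthropic/claude-opus-4.6"},
            {"id": "anthropic/claude-haiku-4.5"},
        ]
    }
    print(group_by_provider(sample))
    # {'anthropic': ['anthropic/claude-haiku-4.5', 'anthropic/claude-opus-4.6'], 'google': ['google/gemma-3-4b-it']}
```

Only `group_by_provider` runs offline; pass the output of `fetch_models` (called with your real endpoint and key) to it to work against the live inventory.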
<>
<div className="models-table-wrap"> <table className="models-table">
<tbody data-model-search-grid>
<tr><td><code>ai21/jamba-1-5-large-v1</code></td><td>No description available.</td></tr>
<tr><td><code>ai21/jamba-1-5-mini-v1</code></td><td>No description available.</td></tr>
<tr><td><code>ai21/jamba-large-1.7</code></td><td>Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context window, it delivers more accurate, contextually grounded responses and better steerability than previous versions.</td></tr>
<tr><td><code>aion-labs/aion-1.0</code></td><td>Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree…</td></tr>
<tr><td><code>aion-labs/aion-1.0-mini</code></td><td>Aion-1.0-Mini is a 32B-parameter distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant…</td></tr>
<tr><td><code>aion-labs/aion-2.0</code></td><td>Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and conflict into stories, making narratives feel more engaging. It also handles mature and darker themes with more nuance and depth.</td></tr>
<tr><td><code>aion-labs/aion-rp-llama-3.1-8b</code></td><td>Aion-RP-Llama-3.1-8B ranks highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto in which LLMs evaluate each other’s responses. It is a fine-tuned base model…</td></tr>
<tr><td><code>amazon/nova-2-lite-v1</code></td><td>Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrates standout capabilities in processing documents, extracting information from videos, generating code, providing accurate grounded answers, and automating multi-step agentic workflows.</td></tr>
<tr><td><code>amazon/nova-lite-v1</code></td><td>No description available.</td></tr>
<tr><td><code>amazon/nova-micro-v1</code></td><td>No description available.</td></tr>
<tr><td><code>amazon/nova-premier-v1</code></td><td>Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning tasks, and the strongest teacher for distilling custom models.</td></tr>
<tr><td><code>amazon/nova-pro-v1</code></td><td>No description available.</td></tr>
<tr><td><code>anthropic/claude-3.5-haiku</code></td><td>Claude 3.5 Haiku offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic…</td></tr>
<tr><td><code>anthropic/claude-fable-5</code></td><td>Claude Fable 5 is a Mythos-class model from Anthropic, built for autonomous knowledge work and coding. It supports text, image, and file inputs with text output, offers reasoning support, and has a 1M-token context window. It is suited to long-running, complex, and asynchronous tasks that previously required frequent human check-ins. It is particularly strong at end-to-end work that would otherwise take a person hours, days, or weeks, including problems that are long-running, ambiguous, or highly multi-step. It executes well-scoped tasks with few mistakes, self-corrects automatically through verification loops, and ships with robust safeguards.</td></tr>
<tr><td><code>anthropic/claude-haiku-4.5</code></td><td>Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line, enabling controllable reasoning depth, summarized or interleaved thought output, and tool-assisted workflows with full support for coding, bash, web search, and computer-use tools. Scoring above 73% on SWE-bench Verified, Haiku 4.5 ranks among the world’s best coding models while maintaining exceptional responsiveness for sub-agents, parallelized execution, and scaled deployment.</td></tr>
<tr><td><code>anthropic/claude-opus-4</code></td><td>Claude Opus 4 was benchmarked as the world’s best coding model at the time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in…</td></tr>
<tr><td><code>anthropic/claude-opus-4.1</code></td><td>Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains…</td></tr>
<tr><td><code>anthropic/claude-opus-4.5</code></td><td>Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and reasoning benchmarks, and improved robustness to prompt injection. The model is designed to operate efficiently across varied effort levels, enabling developers to trade off speed, depth, and token usage depending on task requirements. It adds a new control for token efficiency, which can be accessed through the OpenRouter verbosity parameter with a value of low, medium, or high. Opus 4.5 supports advanced tool use, extended context management, and coordinated multi-agent setups, making it well suited for autonomous research, debugging, multi-step planning, and spreadsheet/browser manipulation. It delivers substantial gains in structured reasoning, execution reliability, and alignment compared to prior Opus generations, while reducing token overhead and improving performance on long-running tasks.</td></tr>
<tr><td><code>anthropic/claude-opus-4.6</code></td><td>Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective for large codebases, complex refactors, and multi-step debugging that unfolds over time. The model shows deeper contextual understanding, stronger problem decomposition, and greater reliability on hard engineering tasks than prior generations. Beyond coding, Opus 4.6 excels at sustained knowledge work. It produces near-production-ready documents, plans, and analyses in a single pass, and maintains coherence across very long outputs and extended sessions. This makes it a strong default for tasks that require persistence, judgment, and follow-through, such as technical design, migration planning, and end-to-end project execution. Users upgrading from earlier Opus versions should consult the official migration guide.</td></tr>
<tr><td><code>anthropic/claude-opus-4.7</code></td><td>Anthropic’s latest flagship, now live on Mesh API. Opus 4.7 is a significant step up on hard coding work, handling long-running agentic tasks with considerably more rigor and verifying its own outputs before returning them. Vision is also improved: it reads high-resolution images at 3x the fidelity of older Claude models, making computer-use agents and diagram parsing more reliable. It also brings improved instruction following, stronger memory across sessions, and a new xhigh effort level for the hardest problems.</td></tr>
<tr><td><code>anthropic/claude-opus-4.8</code></td><td>Anthropic’s latest flagship, now live on Mesh API. Opus 4.7 is a significant step up on hard coding work, handling long-running agentic tasks with considerably more rigor and verifying its own outputs before returning them. Vision is also improved: it reads high-resolution images at 3x the fidelity of older Claude models, making computer-use agents and diagram parsing more reliable. It also brings improved instruction following, stronger memory across sessions, and a new xhigh effort level for the hardest problems.</td></tr>
<tr><td><code>anthropic/claude-sonnet-4.5</code></td><td>Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence. The model is designed for extended autonomous operation, maintaining task continuity across sessions and providing fact-based progress tracking. Sonnet 4.5 also introduces stronger agentic capabilities, including improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With enhanced context tracking and awareness of token usage across tool calls, it is particularly well-suited for multi-context and long-running workflows. Use cases span software engineering, cybersecurity, financial analysis, research agents, and other domains requiring sustained reasoning and tool use.</td></tr>
<tr><td><code>anthropic/claude-sonnet-4.6</code></td><td>Sonnet 4.6 is Anthropic’s most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with memory, polished document creation, and confident computer use for web QA and workflow automation.</td></tr>
<tr><td><code>baidu/ernie-4.5-300b-a47b</code></td><td>ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It activates 47B parameters per token and supports text generation in…</td></tr>
<tr><td><code>baidu/ernie-4.5-vl-424b-a47b</code></td><td>ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data…</td></tr>
<tr><td><code>bedrock/amazon.titan-embed-g1-text-02</code></td><td>No description available.</td></tr>
<tr><td><code>bedrock/amazon.titan-embed-text-v2:0</code></td><td>No description available.</td></tr>
<tr><td><code>bedrock/cohere.embed-english-v3</code></td><td>No description available.</td></tr>
<tr><td><code>bedrock/cohere.embed-multilingual-v3</code></td><td>No description available.</td></tr>
<tr><td><code>bedrock/cohere.embed-v4:0</code></td><td>No description available.</td></tr>
<tr><td><code>bytedance-seed/seed-1.6</code></td><td>Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window.</td></tr>
<tr><td><code>bytedance-seed/seed-1.6-flash</code></td><td>Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of up to 16k tokens.</td></tr>
<tr><td><code>bytedance-seed/seed-2.0-lite</code></td><td>Seed-2.0-Lite is a versatile, cost-efficient enterprise workhorse that delivers strong multimodal and agent capabilities with noticeably lower latency, making it a practical default for most production workloads across text, vision, and tools. Engineered for high-frequency visual understanding and agentic workflows, it is well suited to deployment at scale.</td></tr>
<tr><td><code>bytedance-seed/seed-2.0-mini</code></td><td>Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding, and is optimized for lightweight tasks where cost and speed take priority.</td></tr>
<tr><td><code>byteplus/seed-1-6</code></td><td>ByteDance Seed 1.6.</td></tr>
<tr><td><code>byteplus/seed-1-6-flash</code></td><td>ByteDance Seed 1.6 Flash: a lightweight, fast model.</td></tr>
<tr><td><code>byteplus/seed-1-8</code></td><td>ByteDance Seed 1.8.</td></tr>
<tr><td><code>byteplus/seedance-1-0-pro</code></td><td>ByteDance Seedance 1.0 Pro: BytePlus video generation model.</td></tr>
<tr><td><code>byteplus/seedance-1-0-pro-fast</code></td><td>ByteDance Seedance 1.0 Pro Fast: BytePlus video generation model.</td></tr>
<tr><td><code>byteplus/seedance-1-5-pro</code></td><td>ByteDance Seedance 1.5 Pro: BytePlus video generation model.</td></tr>
<tr><td><code>byteplus/seedream-4-0</code></td><td>ByteDance Seedream 4.0: BytePlus image generation model.</td></tr>
<tr><td><code>byteplus/seedream-4-5</code></td><td>ByteDance Seedream 4.5: BytePlus image generation model.</td></tr>
<tr><td><code>bytedance/ui-tars-1.5-7b</code></td><td>UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it extends the UI-TARS framework with reinforcement…</td></tr>
<tr><td><code>anthropic/claude-3-haiku</code></td><td>No description available.</td></tr>
<tr><td><code>anthropic/claude-sonnet-4</code></td><td>No description available.</td></tr>
<tr><td><code>cohere/command-a</code></td><td>Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading proprietary…</td></tr>
<tr><td><code>cohere/command-r-08-2024</code></td><td>command-r-08-2024 is an update of Command R with improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, it is better at math, code and reasoning and…</td></tr>
<tr><td><code>cohere/command-r-plus-08-2024</code></td><td>command-r-plus-08-2024 is an update of Command R+ with roughly 50% higher throughput and 25% lower latency compared with the previous Command R+ version, while keeping the hardware footprint…</td></tr>
<tr><td><code>cohere/command-r7b-12-2024</code></td><td>Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning…</td></tr>
<tr><td><code>deepcogito/cogito-v2.1-671b</code></td><td>Cogito v2.1 671B MoE is one of the strongest open models available, matching the performance of frontier closed and open models. It is trained using self-play with reinforcement learning to reach state-of-the-art performance in multiple categories (instruction following, coding, longer queries, and creative writing). This advanced system demonstrates significant progress toward scalable superintelligence through policy improvement.</td></tr>
<tr><td><code>deepseek/deepseek-v3-2</code></td><td>DeepSeek V3.2: an improved V3 with long-context support.</td></tr>
<tr><td><code>deepseek/deepseek-v4-flash</code></td><td>DeepSeek V4 Flash: a fast, cost-efficient LLM.</td></tr>
<tr><td><code>deepseek/deepseek-v4-pro</code></td><td>DeepSeek V4 Pro: a high-capability LLM.</td></tr>
<tr><td><code>deepseek/deepseek-chat-v3-0324</code></td><td>DeepSeek V3, a 685B-parameter mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the DeepSeek V3 model and performs well…</td></tr>
<tr><td><code>deepseek/deepseek-chat-v3.1</code></td><td>DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the reasoning enabled boolean; learn more in our docs. The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks.</td></tr>
<tr><td><code>deepseek/deepseek-v3.1-terminus</code></td><td>DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model’s original capabilities while addressing user-reported issues, including language consistency and agent capabilities, and further optimizes the model’s performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the reasoning enabled boolean; learn more in our docs. The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.</td></tr>
<tr><td><code>deepseek/deepseek-v3.2</code></td><td>DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments. Users can control the reasoning behaviour with the reasoning enabled boolean; learn more in our docs.</td></tr>
<tr><td><code>deepseek/deepseek-v3.2-exp</code></td><td>DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. Users can control the reasoning behaviour with the reasoning enabled boolean; learn more in our docs. The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs.</td></tr>
<tr><td><code>deepseek/deepseek-r1</code></td><td>DeepSeek R1 delivers performance on par with OpenAI o1, but is open-sourced and has fully open reasoning tokens. It has 671B parameters, with 37B active in an inference pass…</td></tr>
<tr><td><code>byteplus/seed-2-0-code</code></td><td>ByteDance Seed 2.0 Code: a code-focused LLM.</td></tr>
<tr><td><code>byteplus/seed-2-0-lite</code></td><td>ByteDance Seed 2.0 Lite: multimodal, with audio support.</td></tr>
<tr><td><code>byteplus/seed-2-0-mini</code></td><td>ByteDance Seed 2.0 Mini: a compact multimodal model with audio support.</td></tr>
<tr><td><code>byteplus/seed-2-0-pro</code></td><td>ByteDance Seed 2.0 Pro.</td></tr>
<tr><td><code>byteplus/seedream-5-0</code></td><td>Dola Seedream 5.0 Lite: BytePlus multimodal image generation model.</td></tr>
<tr><td><code>byteplus/dreamina-seedance-2-0</code></td><td>Dreamina Seedance 2.0: BytePlus video generation model.</td></tr>
<tr><td><code>byteplus/dreamina-seedance-2-0-fast</code></td><td>Dreamina Seedance 2.0 Fast: BytePlus video generation model.</td></tr>
<tr><td><code>elevenlabs/eleven_flash_v2_5</code></td><td>Ultra-low-latency (~75 ms) text-to-speech. 32 languages, 40K-character limit.</td></tr>
<tr><td><code>elevenlabs/eleven_multilingual_v2</code></td><td>High-quality voice generation. 32 languages, 40K-character limit.</td></tr>
<tr><td><code>elevenlabs/eleven_multilingual_v3</code></td><td>Latest high-quality multilingual text-to-speech. 32 languages, 40K-character limit.</td></tr>
<tr><td><code>elevenlabs/scribe_v1</code></td><td>Speech-to-text with 98%+ accuracy. 90+ languages, keyterm prompting.</td></tr>
<tr><td><code>elevenlabs/scribe_v2</code></td><td>Speech-to-text with 98%+ accuracy. 90+ languages, dynamic audio tagging.</td></tr>
<tr><td><code>elevenlabs/scribe_v2_realtime</code></td><td>Low-latency realtime transcription (~150 ms). 90+ languages, word-level timestamps.</td></tr>
<tr><td><code>elevenlabs/eleven_turbo_v2_5</code></td><td>Low-latency text-to-speech optimised for streaming. 32 languages, 40K-character limit.</td></tr>
<tr><td><code>essentialai/rnj-1-instruct</code></td><td>Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and trained from scratch with a focus on programming, math, and scientific reasoning. The model demonstrates strong performance across multiple programming languages, tool-use workflows, and agentic execution environments (e.g., mini-SWE-agent).</td></tr>
<tr><td><code>google/gemini-3.1-flash-lite</code></td><td>No description available.</td></tr>
<tr><td><code>google/gemini-3.1-flash-lite-preview</code></td><td>Gemini 3.1 Flash Lite Preview is Google’s high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across key capabilities. Improvements span audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. Supports full thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs. Priced at half the cost of Gemini 3 Flash.</td></tr>
<tr><td><code>google/gemini-3.1-pro-preview</code></td><td>Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation of the Gemini 3 series, it combines high-precision reasoning across text, image, video, audio, and code with a 1M-token context window. Reasoning details must be preserved when using multi-turn tool calling; see our docs at https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning. The 3.1 update introduces measurable gains in SWE benchmarks and real-world coding environments, along with stronger autonomous task execution in structured domains such as finance and spreadsheet-based workflows. Designed for advanced development and agentic systems, Gemini 3.1 Pro Preview improves long-horizon stability and tool orchestration while increasing token efficiency. It introduces a new medium thinking level to better balance cost, speed, and performance. The model excels in agentic coding, structured planning, multimodal analysis, and workflow automation, making it well suited for autonomous agents, financial modeling, spreadsheet automation, and high-context enterprise tasks.</td></tr>
<tr><td><code>google/gemini-3-flash-preview</code></td><td>Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool-use performance with substantially lower latency than larger Gemini variants, making it well suited for interactive development, long-running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability. The model supports a 1M-token context window and multimodal inputs including text, images, audio, video, and PDFs, with text output. It includes configurable reasoning via thinking levels (minimal, low, medium, high), structured output, tool use, and automatic context caching. Gemini 3 Flash Preview is optimized for users who want strong reasoning and agentic behavior without the cost or latency of full-scale frontier models.</td></tr>
<tr><td><code>zai/glm-4-7</code></td><td>No description available.</td></tr>
<tr><td><code>zai/glm-4-7-flash</code></td><td>No description available.</td></tr>
<tr><td><code>byteplus/glm-4-7</code></td><td>GLM-4.7 by Z.AI.</td></tr>
<tr><td><code>google/gemini-2.0-flash-lite-001</code></td><td>Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5,…</td></tr>
<tr><td><code>google/gemini-2.5-flash-lite-preview-09-2025</code></td><td>Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, “thinking” (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the Reasoning API parameter to selectively trade off cost for intelligence.</td></tr>
<tr><td><code>google/gemini-2.5-pro-preview-05-06</code></td><td>Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy…</td></tr>
<tr><td><code>google/gemini-2.5-pro-preview</code></td><td>Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy…</td></tr>
<tr><td><code>google/gemma-2-27b-it</code></td><td>Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of…</td></tr>
<tr><td><code>google/gemma-3-12b-it</code></td><td>Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,…</td></tr>
<tr><td><code>google/gemma-3-27b-it</code></td><td>Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,…</td></tr>
<tr><td><code>google/gemma-3-4b-it</code></td><td>Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,…</td></tr>
google/gemma-3n-e4b-it: Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs, including text, visual data, and audio, enabling diverse tasks…
google/gemma-4-26b-a4b-it: Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite its 25.2B total parameters, only 3.8B are active per token during inference, delivering near-31B quality at…
google/gemini-2.5-flash-image: Gemini 2.5 Flash Image, a.k.a. “Nano Banana,” is now generally available. It is a state-of-the-art image generation model with contextual understanding, capable of image generation, edits, and multi-turn conversations. Aspect ratios can be controlled with the image_config API parameter.
google/gemini-3.1-flash-image-preview: Gemini 3.1 Flash Image Preview, a.k.a. “Nano Banana 2,” is Google’s latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced contextual understanding with fast, cost-efficient inference, making complex image generation and iterative edits significantly more accessible. Aspect ratios can be controlled with the image_config API parameter.
openai/gpt-3.5-turbo-0125: No description available.
openai/gpt-3.5-turbo-1106: No description available.
openai/gpt-4-0613: No description available.
openai/gpt-4-turbo-2024-04-09: No description available.
openai/gpt-5.4-image: No description available.
openai/gpt-5.4-image-2: No description available.
openai/gpt-5.4-image-mini: No description available.
openai/gpt-5-mini: No description available.
openai/gpt-5-nano: No description available.
openai/gpt-audio: No description available.
openai/gpt-audio-1.5: No description available.
openai/gpt-audio-mini: No description available.
openai/gpt-image-1: No description available.
openai/gpt-image-1.5: No description available.
openai/gpt-image-1-mini: No description available.
openai/gpt-image-2: No description available.
openai/gpt-realtime-1.5: OpenAI GPT Realtime 1.5, a speech-to-speech real-time model with text, audio, and image input.
openai/gpt-realtime-2: OpenAI GPT Realtime 2, a speech-to-speech real-time model with text, audio, and image input.
openai/gpt-realtime-mini: OpenAI GPT Realtime Mini, a cost-efficient speech-to-speech real-time model.
openai/gpt-realtime-translate: OpenAI GPT Realtime Translate, a real-time audio translation model billed per minute of output audio.
openai/gpt-realtime-whisper: OpenAI GPT Realtime Whisper, a real-time audio transcription model billed per minute of input audio.
gpt-5-mini: GPT-5 mini is a faster, more cost-efficient version of GPT-5. It’s great for well-defined tasks and precise prompts.
openai/gpt-5.5: GPT-5.5 is OpenAI’s newest frontier model for the most complex professional work. The reasoning.effort parameter supports none, low, medium (default), high, and xhigh.
openai/gpt-oss-120b: GPT-OSS-120B, OpenAI’s open-source 120B model on ModelArk.
x-ai/grok-4.3: No description available.
ibm-granite/granite-4.0-h-micro: Granite-4.0-H-Micro is a 3B-parameter model from the Granite 4 family, the latest in a series of models released by IBM. These models are fine-tuned for long-context tool calling.
google/imagen-3: No description available.
google/imagen-3-fast: No description available.
google/imagen-3-v1: No description available.
google/imagen-4: No description available.
google/imagen-4-fast: No description available.
google/imagen-4-ultra: No description available.
inception/mercury-2: Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving…
inflection/inflection-3-productivity: Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. It has access to recent news. For emotional…
kwaipilot/kat-coder-pro-v2: KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed for complex enterprise-grade software engineering and SaaS integration. It builds on the agentic coding strengths of earlier versions, with a focus on large-scale production environments, multi-system coordination, and seamless integration across modern software stacks, while also supporting web aesthetics generation to produce production-grade landing pages and presentation decks.
anthracite-org/magnum-v4-72b: This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet (https://openrouter.ai/anthropic/claude-3.5-sonnet) and Opus (https://openrouter.ai/anthropic/claude-3-opus). The model is fine-tuned on top of Qwen2.5 72B.
mancer/weaver: An attempt to recreate Claude-style verbosity, but don’t expect the same level of coherence or memory. Meant for use in roleplay/narrative situations.
meta-llama/llama-3-70b-instruct: Meta’s latest class of model (Llama 3) launched with a variety of sizes and flavors. This 70B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong…
meta-llama/llama-3-8b-instruct: Meta’s latest class of model (Llama 3) launched with a variety of sizes and flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong…
meta-llama/llama-3.1-70b-instruct: Meta’s latest class of model (Llama 3.1) launched with a variety of sizes and flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases. It has demonstrated strong…
meta-llama/llama-3.1-8b-instruct: Meta’s latest class of model (Llama 3.1) launched with a variety of sizes and flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to…
meta-llama/llama-3.2-11b-vision-instruct: Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and…
meta-llama/llama-3.2-1b-instruct: Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate…
meta-llama/llama-3.2-3b-instruct: Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it…
meta-llama/llama-3.3-70b-instruct: The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model in 70B (text in/text out). The Llama 3.3 instruction-tuned text-only model…
meta-llama/llama-4-maverick: Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward…
meta-llama/llama-4-scout: Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input…
meta-llama/llama-guard-4-12b: Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM…
microsoft/phi-4: Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion…
minimax/minimax-m2: No description available.
minimax/minimax-m2-1: No description available.
minimax/minimax-m2-5: No description available.
minimax/minimax-m2-her: MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. Designed to stay consistent in tone and personality, it supports rich message roles (user_system, group, sample_message_user, sample_message_ai) and can learn from example dialogue to better match the style and pacing of your scenario, making it a strong choice for storytelling, companions, and conversational experiences where natural flow and vivid interaction matter most.
minimax/minimax-01: MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context…
mistralai/mistral-large: This is Mistral AI’s flagship model, Mistral Large 2 (version mistral-large-2407). It’s a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement here…
mistralai/mistral-large-2407: This is Mistral AI’s flagship model, Mistral Large 2 (version mistral-large-2407). It’s a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement here…
mistralai/mistral-large-2411: Mistral Large 2 2411 is an update of Mistral Large 2 released together with Pixtral Large 2411. It provides a significant upgrade over the previous Mistral Large 24.07, with notable…
mistral/mistral-7b-instruct-v0: No description available.
mistralai/codestral-2508: Mistral’s cutting-edge language model for coding, released at the end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction, and test generation.
mistral/devstral-2-123b: No description available.
mistralai/devstral-2512: Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context. It tracks framework dependencies, detects failures, and retries with corrections, solving challenges like bug fixing and modernizing legacy systems. The model can be fine-tuned to prioritize specific languages or optimize for large enterprise codebases. It is available under a modified MIT license.
mistralai/devstral-medium: Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and All Hands AI. Positioned as a step up from Devstral Small, it achieves…
mistralai/devstral-small: Devstral Small 1.1 is a 24B-parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Fine-tuned from Mistral Small 3.1 and…
mistral/mistral-large-2402-v1: No description available.
mistral/mistral-large-3-675b-instruct: No description available.
mistral/magistral-small-2509: No description available.
mistral/ministral-3-14b-instruct: No description available.
mistralai/ministral-14b-2512: The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. It is a powerful and efficient language model with vision capabilities.
mistral/ministral-3-3b-instruct: No description available.
mistralai/ministral-3b-2512: The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.
mistral/ministral-3-8b-instruct: No description available.
mistralai/ministral-8b-2512: A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.
mistralai/mistral-7b-instruct-v0.1: A 7.3B-parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.
mistralai/mistral-large-2512: Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active parameters (675B total), and released under the Apache 2.0 license.
mistralai/mistral-medium-3: Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost…
mistralai/mistral-medium-3.1: Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases. The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3.1 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments.
mistralai/mistral-nemo: A 12B-parameter model with a 128k-token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese,…
mistralai/mistral-small-24b-instruct-2501: Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed…
mistralai/mistral-small-3.1-24b-instruct: Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and…
mistralai/mistral-small-3.2-24b-instruct: Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B-parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on…
mistralai/mistral-small-2603: Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from Magistral, multimodal understanding from Pixtral, and agentic coding capabilities from Devstral, enabling one model to handle complex analysis, software development, and visual tasks within the same workflow.
mistralai/mixtral-8x22b-instruct: Mistral’s official instruct fine-tuned version of Mixtral 8x22B. It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include strong math, coding,…
mistral/pixtral-large-2502-v1: No description available.
mistralai/pixtral-large-2411: Pixtral Large is a 124B-parameter, open-weight, multimodal model built on top of Mistral Large 2. The model is able to understand documents, charts, and natural images. The model is…
mistralai/mistral-saba: Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional…
mistral/mistral-small-2402-v1: No description available.
mistral/voxtral-mini-3b-2507: No description available.
mistralai/voxtral-small-24b-2507: Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation, and audio understanding. Input audio is priced at $100 per million seconds.
moonshotai/kimi-k2: Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for…
moonshotai/kimi-k2-0905: Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k. This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training.
moonshotai/kimi-k2-thinking: Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in…
moonshotai/kimi-k2.5: Kimi K2.5 is Moonshot AI’s native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed…
moonshotai/kimi-k2.6: Kimi K2.6 is Moonshot AI’s native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed…
morph/morph-v3-fast: Morph’s fastest apply model for code edits, running at ~10,500 tokens/sec with 96% accuracy for rapid code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code> <update>{edit_snippet}</update>…
morph/morph-v3-large: Morph’s high-accuracy apply model for complex code edits, running at ~4,500 tokens/sec with 98% accuracy for precise code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code>…
gryphe/mythomax-l2-13b: One of the highest-performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay.
nousresearch/hermes-3-llama-3.1-405b: Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the…
nousresearch/hermes-3-llama-3.1-70b: Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the…
nousresearch/hermes-4-405b: Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with <think>…</think> traces or respond directly, offering flexibility between speed and depth. Users can control the reasoning behavior with the reasoning enabled boolean. The model is instruction-tuned with an expanded post-training corpus (~60B tokens) emphasizing reasoning traces, improving performance in math, code, STEM, and logical reasoning, while retaining broad assistant utility. It also supports structured outputs, including JSON mode, schema adherence, function calling, and tool use. Hermes 4 is trained for steerability, lower refusal rates, and alignment toward neutral, user-directed behavior.
nousresearch/hermes-4-70b: Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either respond directly or generate explicit <think>…</think> reasoning traces before answering. Users can control the reasoning behavior with the reasoning enabled boolean. This 70B variant is trained with the expanded post-training corpus (~60B tokens) emphasizing verified reasoning data, leading to improvements in mathematics, coding, STEM, logic, and structured outputs while maintaining general assistant performance. It supports JSON mode, schema adherence, function calling, and tool use, and is designed for greater steerability with reduced refusal rates.
nvidia/nemotron-3-nano-30b-a3b: NVIDIA Nemotron 3 Nano 30B A3B is a small MoE language model with the highest compute efficiency and accuracy for developers building specialized agentic AI systems. The model is fully open, with open weights, datasets, and recipes, so developers can easily customize, optimize, and deploy the model on their own infrastructure for maximum privacy and security.
nvidia/nemotron-3-super-120b-a12b: NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model that activates just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation throughput than leading open models. The model features a 1M-token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere, from workstation to cloud.
openai/o1: No description available.
openai/o3: No description available.
openai/o3-mini: No description available.
openai/o4-mini: No description available.
openai/gpt-5-pro: High-compute version of GPT-5 for complex reasoning tasks.
openai/gpt-5.5-pro: No description available.
openai/text-embedding-3-large: No description available.
openai/text-embedding-3-small: No description available.
openai/text-embedding-ada-002: No description available.
openai/gpt-3.5-turboGPT-3.5 Turbo is OpenAI’s fastest model. It can understand and generate natural language or code…
Full description
GPT-3.5 Turbo is OpenAI’s fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.
openai/gpt-3.5-turbo-0613GPT-3.5 Turbo is OpenAI’s fastest model. It can understand and generate natural language or code…
Full description
GPT-3.5 Turbo is OpenAI’s fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.
openai/gpt-3.5-turbo-16kThis model offers four times the context length of gpt-3.5-turbo, allowing it to support approxi…
Full description
This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up…
openai/gpt-3.5-turbo-instructThis model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-relat…
Full description
This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Sep 2021.
openai/gpt-4OpenAI’s flagship model, GPT-4 is a large-scale multimodal language model capable of solving dif…
Full description
OpenAI’s flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning…
openai/gpt-4-turboThe latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and…
Full description
The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023.
openai/gpt-4.1GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-wo…
Full description
GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and…
openai/gpt-4.1-miniGPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantiall…
Full description
GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard…
openai/gpt-4.1-nanoFor tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1…
Full description
For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million…
openai/gpt-4oGPT-4o (“o” for “omni”) is OpenAI’s latest AI model, supporting both text and image inputs with…
Full description
GPT-4o (“o” for “omni”) is OpenAI’s latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of GPT-4 Turbo while being twice as…
openai/gpt-4o-2024-05-13GPT-4o (“o” for “omni”) is OpenAI’s latest AI model, supporting both text and image inputs with…
Full description
GPT-4o (“o” for “omni”) is OpenAI’s latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of GPT-4 Turbo while being twice as…
openai/gpt-4o-2024-08-06The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the abi…
Full description
The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the response_format. Read more here. GPT-4o (“o” for “omni”) is…
openai/gpt-4o-2024-11-20The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural,…
Full description
The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded…
openai/gpt-4o-search-previewGPT-4o Search Preview is a specialized model for web search in Chat Completions. It is trained to…
Full description
GPT-4o Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.
openai/gpt-4o-miniGPT-4o mini is OpenAI’s newest model after GPT-4 Omni, supporting both…
Full description
GPT-4o mini is OpenAI’s newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable…
openai/gpt-4o-mini-2024-07-18GPT-4o mini is OpenAI’s newest model after GPT-4 Omni, supporting both…
Full description
GPT-4o mini is OpenAI’s newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable…
openai/gpt-4o-mini-search-previewGPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trai…
Full description
GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.
openai/gpt-5-chatGPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for en…
Full description
GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.
openai/gpt-5.1GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpos…
Full description
GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. The model produces clearer, more grounded explanations with reduced jargon, making it easier to follow even on technical or multi-step problems. Built for broad task coverage, GPT-5.1 delivers consistent gains across math, coding, and structured analysis workloads, with more coherent long-form answers and improved tool-use reliability. It also features refined conversational alignment, enabling warmer, more intuitive responses without compromising precision. GPT-5.1 serves as the primary full-capability successor to GPT-5.
openai/gpt-5.1-chatGPT-5.1 Chat (AKA Instant) is the fast, lightweight member of the 5.1 family, optimized for low-l…
Full description
GPT-5.1 Chat (AKA Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.1 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation.
openai/gpt-5.1-codexGPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding…
Full description
GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the reasoning.effort parameter. Read the docs here. Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically, providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.
openai/gpt-5.2GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and lo…
Full description
GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long-context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly to simple queries while spending more depth on complex tasks. Built for broad task coverage, GPT-5.2 delivers consistent gains across math, coding, science, and tool-calling workloads, with more coherent long-form answers and improved tool-use reliability.
openai/gpt-5.2-chatGPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-…
Full description
GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.2 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation.
openai/gpt-5.2-proGPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and l…
Full description
GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long-context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like “think hard about this.” Improvements include reduced hallucination and sycophancy, as well as better performance in coding, writing, and health-related tasks.
openai/gpt-5.2-codexGPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and cod…
Full description
GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1-Codex, GPT-5.2-Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the reasoning.effort parameter. Read the docs here. Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically, providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.
openai/gpt-5.3-chatGPT-5.3 Chat is an update to ChatGPT’s most-used model that makes everyday conversations smoothe…
Full description
GPT-5.3 Chat is an update to ChatGPT’s most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly reduces unnecessary refusals, caveats, and overly cautious phrasing that can interrupt conversational flow.
openai/gpt-5.3-codexGPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software en…
Full description
GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results on SWE-Bench Pro and strong performance on Terminal-Bench 2.0 and OSWorld-Verified, reflecting improved multi-language coding, terminal proficiency, and real-world computer-use skills. The model is optimized for long-running, tool-using workflows and supports interactive steering during execution, making it suitable for complex development tasks, debugging, deployment, and iterative product work. Beyond coding, GPT-5.3-Codex performs strongly on structured knowledge-work benchmarks such as GDPval, supporting tasks like document drafting, spreadsheet analysis, slide creation, and operational research across domains. It is trained with enhanced cybersecurity awareness, including vulnerability identification capabilities, and deployed with additional safeguards for high-risk use cases. Compared to prior Codex models, it is more token-efficient and approximately 25% faster, targeting professional end-to-end workflows that span reasoning, execution, and computer interaction.
openai/gpt-5.4GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system…
Full description
GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, enabling high-context reasoning, coding, and multimodal analysis within the same workflow. The model delivers improved performance in coding, document understanding, tool use, and instruction following. It is designed as a strong default for both general-purpose tasks and software engineering, capable of generating production-quality code, synthesizing information across multiple sources, and executing complex multi-step workflows with fewer iterations and greater token efficiency.
openai/gpt-5.4-miniGPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized…
Full description
GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs with strong performance across reasoning, coding, and tool use, while reducing latency and cost for large-scale deployments. The model is designed for production environments that require a balance of capability and efficiency, making it well suited for chat applications, coding assistants, and agent workflows that operate at scale. GPT-5.4 mini delivers reliable instruction following, solid multi-step reasoning, and consistent performance across diverse tasks with improved cost efficiency.
openai/gpt-5.4-nanoGPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized…
Full description
GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks. It supports text and image inputs and is designed for low-latency use cases such as classification, data extraction, ranking, and sub-agent execution. The model prioritizes responsiveness and efficiency over deep reasoning, making it ideal for pipelines that require fast, reliable outputs at scale. GPT-5.4 nano is well suited for background tasks, real-time systems, and distributed agent architectures where minimizing cost and latency is essential.
openai/gpt-5.4-proGPT-5.4 Pro is OpenAI’s most advanced model, building on GPT-5.4’s unified architecture with enh…
Full description
GPT-5.4 Pro is OpenAI’s most advanced model, building on GPT-5.4’s unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs. Optimized for step-by-step reasoning, instruction following, and accuracy, GPT-5.4 Pro excels at agentic coding, long-context workflows, and multi-step problem solving.
openai/gpt-oss-safeguard-20bgpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-…
Full description
gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mixture-of-Experts (MoE) model offers lower latency for safety tasks like content classification, LLM filtering, and trust & safety labeling. Learn more about this model in OpenAI’s gpt-oss-safeguard user guide.
openai/o3-proThe o-series of models are trained with reinforcement learning to think before they answer and p…
Full description
The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently…
perplexity/sonarSonar is lightweight, affordable, fast, and simple to use — now featuring citations and the abil…
Full description
Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features…
perplexity/sonar-proNote: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perp…
Full description
Note: Sonar Pro pricing includes Perplexity search pricing. See details here. For enterprises seeking more advanced capabilities, the Sonar Pro API can handle in-depth, multi-step queries with added extensibility, like…
perplexity/sonar-pro-searchExclusively available on the OpenRouter API, Sonar Pro’s new Pro Search mode is Perplexity’s mos…
Full description
Exclusively available on the OpenRouter API, Sonar Pro’s new Pro Search mode is Perplexity’s most advanced agentic search system. It is designed for deeper reasoning and analysis. Pricing is based on tokens plus $18 per thousand requests. This model powers the Pro Search mode on the Perplexity platform. Sonar Pro Search adds autonomous, multi-step reasoning to Sonar Pro: instead of a single query followed by synthesis, it plans and executes entire research workflows using tools.
qwen/qwen-flashNo description available.
Full description
No description available.
qwen/qwen-flash-2025-07-28No description available.
Full description
No description available.
qwen/qwen-mt-flashNo description available.
Full description
No description available.
qwen/qwen-mt-liteNo description available.
Full description
No description available.
qwen/qwen-mt-plusNo description available.
Full description
No description available.
qwen/qwen-plusNo description available.
Full description
No description available.
qwen/qwen-plus-2025-07-28:non-thinkingNo description available.
Full description
No description available.
qwen/qwen-plus-2025-09-11No description available.
Full description
No description available.
qwen/qwen-plus-2025-09-11:non-thinkingNo description available.
Full description
No description available.
qwen/qwen-plus-2025-09-11:thinkingNo description available.
Full description
No description available.
qwen/qwen-plus-2025-12-01No description available.
Full description
No description available.
qwen/qwen-plus-2025-12-01:non-thinkingNo description available.
Full description
No description available.
qwen/qwen-plus-2025-12-01:thinkingNo description available.
Full description
No description available.
qwen/qwen-plus:non-thinkingNo description available.
Full description
No description available.
qwen/qwen-plus:thinkingNo description available.
Full description
No description available.
qwen/text-embedding-v3No description available.
Full description
No description available.
qwen/text-embedding-v4No description available.
Full description
No description available.
qwen/qwen-2.5-72b-instructQwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following imp…
Full description
Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and…
qwen/qwen-2.5-coder-32b-instructQwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known a…
Full description
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significant improvements in code generation, code reasoning…
qwen/qwen3-14b:non-thinkingNo description available.
Full description
No description available.
qwen/qwen3-14b:thinkingNo description available.
Full description
No description available.
qwen/qwen3-235b-a22b-instruct-2507No description available.
Full description
No description available.
qwen/qwen3-235b-a22b:non-thinkingNo description available.
Full description
No description available.
qwen/qwen3-235b-a22b:thinkingNo description available.
Full description
No description available.
qwen/qwen3-30b-a3b:non-thinkingNo description available.
Full description
No description available.
qwen/qwen3-30b-a3b:thinkingNo description available.
Full description
No description available.
qwen/qwen3-32b-v1No description available.
Full description
No description available.
qwen/qwen3-32b:non-thinkingNo description available.
Full description
No description available.
qwen/qwen3-32b:thinkingNo description available.
Full description
No description available.
qwen/qwen3.5-flashNo description available.
Full description
No description available.
qwen/qwen3.5-flash-2026-02-23No description available.
Full description
No description available.
qwen/qwen3.5-plusNo description available.
Full description
No description available.
qwen/qwen3.5-plus-2026-02-15No description available.
Full description
No description available.
qwen/qwen3.6-plusNo description available.
Full description
No description available.
qwen/qwen3.6-plus-2026-04-02No description available.
Full description
No description available.
qwen/qwen3-8b:non-thinkingNo description available.
Full description
No description available.
qwen/qwen3-8b:thinkingNo description available.
Full description
No description available.
qwen/qwen3-coder-30b-a3b-v1No description available.
Full description
No description available.
qwen/qwen3-coder-480b-a35b-instructNo description available.
Full description
No description available.
qwen/qwen3-coder-flash-2025-07-28No description available.
Full description
No description available.
qwen/qwen3-coder-plus-2025-07-22No description available.
Full description
No description available.
qwen/qwen3-coder-plus-2025-09-23No description available.
Full description
No description available.
qwen/qwen3-max-2025-09-23No description available.
Full description
No description available.
qwen/qwen3-max-2026-01-23No description available.
Full description
No description available.
qwen/qwen3-max-previewNo description available.
Full description
No description available.
qwen/qwen3-next-80b-a3bNo description available.
Full description
No description available.
qwen/qwen3-vl-235b-a22b-thinking:thinkingNo description available.
Full description
No description available.
qwen/qwen3-vl-30b-a3b-thinking:thinkingNo description available.
Full description
No description available.
qwen/qwen3-vl-32b-thinking:thinkingNo description available.
Full description
No description available.
qwen/qwen3-vl-8b-thinking:thinkingNo description available.
Full description
No description available.
qwen/qwen3-vl-flashNo description available.
Full description
No description available.
qwen/qwen3-vl-flash-2025-10-15No description available.
Full description
No description available.
qwen/qwen3-vl-plusNo description available.
Full description
No description available.
qwen/qwen3-vl-plus-2025-09-23No description available.
Full description
No description available.
qwen/qwen-plus-2025-07-28Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning mod…
Full description
Qwen Plus 0728, based on the Qwen3 foundation model, is a hybrid reasoning model with a 1 million token context window and a balanced combination of performance, speed, and cost.
qwen/qwen-plus-2025-07-28:thinkingQwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning mod…
Full description
Qwen Plus 0728, based on the Qwen3 foundation model, is a hybrid reasoning model with a 1 million token context window and a balanced combination of performance, speed, and cost.
qwen/qwen-2.5-7b-instructQwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following impr…
Full description
Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and…
qwen/qwen3-235b-a22bQwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating…
Full description
Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a “thinking” mode for complex reasoning, math, and…
qwen/qwen3-235b-a22b-2507Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language m…
Full description
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following,…
qwen/qwen3-235b-a22b-thinking-2507Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) langua…
Full description
Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144…
qwen/qwen3-30b-a3bQwen3, the latest generation in the Qwen large language model series, features both dense and mi…
Full description
Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique…
qwen/qwen3-30b-a3b-instruct-2507Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, wi…
Full description
Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thinking mode and is designed for high-quality instruction following, multilingual understanding, and…
qwen/qwen3-32bQwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for…
Full description
Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a “thinking” mode for…
qwen/qwen3-8bQwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for bot…
Full description
Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between “thinking” mode for math,…
qwen/qwen3-coder-30b-a3b-instructQwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 expert…
Full description
Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the…
qwen/qwen3-coderQwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by…
Full description
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over…
qwen/qwen3-coder-flashQwen3 Coder Flash is Alibaba’s fast and cost-efficient version of their proprietary Qwen3 Coder…
Full description
Qwen3 Coder Flash is Alibaba’s fast and cost-efficient version of their proprietary Qwen3 Coder Plus. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities.
qwen/qwen3-coder-nextQwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local d…
Full description
Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per token, delivering performance comparable to models with 10 to 20x higher active compute, which makes it well suited for cost-sensitive, always-on agent deployment. The model is trained with a strong agentic focus and performs reliably on long-horizon coding tasks, complex tool usage, and recovery from execution failures. With a native 256k context window, it integrates cleanly into real-world CLI and IDE environments and adapts well to common agent scaffolds used by modern coding tools. The model operates exclusively in non-thinking mode and does not emit <think> blocks, simplifying integration for production coding agents.
qwen/qwen3-coder-plusQwen3 Coder Plus is Alibaba’s proprietary version of the Open Source Qwen3 Coder 480B A35B. It i…
Full description
Qwen3 Coder Plus is Alibaba’s proprietary version of the Open Source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities.
qwen/qwen3-maxQwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reason…
Full description
Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode.
qwen/qwen3-max-thinkingQwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes…
Full description
Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it delivers major gains in factual accuracy, complex reasoning, instruction following, alignment with human preferences, and agentic behavior.
qwen/qwen3-next-80b-a3b-instructQwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimize…
Full description
Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual use, while remaining robust on alignment and formatting. Compared with prior Qwen3 instruct variants, it focuses on higher throughput and stability on ultra-long inputs and multi-turn dialogues, making it well-suited for RAG, tool use, and agentic workflows that require consistent final answers rather than visible chain-of-thought. The model employs scaling-efficient training and decoding to improve parameter efficiency and inference speed, and has been validated on a broad set of public benchmarks where it reaches or approaches larger Qwen3 systems in several categories while outperforming earlier mid-sized baselines. It is best used as a general assistant, code helper, and long-context task solver in production settings where deterministic, instruction-following outputs are preferred.
qwen/qwen3-next-80b-a3b-thinkingQwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs…
Full description
Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. It is designed for hard multi-step problems (math proofs, code synthesis and debugging, logic, and agentic planning) and reports strong results across knowledge, reasoning, coding, alignment, and multilingual evaluations. Compared with prior Qwen3 variants, it emphasizes stability under long chains of thought and efficient scaling during inference, and it is tuned to follow complex instructions while reducing repetitive or off-task behavior. The model is suitable for agent frameworks and tool use (function calling), retrieval-heavy workflows, and standardized benchmarking where step-by-step solutions are required. It supports long, detailed completions and leverages throughput-oriented techniques (e.g., multi-token prediction) for faster generation. Note that it operates in thinking-only mode.
qwen/qwen3-vl-235b-a22b-instructQwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generati…
Full description
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.
qwen/qwen3-vl-235b-a22b-thinkingQwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visua…
Full description
Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.
qwen/qwen3-vl-30b-a3b-instructQwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual…
Full description
Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image, multi-turn instructions, video timeline alignment, GUI automation, and visual coding from sketches to debugged UI. Its text performance matches the flagship Qwen3 models, making it suitable for document AI, OCR, UI assistance, spatial tasks, and agent research.
qwen/qwen3-vl-30b-a3b-thinkingQwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual…
Full description
Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image, multi-turn instructions, video timeline alignment, GUI automation, and visual coding from sketches to debugged UI. Its text performance matches the flagship Qwen3 models, making it suitable for document AI, OCR, UI assistance, spatial tasks, and agent research.
qwen/qwen3-vl-32b-instructQwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precis…
Full description
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding. It also offers robust OCR in 32 languages and enhanced multimodal fusion through the Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance on complex real-world multimodal tasks.
qwen/qwen3-vl-8b-instructQwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for h…
Full description
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon temporal reasoning, DeepStack for fine-grained visual-text alignment, and text-timestamp alignment for precise event localization. The model supports a native 256K-token context window, extensible to 1M tokens, and handles both static and dynamic media inputs for tasks like document parsing, visual question answering, spatial reasoning, and GUI control. It achieves text understanding comparable to leading LLMs while expanding OCR coverage to 32 languages and enhancing robustness under varied visual conditions.
qwen/qwen3-vl-8b-thinkingQwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, des…
Full description
Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and long-context processing (native 256K, expandable to 1M tokens) for tasks such as scientific visual analysis, causal inference, and mathematical reasoning over image or video inputs. Compared to the Instruct edition, the Thinking version introduces deeper visual-language fusion and deliberate reasoning pathways that improve performance on long-chain logic tasks, STEM problem-solving, and multi-step video understanding. It achieves stronger temporal grounding via Interleaved-MRoPE and timestamp-aware embeddings, while maintaining robust OCR, multilingual comprehension, and text generation on par with large text-only LLMs.
qwen/qwen3.5-397b-a17bThe Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that…
Full description
The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers…
qwen/qwen3.5-plus-02-15The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that in…
Full description
The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of task evaluations, the 3.5 series consistently demonstrates performance on par with state-of-the-art leading models. Compared to the 3 series, these models show a leap forward in both pure-text and multimodal capabilities.
qwen/qwen3.5-122b-a10bThe Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integr…
Full description
The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.
qwen/qwen3.5-27bThe Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, de…
Full description
The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.
qwen/qwen3.5-35b-a3bThe Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture…
Full description
The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall…
qwen/qwen3.5-flash-02-23The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrat…
Full description
The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the 3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.
rekaai/reka-edgeReka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video…
Full description
Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding, video analysis, object detection, and agentic tool-use.
relace/relace-searchThe relace-search model uses 4-12 view_file and grep tools in parallel to explore a codebase…
Full description
The relace-search model uses 4-12 view_file and grep tools in parallel to explore a codebase and return the files relevant to the user's request. In contrast to RAG, relace-search performs agentic multi-step reasoning to produce highly precise results 4x faster than any frontier model. It is designed to serve as a subagent that passes its findings to an “oracle” coding agent, which orchestrates and performs the rest of the coding task. To use relace-search, you need to build an appropriate agent harness and parse the response for relevant information to hand off to the oracle. Read more about it in the Relace documentation.
undi95/remm-slerp-l2-13bA recreation trial of the original MythoMax-L2-B13 but with updated models. #merge
Full description
A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge
sao10k/l3-lunaris-8bLunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It’s a strategic me…
Full description
Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It’s a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge…
sao10k/l3-euryale-70bEuryale 70B v2.1 is a model focused on creative roleplay from Sao10k…
Full description
Euryale 70B v2.1 is a model focused on creative roleplay from Sao10k. - Better prompt adherence. - Better anatomy / spatial awareness. - Adapts much better to unique and custom…
sao10k/l3.1-70b-hanami-x1This is Sao10K’s experiment over Euryale v2.2.
Full description
This is Sao10K’s experiment over Euryale v2.2.
sao10k/l3.1-euryale-70bEuryale L3.1 70B v2.2 is a model focused on creative roleplay from Sao10k. It is the successor of…
Full description
Euryale L3.1 70B v2.2 is a model focused on creative roleplay from Sao10k. It is the successor of Euryale L3 70B v2.1.
sarvam/bulbul:v2Indian-language TTS, stable. 11 languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-I…
Full description
Indian-language TTS, stable. 11 languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN.
sarvam/bulbul:v3Indian-language TTS, latest. 11 languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-I…
Full description
Indian-language TTS, latest. 11 languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN.
sarvam/saaras:v3Indian-language speech-to-text and speech translation. 23 languages. Supports transcription (mod…
Full description
Indian-language speech-to-text and speech translation. 23 languages. Supports transcription (mode=transcribe) and translation to English (mode=translate).
stepfun/step-3.5-flashStep 3.5 Flash is StepFun’s most capable open-source foundation model. Built on a sparse Mixture…
Full description
Step 3.5 Flash is StepFun’s most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it activates only 11B of its 196B parameters per token. It is a reasoning model that remains highly speed-efficient even at long context lengths.
tencent/hunyuan-a13b-instructHunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tenc…
Full description
Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark…
thedrummer/cydonia-24b-v4.1Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt ad…
Full description
Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligence.
thedrummer/rocinante-12bRocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported:…
Full description
Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: - Expanded vocabulary with unique and expressive word choices - Enhanced creativity for vivid narratives -…
thedrummer/skyfall-36b-v2Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for impro…
Full description
Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-playing, and coherent storytelling.
thedrummer/unslopnemo-12bUnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure wri…
Full description
UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.
vertex/gemini-embedding-001No description available.
Full description
No description available.
vertex/text-embedding-005No description available.
Full description
No description available.
vertex/text-multilingual-embedding-002No description available.
Full description
No description available.
microsoft/wizardlm-2-8x22bWizardLM-2 8x22B is Microsoft AI’s most advanced Wizard model. It demonstrates highly competitiv…
Full description
WizardLM-2 8x22B is Microsoft AI’s most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art open-source models. It is…
writer/palmyra-x4-v1No description available.
Full description
No description available.
writer/palmyra-x5Palmyra X5 is Writer’s most advanced model, purpose-built for building and scaling AI agents acr…
Full description
Palmyra X5 is Writer’s most advanced model, purpose-built for building and scaling AI agents across the enterprise. It delivers industry-leading speed and efficiency on context windows up to 1 million tokens, powered by a novel transformer architecture and hybrid attention mechanisms. This enables faster inference and expanded memory for processing large volumes of enterprise data, critical for scaling AI agents.
writer/palmyra-x5-v1No description available.
Full description
No description available.
x-ai/grok-4.20Grok 4.20 is xAI’s newest flagship model with industry-leading speed and agentic tool calling ca…
Full description
Grok 4.20 is xAI’s newest flagship model with industry-leading speed and agentic tool-calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently precise and truthful responses. Reasoning can be enabled or disabled using the reasoning enabled parameter in the API. Learn more in our docs.
x-ai/grok-4.20-multi-agentGrok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based wo…
Full description
Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information…
xiaomi/mimo-v2-flashMiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-o…
Full description
MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters, adopting a hybrid attention architecture. MiMo-V2-Flash supports a hybrid-thinking toggle and a 256K context window, and excels at reasoning, coding, and agent scenarios. On SWE-bench Verified and SWE-bench Multilingual, MiMo-V2-Flash ranks #1 among open-source models globally, delivering performance comparable to Claude Sonnet 4.5 at only about 3.5% of the cost. Users can control the reasoning behavior with the reasoning enabled boolean. Learn more in our docs.
z-ai/glm-4-32bGLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex task…
Full description
GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It…
z-ai/glm-4.5-airGLM-4.5-Air is the lightweight variant of Z.ai’s latest flagship model family, also purpose-built…
Full description
GLM-4.5-Air is the lightweight variant of Z.ai’s latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter…
</tbody> </table> </div>
</>