Best LLM for Coding 2026 | AI Coding Model Rankings & Benchmarks
Self-Hosted LLMs — 2026 Rankings

Self-Hosted LLM Leaderboard: the definitive ranking of self-hostable LLMs for enterprise, compared across quality, speed, hardware requirements, and cost. Find the best open-weight model for your infrastructure.
Roshan Desai · Last updated: 2026-03-24

Tier rankings (model, total parameters):

- S: Kimi K2.5 (1T), GLM-5 (744B), MiniMax M2.5 (230B), Qwen 3.5 (397B)
- A: DeepSeek R1 (671B), GLM-4.7 (355B), Mistral Large 3 (675B), GPT-oss 120B (117B), DeepSeek V3.2 (685B), Step-3.5-Flash (196B), MiMo-V2-Flash (309B), Qwen3.5-9B (9B), Qwen3.5-4B (4B), Qwen3-Coder-Next (80B)
- B: Llama 4 Maverick (400B), Nemotron Ultra 253B (253B), Qwen3-235B-A22B (235B), Hunyuan 2.0 (406B), GPT-oss 20B (20B), Llama 4 Scout (109B)
- C: Llama 3.3 70B (70B), DS-R1-Distill-Llama-70B (70B), Qwen 2.5-72B (72B), Gemma 3 27B (27B), DS-R1-Distill-Qwen-32B (32B), Command R+ (104B), Devstral-2-123B (123B)
- D: Mistral Small 3.1 (24B), Phi-4 (14B), Llama 3.1-8B (8B), Qwen3-30B-A3B (30B), Gemma 3 12B (12B), DS-R1-Distill-Qwen-14B (14B), DS-R1-Distill-Qwen-7B (7B), Phi-4-mini (3.8B)

Best Self-Hosted LLMs by Task — Benchmark Rankings: which self-hosted model is best for coding, reasoning, or agentic tasks?
See how every open-weight model stacks up across these benchmark categories:

- Best Advanced Knowledge: advanced knowledge with a harder 10-option format (MMLU-Pro)
- Best in Graduate Reasoning: PhD-level science reasoning (GPQA Diamond)
- Best at Instruction Following: instruction-following accuracy (IFEval)
- Chatbot Arena Rankings: crowdsourced Elo from human preference votes (LMArena)

Self-Hosted LLM Benchmark Scores & Hardware Requirements: complete benchmark results, VRAM requirements, and licensing for every major self-hostable LLM.
VRAM estimates are based on model weight size only: FP16 uses 2 bytes per parameter (e.g. a 70B model = 140 GB), while INT4 uses 0.5 bytes per parameter (e.g. a 70B model = 35 GB). Actual usage is typically 10–20% higher due to KV cache, activations, and framework overhead. Tools like Ollama default to 4-bit quantization, so real-world usage is often closer to the INT4 figure.
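The arithmetic above can be sketched as a small helper. This is a rough estimator, not a deployment tool; the 15% overhead factor is an illustrative middle of the 10–20% range stated above:

```python
def estimate_vram_gb(params_billions: float,
                     bytes_per_param: float,
                     overhead: float = 0.15) -> float:
    """Estimate VRAM to hold model weights, inflated by a flat overhead
    factor covering KV cache, activations, and framework buffers."""
    weights_gb = params_billions * bytes_per_param  # 1B params * 1 byte ≈ 1 GB
    return weights_gb * (1 + overhead)

# FP16 = 2 bytes/param, INT4 = 0.5 bytes/param
fp16 = estimate_vram_gb(70, 2.0)   # 140 GB weights + overhead
int4 = estimate_vram_gb(70, 0.5)   # 35 GB weights + overhead
print(f"70B FP16: ~{fp16:.0f} GB, 70B INT4: ~{int4:.0f} GB")
```

Note the estimate deliberately ignores context length: a long-context deployment can blow well past the flat overhead factor because KV cache grows with sequence length and batch size.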
Compare Self-Hosted LLMs Head-to-Head: DeepSeek R1 vs Qwen 3.5

| Benchmark | DeepSeek R1 | Qwen 3.5 |
|---|---|---|
| MMLU-Pro | 84.0 | 87.8 |
| GPQA Diamond | 71.5 | 88.4 |
| IFEval | 83.3 | 92.6 |
| Chatbot Arena (Elo) | 1398 | 1450 |
| SWE-bench Verified | 49.2 | 76.4 |
| LiveCodeBench | 65.9 | 83.6 |
| Benchmarks won | 0 | 6 |

Deploy These Models with Onyx: Onyx is the open-source AI platform that lets you self-host any of these LLMs and connect them to your team's docs, apps, and people.
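The "benchmarks won" row simply tallies which model scores higher on each benchmark (all of these metrics, including Arena Elo, are higher-is-better). A minimal sketch, using the scores from the comparison above:

```python
# Head-to-head scores: (DeepSeek R1, Qwen 3.5), higher is better throughout.
SCORES = {
    "MMLU-Pro":           (84.0, 87.8),
    "GPQA Diamond":       (71.5, 88.4),
    "IFEval":             (83.3, 92.6),
    "Chatbot Arena":      (1398, 1450),
    "SWE-bench Verified": (49.2, 76.4),
    "LiveCodeBench":      (65.9, 83.6),
}

def benchmarks_won(scores: dict) -> tuple[int, int]:
    """Count benchmarks where each model strictly beats the other."""
    a_wins = sum(1 for a, b in scores.values() if a > b)
    b_wins = sum(1 for a, b in scores.values() if b > a)
    return a_wins, b_wins

print(benchmarks_won(SCORES))  # → (0, 6): Qwen 3.5 sweeps this matchup
```

Ties count for neither side, which is why the two win counts need not sum to the number of benchmarks in general.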