Best models for this Mac
Mac Studio M3 Ultra 256GB ranked for coding with a most capable bias, using the best available runtime evidence.
| Rank | Model | Score | Quant | Tok/s | Runtime | Evidence | Headroom | Why it ranks here |
|---|---|---|---|---|---|---|---|---|
| 1 | Qwen 3 32B | 274 | 8bit | 22.0 tok/s | Ollama | Estimated | 223.0 GB | 8bit is the highest practical quality here. 22.0 tok/s estimated from nearby benchmark coverage, with Ollama wrapper on llama.cpp as the best runtime hint. 223.0 GB headroom leaves workable context margin. |
| 2 | Qwen3.5-27B | 266 | 8bit | 38.0 tok/s | MLX | Estimated | 228.4 GB | 8bit is the highest practical quality here. 38.0 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 228.4 GB headroom leaves workable context margin. |
| 3 | Devstral Small 2 24B | 262 | 8bit | 47.0 tok/s | MLX | Estimated | 231.9 GB | 8bit is the highest practical quality here. 47.0 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 231.9 GB headroom leaves workable context margin. |
| 4 | Devstral Small 1.1 | 262 | 8bit | 43.0 tok/s | LM Studio | Estimated | 231.9 GB | 8bit is the highest practical quality here. 43.0 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 231.9 GB headroom leaves workable context margin. |
| 5 | Llama 3.3 70B | 261 | 8bit | 11.8 tok/s | LM Studio | Estimated | 187.2 GB | 8bit is the highest practical quality here. 11.8 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 187.2 GB headroom leaves workable context margin. |
| 6 | Gemma 3 27B | 259 | 8bit | 14.5 tok/s | LM Studio | Estimated | 226.1 GB | 8bit is the highest practical quality here. 14.5 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 226.1 GB headroom leaves workable context margin. |
| 7 | DeepSeek R1 Distill Qwen 32B | 252 | 8bit | Measure it | Best available | Fit-first | 223.0 GB | 8bit is the highest practical quality here. Speed still needs direct measurement. 223.0 GB headroom leaves workable context margin. |
| 8 | Mistral Small 3.1 24B | 240 | 8bit | Measure it | Best available | Fit-first | 231.9 GB | 8bit is the highest practical quality here. Speed still needs direct measurement. 231.9 GB headroom leaves workable context margin. |
| 9 | Magistral Small | 240 | 8bit | Measure it | Best available | Fit-first | 231.9 GB | 8bit is the highest practical quality here. Speed still needs direct measurement. 231.9 GB headroom leaves workable context margin. |
| 10 | Nemotron-3-Nano-30B-A3B | 233 | 8bit | 43.7 tok/s | llama.cpp | Estimated | 227.2 GB | 8bit is the highest practical quality here. 43.7 tok/s estimated from nearby benchmark coverage, with llama.cpp backend as the best runtime hint. 227.2 GB headroom leaves workable context margin. |
| 11 | Qwen 3 30B-A3B | 233 | 8bit | 76.7 tok/s | MLX | Estimated | 226.3 GB | 8bit is the highest practical quality here. 76.7 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 226.3 GB headroom leaves workable context margin. |
| 12 | Qwen3-Coder-30B-A3B | 233 | 8bit | 58.5 tok/s | llama.cpp | Estimated | 226.7 GB | 8bit is the highest practical quality here. 58.5 tok/s estimated from nearby benchmark coverage, with llama.cpp backend as the best runtime hint. 226.7 GB headroom leaves workable context margin. |
| 13 | Qwen3.5-35B-A3B | 232 | 8bit | 80.0 tok/s | MLX | Measured | 222.3 GB | 8bit is the highest practical quality here. 80.0 tok/s measured on MLX backend. 222.3 GB headroom leaves workable context margin. |
| 14 | GLM-4.7-Flash | 232 | 8bit | 36.8 tok/s | llama.cpp | Estimated | 220.2 GB | 8bit is the highest practical quality here. 36.8 tok/s estimated from nearby benchmark coverage, with llama.cpp backend as the best runtime hint. 220.2 GB headroom leaves workable context margin. |
| 15 | Qwen3.5-9B | 213 | 8bit | 106.0 tok/s | MLX | Estimated | 246.1 GB | 8bit is the highest practical quality here. 106.0 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 246.1 GB headroom leaves workable context margin. |
| 16 | Qwen 3 8B | 211 | 8bit | 63.1 tok/s | LM Studio | Estimated | 246.7 GB | 8bit is the highest practical quality here. 63.1 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 246.7 GB headroom leaves workable context margin. |
| 17 | Llama 2 7B | 209 | 8bit | 36.4 tok/s | llama.cpp | Estimated | 245.2 GB | 8bit is the highest practical quality here. 36.4 tok/s estimated from nearby benchmark coverage, with llama.cpp backend as the best runtime hint. 245.2 GB headroom leaves workable context margin. |
| 18 | Qwen 2.5 14B | 199 | 8bit | Measure it | Best available | Fit-first | 241.0 GB | 8bit is the highest practical quality here. Speed still needs direct measurement. 241.0 GB headroom leaves workable context margin. |
| 19 | Phi-4 14B | 198 | 8bit | Measure it | Best available | Fit-first | 241.5 GB | 8bit is the highest practical quality here. Speed still needs direct measurement. 241.5 GB headroom leaves workable context margin. |
| 20 | Llama 3.1 8B | 189 | 8bit | Measure it | Best available | Fit-first | 247.0 GB | 8bit is the highest practical quality here. Speed still needs direct measurement. 247.0 GB headroom leaves workable context margin. |
| 21 | GLM-4.5-Air | 189 | 8bit | 54.0 tok/s | MLX | Estimated | 155.3 GB | 8bit is the highest practical quality here. 54.0 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 155.3 GB headroom leaves workable context margin. |
| 22 | Qwen 2.5 7B | 189 | 8bit | Measure it | Best available | Fit-first | 248.0 GB | 8bit is the highest practical quality here. Speed still needs direct measurement. 248.0 GB headroom leaves workable context margin. |
| 23 | Mistral 7B v0.3 | 188 | 8bit | Measure it | Best available | Fit-first | 247.7 GB | 8bit is the highest practical quality here. Speed still needs direct measurement. 247.7 GB headroom leaves workable context margin. |
| 24 | Qwen3.5-122B-A10B | 186 | 8bit | 43.0 tok/s | MLX | Measured | 141.1 GB | 8bit is the highest practical quality here. 43.0 tok/s measured on MLX backend. 141.1 GB headroom leaves workable context margin. |
| 25 | Qwen3-Coder-Next | 176 | 8bit | 74.0 tok/s | MLX | Estimated | 180.2 GB | 8bit is the highest practical quality here. 74.0 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 180.2 GB headroom leaves workable context margin. |
| 26 | Gemma 3 4B | 150 | 8bit | 100.5 tok/s | LM Studio | Estimated | 250.4 GB | 8bit is the highest practical quality here. 100.5 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 250.4 GB headroom leaves workable context margin. |
| 27 | Qwen 3 4B | 150 | 8bit | 143.2 tok/s | MLX | Estimated | 250.6 GB | 8bit is the highest practical quality here. 143.2 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 250.6 GB headroom leaves workable context margin. |
| 28 | Qwen 3 0.6B | 145 | 8bit | 184.4 tok/s | LM Studio | Estimated | 254.8 GB | 8bit is the highest practical quality here. 184.4 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 254.8 GB headroom leaves workable context margin. |
| 29 | Qwen 3 235B-A22B | 141 | Q6_K | 27.4 tok/s | LM Studio | Estimated | 73.0 GB | Q6_K is the highest practical quality here. 27.4 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 73.0 GB headroom leaves workable context margin. |
| 30 | Qwen3.5-397B-A17B | 128 | Q4_K_M | 40.2 tok/s | Best available | Estimated | 42.9 GB | Q4_K_M is the highest practical quality here. 40.2 tok/s estimated from nearby benchmark coverage. 42.9 GB headroom leaves workable context margin. |
| 31 | Llama 3.2 1B | 124 | 8bit | Measure it | Best available | Fit-first | 254.1 GB | 8bit is the highest practical quality here. Speed still needs direct measurement. 254.1 GB headroom leaves workable context margin. |
Machine
256GBUnified memory
$7,499MSRP
mac_studioForm factor
m3-ultraChip