Best models for this Mac

31 models · Catalog current through February 27, 2026

Mac Studio M3 Ultra (256GB), ranked for coding with a most-capable bias, using the best available runtime evidence.

Each row uses the strongest current runtime evidence available.

M5 Max watch: 8B at 61.6 tok/s · 14B at 34.3 tok/s on the current 36GB public anchor.
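The "strongest current runtime evidence" rule implies a simple tier ordering: Measured beats Estimated, which beats Fit-first. A minimal sketch of that selection rule — the record fields and tier names here are illustrative assumptions, not the catalog's actual data model:

```python
# Rank evidence tiers: direct measurement > nearby-benchmark estimate > fit-only.
EVIDENCE_RANK = {"Measured": 2, "Estimated": 1, "Fit-first": 0}

def best_evidence(records):
    """Return the record with the strongest evidence tier."""
    return max(records, key=lambda r: EVIDENCE_RANK[r["evidence"]])

# Example: a measured MLX run outranks an estimate for the same model.
rows = [
    {"runtime": "MLX", "evidence": "Estimated", "tok_s": 38.0},
    {"runtime": "MLX", "evidence": "Measured", "tok_s": 80.0},
]
best = best_evidence(rows)  # picks the Measured row
```

Ties within a tier would still need a secondary rule (e.g. most recent run), which the catalog does not specify.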
| Rank | Model | Score | Quant | Tok/s | Runtime | Evidence | Headroom |
|---|---|---|---|---|---|---|---|
| 1 | Qwen 3 32B | 274 | 8bit | 22.0 | Ollama (llama.cpp) | Estimated | 223.0 GB |
| 2 | Qwen3.5-27B | 266 | 8bit | 38.0 | MLX | Estimated | 228.4 GB |
| 3 | Devstral Small 2 24B | 262 | 8bit | 47.0 | MLX | Estimated | 231.9 GB |
| 4 | Devstral Small 1.1 | 262 | 8bit | 43.0 | LM Studio (mixed) | Estimated | 231.9 GB |
| 5 | Llama 3.3 70B | 261 | 8bit | 11.8 | LM Studio (mixed) | Estimated | 187.2 GB |
| 6 | Gemma 3 27B | 259 | 8bit | 14.5 | LM Studio (mixed) | Estimated | 226.1 GB |
| 7 | DeepSeek R1 Distill Qwen 32B | 252 | 8bit | Measure it | Best available | Fit-first | 223.0 GB |
| 8 | Mistral Small 3.1 24B | 240 | 8bit | Measure it | Best available | Fit-first | 231.9 GB |
| 9 | Magistral Small | 240 | 8bit | Measure it | Best available | Fit-first | 231.9 GB |
| 10 | Nemotron-3-Nano-30B-A3B | 233 | 8bit | 43.7 | llama.cpp | Estimated | 227.2 GB |
| 11 | Qwen 3 30B-A3B | 233 | 8bit | 76.7 | MLX | Estimated | 226.3 GB |
| 12 | Qwen3-Coder-30B-A3B | 233 | 8bit | 58.5 | llama.cpp | Estimated | 226.7 GB |
| 13 | Qwen3.5-35B-A3B | 232 | 8bit | 80.0 | MLX | Measured | 222.3 GB |
| 14 | GLM-4.7-Flash | 232 | 8bit | 36.8 | llama.cpp | Estimated | 220.2 GB |
| 15 | Qwen3.5-9B | 213 | 8bit | 106.0 | MLX | Estimated | 246.1 GB |
| 16 | Qwen 3 8B | 211 | 8bit | 63.1 | LM Studio (mixed) | Estimated | 246.7 GB |
| 17 | Llama 2 7B | 209 | 8bit | 36.4 | llama.cpp | Estimated | 245.2 GB |
| 18 | Qwen 2.5 14B | 199 | 8bit | Measure it | Best available | Fit-first | 241.0 GB |
| 19 | Phi-4 14B | 198 | 8bit | Measure it | Best available | Fit-first | 241.5 GB |
| 20 | Llama 3.1 8B | 189 | 8bit | Measure it | Best available | Fit-first | 247.0 GB |
| 21 | GLM-4.5-Air | 189 | 8bit | 54.0 | MLX | Estimated | 155.3 GB |
| 22 | Qwen 2.5 7B | 189 | 8bit | Measure it | Best available | Fit-first | 248.0 GB |
| 23 | Mistral 7B v0.3 | 188 | 8bit | Measure it | Best available | Fit-first | 247.7 GB |
| 24 | Qwen3.5-122B-A10B | 186 | 8bit | 43.0 | MLX | Measured | 141.1 GB |
| 25 | Qwen3-Coder-Next | 176 | 8bit | 74.0 | MLX | Estimated | 180.2 GB |
| 26 | Gemma 3 4B | 150 | 8bit | 100.5 | LM Studio (mixed) | Estimated | 250.4 GB |
| 27 | Qwen 3 4B | 150 | 8bit | 143.2 | MLX | Estimated | 250.6 GB |
| 28 | Qwen 3 0.6B | 145 | 8bit | 184.4 | LM Studio (mixed) | Estimated | 254.8 GB |
| 29 | Qwen 3 235B-A22B | 141 | Q6_K | 27.4 | LM Studio (mixed) | Estimated | 73.0 GB |
| 30 | Qwen3.5-397B-A17B | 128 | Q4_K_M | 40.2 | Best available | Estimated | 42.9 GB |
| 31 | Llama 3.2 1B | 124 | 8bit | Measure it | Best available | Fit-first | 254.1 GB |

For every row, the listed quant is the highest practical quality on this machine, and headroom is the unified memory left after loading the model, which sets the working context margin. Estimated speeds are interpolated from nearby benchmark coverage on the named runtime; Measured speeds come from direct runs on that runtime; Fit-first rows fit comfortably in memory but still need direct speed measurement. Ollama rows wrap llama.cpp under the hood, and LM Studio rows run on mixed backends.
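The headroom column tracks unified memory minus the quantized weight footprint. The catalog's exact accounting is not shown, so the formula below is a back-of-envelope assumption (weights only, at params × bits/8 bytes, ignoring KV cache and runtime overhead) — but it lands close to the table's figures:

```python
def approx_headroom_gb(params_b: float, bits: int, total_gb: float = 256.0) -> float:
    """Rough headroom: total unified memory minus the quantized weight
    footprint (params_b billion params at bits/8 bytes each). Ignores
    KV cache, runtime overhead, and per-architecture details."""
    footprint_gb = params_b * bits / 8
    return round(total_gb - footprint_gb, 1)

# A 32B model at 8-bit leaves roughly 256 - 32 = 224.0 GB, near the
# table's 223.0 GB for Qwen 3 32B; a 70B model leaves roughly 186.0 GB,
# near the table's 187.2 GB for Llama 3.3 70B.
```

The small gaps between this estimate and the table suggest the catalog accounts for per-model effective sizes (embeddings, quantization overhead) rather than raw parameter counts.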
- Unified memory: 256GB
- MSRP: $7,499
- Form factor: mac_studio
- Chip: m3-ultra