Best models for this Mac

27 models · Catalog current through February 27, 2026

16-inch MacBook Pro (M4 Max, 48 GB) ranked for coding with a most-capable bias, using the best available runtime evidence.

Use the strongest current runtime evidence for each row. M5 Max watch: 8B at 61.6 tok/s, 14B at 34.3 tok/s on the current 36GB public anchor.
| Rank | Model | Score | Quant | Tok/s | Runtime | Evidence | Headroom (GB) |
|---|---|---|---|---|---|---|---|
| 1 | Qwen 3 32B | 265 | 8bit | 22.0 | Ollama (llama.cpp) | Estimated | 15.0 |
| 2 | Devstral Small 1.1 | 261 | 8bit | 43.0 | LM Studio (mixed) | Estimated | 23.9 |
| 3 | Devstral Small 2 24B | 261 | 8bit | 25.3 | MLX | Estimated | 23.9 |
| 4 | Qwen3.5-27B | 261 | 8bit | 20.6 | MLX | Estimated | 20.4 |
| 5 | Gemma 3 27B | 253 | 8bit | 14.5 | LM Studio (mixed) | Estimated | 18.1 |
| 6 | DeepSeek R1 Distill Qwen 32B | 243 | 8bit | Measure it | Best available | Fit-first | 15.0 |
| 7 | Mistral Small 3.1 24B | 239 | 8bit | Measure it | Best available | Fit-first | 23.9 |
| 8 | Magistral Small | 239 | 8bit | Measure it | Best available | Fit-first | 23.9 |
| 9 | Llama 3.3 70B | 232 | Q4_K_M | 11.8 | LM Studio (mixed) | Estimated | 7.4 |
| 10 | Nemotron-3-Nano-30B-A3B | 228 | 8bit | 43.7 | llama.cpp | Estimated | 19.2 |
| 11 | Qwen3-Coder-30B-A3B | 227 | 8bit | 58.5 | llama.cpp | Estimated | 18.7 |
| 12 | Qwen 3 30B-A3B | 227 | 8bit | 76.7 | MLX | Estimated | 18.3 |
| 13 | Qwen3.5-35B-A3B | 222 | 8bit | 80.0 | MLX | Estimated | 14.3 |
| 14 | GLM-4.7-Flash | 220 | 8bit | 36.8 | llama.cpp | Estimated | 12.2 |
| 15 | Qwen 3 8B | 211 | 8bit | 63.1 | LM Studio (mixed) | Estimated | 38.7 |
| 16 | Llama 2 7B | 209 | 8bit | 36.4 | llama.cpp | Estimated | 37.2 |
| 17 | Qwen3.5-9B | 206 | 8bit | 15.0 | llama.cpp | Estimated | 38.1 |
| 18 | Qwen 2.5 14B | 199 | 8bit | Measure it | Best available | Fit-first | 33.0 |
| 19 | Phi-4 14B | 198 | 8bit | Measure it | Best available | Fit-first | 33.5 |
| 20 | Llama 3.1 8B | 189 | 8bit | Measure it | Best available | Fit-first | 39.0 |
| 21 | Qwen 2.5 7B | 189 | 8bit | Measure it | Best available | Fit-first | 40.0 |
| 22 | Mistral 7B v0.3 | 188 | 8bit | Measure it | Best available | Fit-first | 39.7 |
| 23 | Gemma 3 4B | 150 | 8bit | 100.5 | LM Studio (mixed) | Estimated | 42.4 |
| 24 | Qwen 3 4B | 150 | 8bit | 143.2 | MLX | Estimated | 42.6 |
| 25 | Qwen3-Coder-Next | 149 | q4.1bit | 74.0 | MLX | Estimated | 8.6 |
| 26 | Qwen 3 0.6B | 145 | 8bit | 184.4 | LM Studio (mixed) | Estimated | 46.8 |
| 27 | Llama 3.2 1B | 124 | 8bit | Measure it | Best available | Fit-first | 46.1 |

Every Estimated row follows the same reasoning: the listed quant is the highest practical quality on this machine, the speed is estimated from nearby benchmark coverage with the listed runtime as the best hint, and the headroom leaves a workable context margin. Fit-first rows fit in memory with the same headroom logic but still need a direct speed measurement.
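For the Fit-first rows marked "Measure it", a direct measurement is straightforward if you run the model under Ollama: the `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). A minimal sketch, assuming a local Ollama server on its default port; the model tag passed to `bench` is a placeholder, not a tag from this catalog:

```python
import json
import urllib.request

def tokens_per_second(stats: dict) -> float:
    # Ollama reports eval_count in tokens and eval_duration in nanoseconds.
    return stats["eval_count"] / (stats["eval_duration"] / 1e9)

def bench(model: str, prompt: str = "Write a quicksort in Python.") -> float:
    # One non-streaming generation; the final JSON carries the timing stats.
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt,
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return tokens_per_second(json.load(resp))

# Usage (requires the model to be pulled locally):
#   bench("some-model-tag")  -> decode speed in tok/s
```

Run it a few times and with a longer prompt as well, since decode speed drops as the context fills.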
Unified memory: 48 GB
MSRP: $3,999
Form factor: MacBook Pro
Chip: M4 Max
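The headroom column above is unified memory minus the resident model footprint. A rough back-of-envelope version can be sketched like this; the bytes-per-parameter figures and the 10% runtime/KV-cache overhead are my assumptions, not the catalog's methodology, so the results will only approximate the table:

```python
# Assumed bytes per parameter for common quants (approximate, my estimates).
BYTES_PER_PARAM = {"8bit": 1.0, "Q4_K_M": 0.57}

def headroom_gb(unified_gb: float, params_b: float, quant: str,
                overhead: float = 1.10) -> float:
    """Unified memory left after loading the weights (very rough).

    overhead=1.10 adds ~10% for KV cache and runtime buffers (assumption).
    """
    weights_gb = params_b * BYTES_PER_PARAM[quant] * overhead
    return round(unified_gb - weights_gb, 1)

# e.g. a 32B model at 8bit on this 48 GB machine:
#   headroom_gb(48, 32, "8bit") -> 12.8
```

The gap between this estimate and the catalog's 15.0 GB for a 32B 8-bit model shows why measured figures beat formulas: actual footprint depends on the quant layout and the runtime's allocation strategy.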