Best models for this Mac

24 models · Catalog current through February 27, 2026

Models for the 16-inch MacBook Pro (M4 Pro, 24GB), ranked for coding with a "most capable" bias, using the best available runtime evidence.

Use the strongest current runtime evidence for each row.

M5 Max watch: 8B 61.6 tok/s · 14B 34.3 tok/s on the current 36GB public anchor.

Rank	Model	Score	Quant	Tok/s	Runtime	Evidence	Headroom	Why it ranks here
1	Qwen 3 32B	242	Q4_K_M	22.0 tok/s	Ollama	Estimated	4.0 GB	Q4_K_M is the highest practical quality here. 22.0 tok/s estimated from nearby benchmark coverage, with Ollama wrapper on llama.cpp as the best runtime hint. 4.0 GB headroom leaves workable context margin.
2	Qwen3.5-27B	238	Q5_K_M	20.6 tok/s	MLX	Estimated	3.6 GB	Q5_K_M is the highest practical quality here. 20.6 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 3.6 GB headroom is tight.
3	Devstral Small 1.1	236	Q6_K	43.0 tok/s	LM Studio	Estimated	3.9 GB	Q6_K is the highest practical quality here. 43.0 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 3.9 GB headroom is tight.
4	Devstral Small 2 24B	236	Q6_K	25.3 tok/s	MLX	Estimated	3.9 GB	Q6_K is the highest practical quality here. 25.3 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 3.9 GB headroom is tight.
5	Gemma 3 27B	227	Q5	14.5 tok/s	LM Studio	Estimated	3.7 GB	Q5 is the highest practical quality here. 14.5 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 3.7 GB headroom is tight.
6	DeepSeek R1 Distill Qwen 32B	220	Q4_K_M	Measure it	Best available	Fit-first	4.0 GB	Q4_K_M is the highest practical quality here. Speed still needs direct measurement. 4.0 GB headroom leaves workable context margin.
7	Mistral Small 3.1 24B	214	Q6_K	Measure it	Best available	Fit-first	3.9 GB	Q6_K is the highest practical quality here. Speed still needs direct measurement. 3.9 GB headroom is tight.
8	Magistral Small	214	Q6_K	Measure it	Best available	Fit-first	3.9 GB	Q6_K is the highest practical quality here. Speed still needs direct measurement. 3.9 GB headroom is tight.
9	Nemotron-3-Nano-30B-A3B	203	Q5	43.7 tok/s	llama.cpp	Estimated	5.6 GB	Q5 is the highest practical quality here. 43.7 tok/s estimated from nearby benchmark coverage, with llama.cpp backend as the best runtime hint. 5.6 GB headroom leaves workable context margin.
10	Qwen 3 8B	202	8bit	63.1 tok/s	LM Studio	Estimated	14.7 GB	8bit is the highest practical quality here. 63.1 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 14.7 GB headroom leaves workable context margin.
11	Qwen3-Coder-30B-A3B	202	Q5	58.5 tok/s	llama.cpp	Estimated	5.4 GB	Q5 is the highest practical quality here. 58.5 tok/s estimated from nearby benchmark coverage, with llama.cpp backend as the best runtime hint. 5.4 GB headroom leaves workable context margin.
12	Qwen 3 30B-A3B	202	Q5	76.7 tok/s	MLX	Estimated	5.0 GB	Q5 is the highest practical quality here. 76.7 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 5.0 GB headroom leaves workable context margin.
13	Qwen3.5-35B-A3B	200	Q4_K_M	80.0 tok/s	MLX	Estimated	4.2 GB	Q4_K_M is the highest practical quality here. 80.0 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 4.2 GB headroom leaves workable context margin.
14	Llama 2 7B	199	8bit	36.4 tok/s	llama.cpp	Estimated	13.2 GB	8bit is the highest practical quality here. 36.4 tok/s estimated from nearby benchmark coverage, with llama.cpp backend as the best runtime hint. 13.2 GB headroom leaves workable context margin.
15	Qwen3.5-9B	196	8bit	15.0 tok/s	llama.cpp	Estimated	14.1 GB	8bit is the highest practical quality here. 15.0 tok/s estimated from nearby benchmark coverage, with llama.cpp backend as the best runtime hint. 14.1 GB headroom leaves workable context margin.
16	Qwen 2.5 14B	184	8bit	Measure it	Best available	Fit-first	9.0 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 9.0 GB headroom leaves workable context margin.
17	Phi-4 14B	183	8bit	Measure it	Best available	Fit-first	9.5 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 9.5 GB headroom leaves workable context margin.
18	Qwen 2.5 7B	181	8bit	Measure it	Best available	Fit-first	16.0 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 16.0 GB headroom leaves workable context margin.
19	Llama 3.1 8B	180	8bit	Measure it	Best available	Fit-first	15.0 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 15.0 GB headroom leaves workable context margin.
20	Mistral 7B v0.3	180	8bit	Measure it	Best available	Fit-first	15.7 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 15.7 GB headroom leaves workable context margin.
21	Gemma 3 4B	144	8bit	100.5 tok/s	LM Studio	Estimated	18.4 GB	8bit is the highest practical quality here. 100.5 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 18.4 GB headroom leaves workable context margin.
22	Qwen 3 4B	144	8bit	143.2 tok/s	MLX	Estimated	18.6 GB	8bit is the highest practical quality here. 143.2 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 18.6 GB headroom leaves workable context margin.
23	Qwen 3 0.6B	144	8bit	184.4 tok/s	LM Studio	Estimated	22.8 GB	8bit is the highest practical quality here. 184.4 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 22.8 GB headroom leaves workable context margin.
24	Llama 3.2 1B	122	8bit	Measure it	Best available	Fit-first	22.1 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 22.1 GB headroom leaves workable context margin.
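
Several rows above list speed as "Measure it". The following is a minimal sketch of one way to take that measurement, assuming an Ollama server is running locally on its default port and the model has already been pulled. The helper names `tokens_per_second` and `measure` and the model tag are illustrative, not part of this catalog. The calculation uses the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama returns from a non-streaming `/api/generate` call, so it covers generation speed only, not prompt processing.

```python
import json
import urllib.request


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert a generated-token count and a duration in nanoseconds to tok/s."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def measure(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation against a local Ollama server.

    Assumes the server is reachable at `host` and `model` is already pulled.
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return tokens_per_second(body["eval_count"], body["eval_duration"])


# Example call (hypothetical model tag; requires a running server):
# print(f"{measure('<model-tag>', 'Write a binary search in Python.'):.1f} tok/s")
```

A single run is noisy; averaging several runs with a prompt and context length close to real coding use should give a figure more comparable to the estimated rows.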

Machine

Unified memory: 24GB

MSRP: $2,499

Form factor: macbook_pro

Chip: m4-pro
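
The headroom notes in the ranking table can be reproduced with a small rule of thumb. This sketch rests on two assumptions inferred from the table rather than stated by it: that headroom is the 24GB of unified memory minus an estimated model footprint, and that the "tight" label applies below 4.0 GB (every row at 3.9 GB or less is called tight, every row at 4.0 GB or more is called workable). The function names and the 20.0 GB footprint in the example are hypothetical.

```python
TIGHT_BELOW_GB = 4.0  # assumption: cutoff inferred from the table's wording


def headroom_gb(unified_memory_gb: float, footprint_gb: float) -> float:
    """Memory left after loading the model, rounded to one decimal place."""
    return round(unified_memory_gb - footprint_gb, 1)


def headroom_label(headroom: float, threshold: float = TIGHT_BELOW_GB) -> str:
    """Classify headroom the way the table's notes appear to."""
    return "tight" if headroom < threshold else "workable"


# A hypothetical 20.0 GB footprint on this 24GB machine:
left = headroom_gb(24.0, 20.0)
print(left, headroom_label(left))
```

Note that macOS reserves part of unified memory for the system, so the usable margin for context is likely smaller than the raw difference.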