Best models for this Mac

31 modelsCatalog current through February 27, 2026

Mac Studio M3 Ultra 256GB ranked for coding with a most capable bias, using the best available runtime evidence.

Mac

Capability

Bias

Runtime

Use the strongest current runtime evidence for each row.M5 Max watch: 8B 61.6 tok/s · 14B 34.3 tok/s on the current 36GB public anchor.

Rank	Model	Score	Quant	Tok/s	Runtime	Evidence	Headroom	Why it ranks here
1	Qwen 3 32B	274	8bit	22.0 tok/s	Ollama	Estimated	223.0 GB	8bit is the highest practical quality here. 22.0 tok/s estimated from nearby benchmark coverage, with Ollama wrapper on llama.cpp as the best runtime hint. 223.0 GB headroom leaves workable context margin.
2	Qwen3.5-27B	266	8bit	38.0 tok/s	MLX	Estimated	228.4 GB	8bit is the highest practical quality here. 38.0 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 228.4 GB headroom leaves workable context margin.
3	Devstral Small 2 24B	262	8bit	47.0 tok/s	MLX	Estimated	231.9 GB	8bit is the highest practical quality here. 47.0 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 231.9 GB headroom leaves workable context margin.
4	Devstral Small 1.1	262	8bit	43.0 tok/s	LM Studio	Estimated	231.9 GB	8bit is the highest practical quality here. 43.0 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 231.9 GB headroom leaves workable context margin.
5	Llama 3.3 70B	261	8bit	11.8 tok/s	LM Studio	Estimated	187.2 GB	8bit is the highest practical quality here. 11.8 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 187.2 GB headroom leaves workable context margin.
6	Gemma 3 27B	259	8bit	14.5 tok/s	LM Studio	Estimated	226.1 GB	8bit is the highest practical quality here. 14.5 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 226.1 GB headroom leaves workable context margin.
7	DeepSeek R1 Distill Qwen 32B	252	8bit	Measure it	Best available	Fit-first	223.0 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 223.0 GB headroom leaves workable context margin.
8	Mistral Small 3.1 24B	240	8bit	Measure it	Best available	Fit-first	231.9 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 231.9 GB headroom leaves workable context margin.
9	Magistral Small	240	8bit	Measure it	Best available	Fit-first	231.9 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 231.9 GB headroom leaves workable context margin.
10	Nemotron-3-Nano-30B-A3B	233	8bit	43.7 tok/s	llama.cpp	Estimated	227.2 GB	8bit is the highest practical quality here. 43.7 tok/s estimated from nearby benchmark coverage, with llama.cpp backend as the best runtime hint. 227.2 GB headroom leaves workable context margin.
11	Qwen 3 30B-A3B	233	8bit	76.7 tok/s	MLX	Estimated	226.3 GB	8bit is the highest practical quality here. 76.7 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 226.3 GB headroom leaves workable context margin.
12	Qwen3-Coder-30B-A3B	233	8bit	58.5 tok/s	llama.cpp	Estimated	226.7 GB	8bit is the highest practical quality here. 58.5 tok/s estimated from nearby benchmark coverage, with llama.cpp backend as the best runtime hint. 226.7 GB headroom leaves workable context margin.
13	Qwen3.5-35B-A3B	232	8bit	80.0 tok/s	MLX	Measured	222.3 GB	8bit is the highest practical quality here. 80.0 tok/s measured on MLX backend. 222.3 GB headroom leaves workable context margin.
14	GLM-4.7-Flash	232	8bit	36.8 tok/s	llama.cpp	Estimated	220.2 GB	8bit is the highest practical quality here. 36.8 tok/s estimated from nearby benchmark coverage, with llama.cpp backend as the best runtime hint. 220.2 GB headroom leaves workable context margin.
15	Qwen3.5-9B	213	8bit	106.0 tok/s	MLX	Estimated	246.1 GB	8bit is the highest practical quality here. 106.0 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 246.1 GB headroom leaves workable context margin.
16	Qwen 3 8B	211	8bit	63.1 tok/s	LM Studio	Estimated	246.7 GB	8bit is the highest practical quality here. 63.1 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 246.7 GB headroom leaves workable context margin.
17	Llama 2 7B	209	8bit	36.4 tok/s	llama.cpp	Estimated	245.2 GB	8bit is the highest practical quality here. 36.4 tok/s estimated from nearby benchmark coverage, with llama.cpp backend as the best runtime hint. 245.2 GB headroom leaves workable context margin.
18	Qwen 2.5 14B	199	8bit	Measure it	Best available	Fit-first	241.0 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 241.0 GB headroom leaves workable context margin.
19	Phi-4 14B	198	8bit	Measure it	Best available	Fit-first	241.5 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 241.5 GB headroom leaves workable context margin.
20	Llama 3.1 8B	189	8bit	Measure it	Best available	Fit-first	247.0 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 247.0 GB headroom leaves workable context margin.
21	GLM-4.5-Air	189	8bit	54.0 tok/s	MLX	Estimated	155.3 GB	8bit is the highest practical quality here. 54.0 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 155.3 GB headroom leaves workable context margin.
22	Qwen 2.5 7B	189	8bit	Measure it	Best available	Fit-first	248.0 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 248.0 GB headroom leaves workable context margin.
23	Mistral 7B v0.3	188	8bit	Measure it	Best available	Fit-first	247.7 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 247.7 GB headroom leaves workable context margin.
24	Qwen3.5-122B-A10B	186	8bit	43.0 tok/s	MLX	Measured	141.1 GB	8bit is the highest practical quality here. 43.0 tok/s measured on MLX backend. 141.1 GB headroom leaves workable context margin.
25	Qwen3-Coder-Next	176	8bit	74.0 tok/s	MLX	Estimated	180.2 GB	8bit is the highest practical quality here. 74.0 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 180.2 GB headroom leaves workable context margin.
26	Gemma 3 4B	150	8bit	100.5 tok/s	LM Studio	Estimated	250.4 GB	8bit is the highest practical quality here. 100.5 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 250.4 GB headroom leaves workable context margin.
27	Qwen 3 4B	150	8bit	143.2 tok/s	MLX	Estimated	250.6 GB	8bit is the highest practical quality here. 143.2 tok/s estimated from nearby benchmark coverage, with MLX backend as the best runtime hint. 250.6 GB headroom leaves workable context margin.
28	Qwen 3 0.6B	145	8bit	184.4 tok/s	LM Studio	Estimated	254.8 GB	8bit is the highest practical quality here. 184.4 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 254.8 GB headroom leaves workable context margin.
29	Qwen 3 235B-A22B	141	Q6_K	27.4 tok/s	LM Studio	Estimated	73.0 GB	Q6_K is the highest practical quality here. 27.4 tok/s estimated from nearby benchmark coverage, with LM Studio wrapper on mixed as the best runtime hint. 73.0 GB headroom leaves workable context margin.
30	Qwen3.5-397B-A17B	128	Q4_K_M	40.2 tok/s	Best available	Estimated	42.9 GB	Q4_K_M is the highest practical quality here. 40.2 tok/s estimated from nearby benchmark coverage. 42.9 GB headroom leaves workable context margin.
31	Llama 3.2 1B	124	8bit	Measure it	Best available	Fit-first	254.1 GB	8bit is the highest practical quality here. Speed still needs direct measurement. 254.1 GB headroom leaves workable context margin.

Machine

256GBUnified memory

$7,499MSRP

mac_studioForm factor

m3-ultraChip