Canonical Rankings

Best Macs for this model

Qwen3.5-397B-A17B ranked across the Mac lineup at the best practical quantization, using the best available runtime evidence. Model picker is focused on current-market choices.

29 ranked Macs; 28 historical models hidden. Each row uses the strongest current runtime evidence. Static paths cover only canonical model pages; sort and quantization stay as query state.

Rank	Mac	Score	Quant	Tok/s	Runtime	Fits	Headroom	Context	Evidence	Price	Why it ranks here
1	Mac Studio M3 Ultra 256GB	258	Q4_K_M	40.2 tok/s	Best available	Fits	42.9 GB	47k	Estimated	$7,499	Q4_K_M is the current best practical quantization. 40.2 tok/s is estimated from nearby benchmark coverage. 42.9 GB headroom remains at this quantization.
2	Mac Pro M2 Ultra 192GB	237	3bit	40.2 tok/s	Best available	Fits	51.9 GB	210k	Estimated	$6,999	3bit is the current best practical quantization. 40.2 tok/s is estimated from nearby benchmark coverage. 51.9 GB headroom remains at this quantization.
3	Mac Studio M4 Max 128GB	214	IQ2_K_S	40.2 tok/s	Best available	Fits	29.5 GB	98k	Estimated	$4,499	IQ2_K_S is the current best practical quantization. 40.2 tok/s is estimated from nearby benchmark coverage. 29.5 GB headroom remains at this quantization.
4	MacBook Pro M4 Max 128GB 16-inch	214	IQ2_K_S	40.2 tok/s	Best available	Fits	29.5 GB	98k	Estimated	$5,999	IQ2_K_S is the current best practical quantization. 40.2 tok/s is estimated from nearby benchmark coverage. 29.5 GB headroom remains at this quantization.
5	MacBook Pro M5 Max 128GB 16-inch	105	IQ2_K_S	13.0 tok/s	Best available	Fits	29.5 GB	98k	Estimated	$5,399	IQ2_K_S is the current best practical quantization. 13.0 tok/s is estimated from nearby benchmark coverage. 29.5 GB headroom remains at this quantization.
6	Mac Mini M4 16GB	0	F32	—	Best available	No	-1464.4 GB	—	Estimated	$499	Qwen3.5-397B-A17B does not fit on Mac Mini M4 16GB at the current practical quantization.
7	Mac Mini M4 24GB	0	F32	—	Best available	No	-1456.4 GB	—	Estimated	$599	Qwen3.5-397B-A17B does not fit on Mac Mini M4 24GB at the current practical quantization.
8	Mac Mini M4 32GB	0	F32	—	Best available	No	-1448.4 GB	—	Estimated	$799	Qwen3.5-397B-A17B does not fit on Mac Mini M4 32GB at the current practical quantization.
9	MacBook Air M4 16GB 13-inch	0	F32	—	Best available	No	-1464.4 GB	—	Estimated	$1,099	Qwen3.5-397B-A17B does not fit on MacBook Air M4 16GB 13-inch at the current practical quantization.
10	MacBook Air M4 24GB 13-inch	0	F32	—	Best available	No	-1456.4 GB	—	Estimated	$1,299	Qwen3.5-397B-A17B does not fit on MacBook Air M4 24GB 13-inch at the current practical quantization.
11	MacBook Air M4 16GB 15-inch	0	F32	—	Best available	No	-1464.4 GB	—	Estimated	$1,299	Qwen3.5-397B-A17B does not fit on MacBook Air M4 16GB 15-inch at the current practical quantization.
12	Mac Mini M4 Pro 24GB	0	F32	—	Best available	No	-1456.4 GB	—	Estimated	$1,399	Qwen3.5-397B-A17B does not fit on Mac Mini M4 Pro 24GB at the current practical quantization.
13	MacBook Air M4 32GB 13-inch	0	F32	—	Best available	No	-1448.4 GB	—	Estimated	$1,499	Qwen3.5-397B-A17B does not fit on MacBook Air M4 32GB 13-inch at the current practical quantization.
14	MacBook Air M4 24GB 15-inch	0	F32	—	Best available	No	-1456.4 GB	—	Estimated	$1,499	Qwen3.5-397B-A17B does not fit on MacBook Air M4 24GB 15-inch at the current practical quantization.
15	Mac Mini M4 Pro 48GB	0	F32	—	Best available	No	-1432.4 GB	—	Estimated	$1,599	Qwen3.5-397B-A17B does not fit on Mac Mini M4 Pro 48GB at the current practical quantization.
16	MacBook Air M4 32GB 15-inch	0	F32	—	Best available	No	-1448.4 GB	—	Estimated	$1,699	Qwen3.5-397B-A17B does not fit on MacBook Air M4 32GB 15-inch at the current practical quantization.
17	MacBook Pro M4 Pro 24GB 14-inch	0	F32	—	Best available	No	-1456.4 GB	—	Estimated	$1,999	Qwen3.5-397B-A17B does not fit on MacBook Pro M4 Pro 24GB 14-inch at the current practical quantization.
18	Mac Studio M4 Max 36GB	0	F32	—	Best available	No	-1444.4 GB	—	Estimated	$1,999	Qwen3.5-397B-A17B does not fit on Mac Studio M4 Max 36GB at the current practical quantization.
19	MacBook Pro M4 Pro 48GB 14-inch	0	F32	—	Best available	No	-1432.4 GB	—	Estimated	$2,499	Qwen3.5-397B-A17B does not fit on MacBook Pro M4 Pro 48GB 14-inch at the current practical quantization.
20	MacBook Pro M4 Pro 24GB 16-inch	0	F32	—	Best available	No	-1456.4 GB	—	Estimated	$2,499	Qwen3.5-397B-A17B does not fit on MacBook Pro M4 Pro 24GB 16-inch at the current practical quantization.
21	Mac Studio M4 Max 48GB	0	F32	—	Best available	No	-1432.4 GB	—	Estimated	$2,499	Qwen3.5-397B-A17B does not fit on Mac Studio M4 Max 48GB at the current practical quantization.
22	MacBook Pro M4 Max 36GB 14-inch	0	F32	—	Best available	No	-1444.4 GB	—	Estimated	$2,999	Qwen3.5-397B-A17B does not fit on MacBook Pro M4 Max 36GB 14-inch at the current practical quantization.
23	MacBook Pro M4 Pro 48GB 16-inch	0	F32	—	Best available	No	-1432.4 GB	—	Estimated	$2,999	Qwen3.5-397B-A17B does not fit on MacBook Pro M4 Pro 48GB 16-inch at the current practical quantization.
24	Mac Studio M4 Max 64GB	0	F32	—	Best available	No	-1416.4 GB	—	Estimated	$2,999	Qwen3.5-397B-A17B does not fit on Mac Studio M4 Max 64GB at the current practical quantization.
25	MacBook Pro M4 Max 48GB 14-inch	0	F32	—	Best available	No	-1432.4 GB	—	Estimated	$3,499	Qwen3.5-397B-A17B does not fit on MacBook Pro M4 Max 48GB 14-inch at the current practical quantization.
26	MacBook Pro M4 Max 36GB 16-inch	0	F32	—	Best available	No	-1444.4 GB	—	Estimated	$3,499	Qwen3.5-397B-A17B does not fit on MacBook Pro M4 Max 36GB 16-inch at the current practical quantization.
27	MacBook Pro M4 Max 48GB 16-inch	0	F32	—	Best available	No	-1432.4 GB	—	Estimated	$3,999	Qwen3.5-397B-A17B does not fit on MacBook Pro M4 Max 48GB 16-inch at the current practical quantization.
28	Mac Studio M3 Ultra 96GB	0	F32	—	Best available	No	-1384.4 GB	—	Estimated	$3,999	Qwen3.5-397B-A17B does not fit on Mac Studio M3 Ultra 96GB at the current practical quantization.
29	MacBook Pro M4 Max 64GB 16-inch	0	F32	—	Best available	No	-1416.4 GB	—	Estimated	$4,499	Qwen3.5-397B-A17B does not fit on MacBook Pro M4 Max 64GB 16-inch at the current practical quantization.
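
The Headroom column can be sanity-checked with weight-only arithmetic. The sketch below is a rough estimate, not the site's formula (which the page does not state): it assumes a flat average bit width per weight, treats the advertised RAM in GB as GiB, and ignores KV cache, runtime buffers, and OS usage. The function names are illustrative.

```python
# Rough weight-only footprint for a 397B-parameter model at a given bit width.
# Assumption: bits_per_weight is a flat average; real quantization formats mix
# bit widths across tensors, so actual file sizes differ.

TOTAL_PARAMS = 397e9  # "397B in total" from the catalog record


def weights_gib(total_params: float, bits_per_weight: float) -> float:
    """Return the weight footprint in GiB (2**30 bytes)."""
    return total_params * bits_per_weight / 8 / 2**30


def headroom_gib(ram_gb: float, bits_per_weight: float) -> float:
    """RAM minus weights; ignores KV cache, runtime overhead, and OS usage."""
    return ram_gb - weights_gib(TOTAL_PARAMS, bits_per_weight)


if __name__ == "__main__":
    for label, bits, ram in [("F32", 32, 16), ("3bit", 3, 192)]:
        print(
            f"{label} on {ram} GB: "
            f"weights {weights_gib(TOTAL_PARAMS, bits):.1f} GiB, "
            f"headroom {headroom_gib(ram, bits):.1f} GiB"
        )
```

Under these assumptions the weight-only headroom comes out roughly 1.5 GB higher than the table's values for the F32 and 3bit rows, which would be consistent with a small fixed overhead in the site's estimate, though that reading is a guess.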

Qwen3.5-397B-A17B — ranking first, raw rows below

Start with the ranked Mac table above. Use the rest of this page to inspect raw Apple Silicon coverage and model metadata.

Quantizations observed: q4.1bit, 4bit

Benchmark rows: 2
Chip tiers covered: 2
Fastest avg tok/s: 40.2 (M3 Ultra (512 GB))
Minimum RAM observed: not reported

Quick take

Fastest published result is 40.2 tok/s on M3 Ultra (512 GB) at q4.1bit. Published runtimes include flash-moe, MLX. Start with Rankings for the decision, then use the raw rows below to audit the evidence.

Based on 2 external benchmarks; no lab runs yet.

Published runtimes: flash-moe, MLX.


Catalog record

Total params: 397B
Active params: 17B
Context window: 262,144
Release date: 2026-02-16

What this model is, and what Apple Silicon users are actually seeing

Official model cards tell you what the model is for and which software stacks it targets. Field reality below shows how much Apple Silicon evidence we have so far.

Official brief

Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.

Official source · Raw model card

agents · coding · reasoning · visual-understanding

Runtime support mentioned

vLLM · SGLang · Transformers · KTransformers

Official specs

Type: Causal Language Model with Vision Encoder.
Scale: 397B in total and 17B activated.
Context: 262,144 natively and extensible up to 1,010,000 tokens.
Total parameters: 397B in total and 17B activated.
Max input: 262,144 natively and extensible up to 1,010,000 tokens.

Official takeaways

Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding ben…
Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.
Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.
Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.

Official model cards describe intent, capabilities, and supported stacks. They do not prove Apple Silicon speed by themselves.

Field reality on Apple Silicon

Qwen3.5-397B-A17B: 4 Apple Silicon field reports; best reported generation ~30.81 tok/s; best reported prompt processing ~122.46 tok/s; seen on MacBook Pro M5 Max 128GB; via llama.cpp, flash-moe, MLX.

Benchmark rows: 2
Field reports: 4
Practitioner signals: 5
Evidence status: Sparse Benchmarks

What practitioners keep saying

The post identifies Qwen3.5-397B-A17B-UD-IQ2_XXS on a MacBook Pro M5 Max 128GB, a roughly 106GB GGUF footprint, llama.cpp serving with --ctx-size 16384, and iogpu.wired_limit_mb=122880 to make the 16K context run fit.
The posted llama.cpp sample reports 122.46 tok/s prompt processing for 33 prompt tokens and 30.81 tok/s generation for 2458 generated tokens.
The author says prompt processing varies with batching, so that range should stay methodology context rather than a separate exact field row.
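
The figures in these reports imply some simple arithmetic that is worth spelling out. The sketch below uses only the numbers quoted above; the helper names are illustrative, and it assumes that `iogpu.wired_limit_mb` is expressed in MiB and that "128GB" means 128 GiB.

```python
# Arithmetic behind the quoted llama.cpp sample. All inputs come from the
# report as summarized above; the helper names are illustrative.

MIB_PER_GIB = 1024


def wired_limit_gib(limit_mb: int) -> float:
    """Convert an iogpu.wired_limit_mb value to GiB (assumes MiB units)."""
    return limit_mb / MIB_PER_GIB


def seconds_for(tokens: int, tok_per_s: float) -> float:
    """Wall-clock time implied by a token count and a throughput."""
    return tokens / tok_per_s


if __name__ == "__main__":
    limit = wired_limit_gib(122880)      # wired GPU memory ceiling
    left_for_system = 128 - limit        # assumes 128GB means 128 GiB
    prefill = seconds_for(33, 122.46)    # 33 prompt tokens
    decode = seconds_for(2458, 30.81)    # 2458 generated tokens
    print(f"wired limit: {limit:.0f} GiB, left for system: {left_for_system:.0f} GiB")
    print(f"prefill: {prefill:.2f} s, decode: {decode:.1f} s")
```

On these assumptions the wired limit works out to 120 GiB, leaving about 8 GiB outside it, and the decode sample corresponds to roughly 80 seconds of generation. The prefill sample covers only 33 tokens, so it is a very small basis for a prompt-processing rate.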

Apple Silicon field sources

r/LocalLLaMA
2026-04-13 · MacBook Pro M5 Max 128GB · llama.cpp
A cupel follow-up reports Qwen3.5-397B-A17B-UD-IQ2_XXS running through llama.cpp on an M5 Max 128GB MacBook Pro with a measured decode sample and important memory and sustained-load caveats.
r/LocalLLaMA
2026-03-26 · Mac Studio M3 Ultra 512GB · MLX
A Mac Studio M3 Ultra 512GB owner reports Qwen3.5-397B-A17B running locally on MLX 6-bit, which pushes the Apple Silicon stretch frontier far past speculative fit.
r/LocalLLaMA
2026-03-26 · MacBook Pro M5 Max 128GB · flash-moe
A follow-up M5 Max 128GB flash-moe benchmark turns Qwen3.5-397B-A17B from a vague stretch-tier experiment into a measured laptop result, with the best 4-bit run reaching 12.99 tok/s locally.
r/LocalLLaMA
2026-03-22 · Mac Studio M3 Ultra 512GB · MLX
Operators report that even when Qwen3.5-397B-A17B technically fits on an M3 Ultra 512GB Mac Studio, practical coding use is still uncomfortable enough that it remains a stretch-tier watch model.
r/LocalLLaMA
2026-03-21 · MacBook Pro M5 Max 128GB · flash-moe
Practitioners report that Qwen3.5-397B-A17B is at least experimentally runnable on M5 Max 128GB Apple Silicon with single-digit generation speed.

Runtime mentions in the field

llama.cpp · MLX

Hardware mentioned in reports

128GB · M3 Ultra · Mac · Mac Studio · MacBook · MacBook Pro

What would improve confidence

Expand cross-chip benchmark coverage
Reproduce the field performance signal
Upgrade to first-party measurement

Current published coverage

Published chip coverage includes M3 Ultra (512 GB), M5 Max (128 GB). Fastest published row is 40.2 tok/s on M3 Ultra (512 GB) at q4.1bit.

M3 Ultra (512 GB) · M5 Max (128 GB)

Related Qwen3.5-397B-A17B models with published pages: Qwen3.5-27B · Qwen3.5-35B-A3B · Qwen3.5-9B · Qwen3.5-122B-A10B · Qwen3.5-4B

Raw benchmark rows for Qwen3.5-397B-A17B

Rows stay below the ranking because this page is answer-first. Use them to inspect exact chips, quantizations, runtimes, and sources.

Chip	Quant	RAM req.	Context	Avg tok/s	Prompt tok/s	Runtime	Source
M3 Ultra (512 GB)	q4.1bit	—	—	40.2 tok/s	—	MLX	ref
M5 Max (128 GB)	4bit	—	—	13.0 tok/s	—	flash-moe	ref

Best Macs for Qwen3.5-397B-A17B

Ordered by fastest published tok/s on the chip family in each Mac. Click through for the full machine page.

Mac Studio M3 Ultra 96GB: 40.2 tok/s
Mac Studio M3 Ultra 256GB: 40.2 tok/s
MacBook Pro M5 Max 128GB 16-inch: 13.0 tok/s
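
The ordering rule described here (fastest published tok/s on each Mac's chip family) can be sketched as follows. This is a minimal illustration, not the site's code: the dictionaries, the Mac-to-chip mapping, and the function names are hypothetical, and only the two benchmark rows and the three Mac names come from the page.

```python
# Sketch of "ordered by fastest published tok/s on the chip family in each Mac".
# Data structures and the Mac-to-chip mapping are hypothetical.

benchmark_rows = [
    {"chip": "M3 Ultra", "quant": "q4.1bit", "tok_s": 40.2, "runtime": "MLX"},
    {"chip": "M5 Max", "quant": "4bit", "tok_s": 13.0, "runtime": "flash-moe"},
]

macs = [
    {"name": "MacBook Pro M5 Max 128GB 16-inch", "chip": "M5 Max"},
    {"name": "Mac Studio M3 Ultra 96GB", "chip": "M3 Ultra"},
    {"name": "Mac Studio M3 Ultra 256GB", "chip": "M3 Ultra"},
]


def fastest_by_chip(rows):
    """Map each chip family to its fastest published tok/s."""
    best = {}
    for row in rows:
        best[row["chip"]] = max(best.get(row["chip"], 0.0), row["tok_s"])
    return best


def order_macs(macs, rows):
    """Attach the chip-family best to each Mac and sort fastest first."""
    best = fastest_by_chip(rows)
    ranked = [(m["name"], best[m["chip"]]) for m in macs if m["chip"] in best]
    # sorted() is stable, so Macs sharing a chip keep their input order.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, tok_s in order_macs(macs, benchmark_rows):
        print(f"{name}: {tok_s} tok/s")
```

Matching at the chip-family level would explain why the Mac Studio M3 Ultra 96GB appears here with 40.2 tok/s even though the ranked table marks it as not fitting: the published row was measured on a 512 GB configuration.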

Chips with published results for Qwen3.5-397B-A17B

M3 Ultra (512 GB) · M5 Max (128 GB)

Data

benchmarks.json — full dataset · models.json — model summaries · benchmarks.csv — CSV export
