Canonical Rankings

Best Macs for this model

Qwen3.5-35B-A3B ranked across the Mac lineup at the best practical quantization, using the best available runtime evidence. The model picker focuses on current-market choices.

29 ranked Macs; each row uses the strongest current runtime evidence. 28 historical models are hidden.
| Rank | Mac | Score | Quant | Tok/s | Runtime | Headroom | Context | Evidence | Price |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Mac Studio M3 Ultra 256GB | 616 | 8bit | 80.0 | MLX | 222.3 GB | 262k | Community row | $7,499 |
| 2 | Mac Pro M2 Ultra 192GB | 432 | 8bit | 52.0 | MLX | 158.3 GB | 262k | Estimated | $6,999 |
| 3 | Mac Studio M4 Max 128GB | 368 | 8bit | 52.0 | MLX | 94.3 GB | 262k | Estimated | $4,499 |
| 4 | MacBook Pro M4 Max 128GB 16-inch | 368 | 8bit | 52.0 | MLX | 94.3 GB | 262k | Estimated | $5,999 |
| 5 | MacBook Pro M5 Max 128GB 16-inch | 352 | 8bit | 48.0 | Ollama | 94.3 GB | 262k | Estimated | $5,399 |
| 6 | Mac Studio M3 Ultra 96GB | 336 | 8bit | 52.0 | MLX | 62.3 GB | 262k | Estimated | $3,999 |
| 7 | Mac Studio M4 Max 64GB | 304 | 8bit | 52.0 | MLX | 30.3 GB | 262k | Estimated | $2,999 |
| 8 | MacBook Pro M4 Max 64GB 16-inch | 304 | 8bit | 52.0 | MLX | 30.3 GB | 262k | Estimated | $4,499 |
| 9 | Mac Mini M4 Pro 48GB | 288 | 8bit | 52.0 | MLX | 14.3 GB | 101k | Estimated | $1,599 |
| 10 | MacBook Pro M4 Pro 48GB 14-inch | 288 | 8bit | 52.0 | MLX | 14.3 GB | 101k | Estimated | $2,499 |
| 11 | MacBook Pro M4 Pro 48GB 16-inch | 288 | 8bit | 52.0 | MLX | 14.3 GB | 101k | Estimated | $2,999 |
| 12 | Mac Studio M4 Max 36GB | 276 | Q6_K | 52.0 | MLX | 8.1 GB | 44k | Estimated | $1,999 |
| 13 | MacBook Pro M4 Max 36GB 14-inch | 276 | Q6_K | 52.0 | MLX | 8.1 GB | 44k | Estimated | $2,999 |
| 14 | MacBook Pro M4 Max 36GB 16-inch | 276 | Q6_K | 52.0 | MLX | 8.1 GB | 44k | Estimated | $3,499 |
| 15 | Mac Mini M4 32GB | 274 | 6bit | 52.0 | MLX | 6.4 GB | 30k | Estimated | $799 |
| 16 | MacBook Air M4 32GB 13-inch | 274 | 6bit | 52.0 | MLX | 6.4 GB | 30k | Estimated | $1,499 |
| 17 | MacBook Air M4 32GB 15-inch | 274 | 6bit | 52.0 | MLX | 6.4 GB | 30k | Estimated | $1,699 |
| 18 | Mac Mini M4 24GB | 266 | Q4_K_M | 52.0 | MLX | 4.2 GB | 16k | Estimated | $599 |
| 19 | MacBook Air M4 24GB 13-inch | 266 | Q4_K_M | 52.0 | MLX | 4.2 GB | 16k | Estimated | $1,299 |
| 20 | Mac Mini M4 Pro 24GB | 266 | Q4_K_M | 52.0 | MLX | 4.2 GB | 16k | Estimated | $1,399 |
| 21 | MacBook Air M4 24GB 15-inch | 266 | Q4_K_M | 52.0 | MLX | 4.2 GB | 16k | Estimated | $1,499 |
| 22 | MacBook Pro M4 Pro 24GB 14-inch | 266 | Q4_K_M | 52.0 | MLX | 4.2 GB | 16k | Estimated | $1,999 |
| 23 | MacBook Pro M4 Pro 24GB 16-inch | 266 | Q4_K_M | 52.0 | MLX | 4.2 GB | 16k | Estimated | $2,499 |
| 24 | Mac Studio M4 Max 48GB | 216 | 8bit | 34.0 | Ollama | 14.3 GB | 101k | Estimated | $2,499 |
| 25 | MacBook Pro M4 Max 48GB 14-inch | 216 | 8bit | 34.0 | Ollama | 14.3 GB | 101k | Estimated | $3,499 |
| 26 | MacBook Pro M4 Max 48GB 16-inch | 216 | 8bit | 34.0 | Ollama | 14.3 GB | 101k | Estimated | $3,999 |
| 27 | Mac Mini M4 16GB | 32 | 3bit | 1.3 | llama.cpp | 2.7 GB | 11k | Estimated | $499 |
| 28 | MacBook Air M4 16GB 13-inch | 32 | 3bit | 1.3 | llama.cpp | 2.7 GB | 11k | Estimated | $1,099 |
| 29 | MacBook Air M4 16GB 15-inch | 32 | 3bit | 1.3 | llama.cpp | 2.7 GB | 11k | Estimated | $1,299 |

Every row fits at the listed quantization, which is the current best practical choice for that machine; the headroom column shows the unified memory left over at that quantization. Rank 1's 80.0 tok/s is backed by direct benchmark coverage, and its fastest evidence path is 3bit at 95.0 tok/s (MLX, estimated). All other rows are estimated from nearby benchmark coverage, and their fastest evidence path matches the listed quantization and speed.
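The headroom figures in the ranking above can be approximated from model size and quantization. A minimal sketch, assuming illustrative bits-per-weight values and a flat 10% runtime overhead; the site's exact accounting is not published, so treat the constants as placeholders:

```python
# Illustrative bits-per-weight by quantization (assumed values, not this
# site's exact figures; k-quants carry extra scale metadata per block).
BITS_PER_WEIGHT = {"8bit": 8.5, "6bit": 6.5, "Q6_K": 6.6, "Q4_K_M": 4.8, "3bit": 3.5}

def footprint_gb(total_params_b: float, quant: str) -> float:
    """Rough in-memory weight footprint in GB: params x bits/8, plus ~10% overhead."""
    weights = total_params_b * BITS_PER_WEIGHT[quant] / 8
    return weights * 1.10

def headroom_gb(ram_gb: float, total_params_b: float, quant: str) -> float:
    """Unified memory left after loading the model (ignores KV-cache growth)."""
    return ram_gb - footprint_gb(total_params_b, quant)

print(round(footprint_gb(35, "8bit"), 1))    # ~40.9 GB for a 35B model at 8-bit
print(round(headroom_gb(256, 35, "8bit"), 1))
```

Under these assumed constants a 256 GB machine keeps roughly 215 GB free, in the same ballpark as the 222.3 GB headroom shown for the M3 Ultra 256GB row.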

Qwen3.5-35B-A3B — ranking first, raw rows below

Start with the ranked Mac table above. Use the rest of this page to inspect raw Apple Silicon coverage and model metadata.

Quantizations observed: 4bit, Q4_K - Medium, 8bit, Q4_K_L

  • 13 benchmark rows
  • 9 chip tiers covered
  • Fastest avg: 128.0 tok/s (M5 Max, 48 GB)
  • Minimum RAM observed: 19.6 GB

Fastest published result is 128.0 tok/s on M5 Max (48 GB) at 4bit. Smallest published fit is 19.6 GB on M3 Ultra (256 GB). Longest published context on this page is 8k. Published runtimes include llama.cpp, LM Studio (llama.cpp), LM Studio (MLX), MLX, Ollama. Start with Rankings for the decision, then use the raw rows below to audit the evidence.

Based on 13 external benchmarks; no lab runs yet.

Published runtimes: llama.cpp, LM Studio (llama.cpp), LM Studio (MLX), MLX, Ollama.

  • Total params: 35B
  • Active params: 3B
  • Context window: 262,144
  • Release date: 2026-02-24

What this model is, and what Apple Silicon users are actually seeing

Official model cards tell you what the model is for and which software stacks it targets. Field reality below shows how much Apple Silicon evidence we have so far.

Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.

Official source  ·  Raw model card

agents · coding · reasoning · visual-understanding

Runtime support mentioned

vLLM · SGLang · Transformers · KTransformers

Official specs

  • Type: Causal Language Model with Vision Encoder.
  • Parameters: 35B total, 3B activated.
  • Context: 262,144 tokens natively, extensible up to 1,010,000.
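The context figures above come with a memory cost: KV-cache size grows linearly with context length. A hedged sketch of the standard transformer estimate, using placeholder layer/head/dim values rather than this model's real architecture (its Gated Delta Network layers would change the math):

```python
# Placeholder architecture values for illustration only -- Qwen3.5-35B-A3B's
# actual layer count, KV heads, and head dim are not given on this page.
def kv_cache_gb(ctx_tokens: int, n_layers: int = 48, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """2 tensors (K and V) x layers x KV heads x head dim x context x dtype size."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem / 1024**3

print(round(kv_cache_gb(262_144), 1))  # full native window
print(round(kv_cache_gb(16_384), 2))   # a 16k window is 16x cheaper
```

The linear scaling is the point: under these placeholder numbers the full 262k window costs 48 GB of KV cache while 16k costs 3 GB, which is why smaller-RAM Macs in the ranking are capped at shorter contexts.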

Official takeaways

  • Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.
  • Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.
  • Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.
  • Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.

Official model cards describe intent, capabilities, and supported stacks. They do not prove Apple Silicon speed by themselves.

Qwen3.5-35B-A3B: 13 Apple Silicon field reports; best reported generation ~134.5 tok/s; seen on MacBook Pro M5 Max 128GB, MacBook Pro M5 Max 48GB, MacBook Pro M3 Max 128GB; via oMLX, MLX, llama.cpp, Ollama.
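A decode ("tg") figure like the 134.5 tok/s above is conventionally computed as generated tokens divided by wall-clock decode time. A minimal sketch with a stubbed runtime call standing in for MLX, llama.cpp, or any other backend:

```python
import time

def decode_toks_per_sec(generate, n_tokens: int) -> float:
    """Time a decode call and return tokens per second of generation."""
    start = time.perf_counter()
    generate(n_tokens)                       # runtime call that emits n_tokens tokens
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stubbed "runtime" that pretends to decode at ~1 ms per token:
fake_generate = lambda n: time.sleep(n * 0.001)
rate = decode_toks_per_sec(fake_generate, 128)
print(f"{rate:.0f} tok/s")
```

Real reports usually also separate prompt processing ("pp") from generation ("tg"), as in the pp1024/tg128 workloads cited above, because prefill and decode throughput differ by an order of magnitude.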

  • 13 benchmark rows
  • 13 field reports
  • 14 practitioner signals
  • Evidence status: sparse benchmarks

What practitioners keep saying

  • The post reports Qwen3.5-35B-A3B on a MacBook Pro M3 Max 128GB at 80.3 tg tok/s with pp1024/tg128 under oMLX v0.2.23.
  • The older M3 Max row makes the M5 Max uplift auditable without mixing two machines into a single derived claim.
  • The post reports Qwen3.5-35B-A3B on a MacBook Pro M5 Max 128GB at 134.5 tg tok/s with pp1024/tg128 under oMLX v0.2.23.

Apple Silicon field sources

  • r/LocalLLaMA

    2026-03-28 · MacBook Pro M3 Max 128GB, MacBook Pro M5 Max 128GB · oMLX

    The oMLX comparison reports Qwen3.5-35B-A3B at 80.3 tg tok/s on a MacBook Pro M3 Max 128GB at pp1024/tg128.

  • r/LocalLLaMA

    2026-03-26 · Mac Studio M1 Max 64GB, MacBook Pro M5 Max 48GB · llama.cpp, MLX

    The benchmark bundle reports Qwen3.5-35B-A3B 4-bit MLX on a Mac Studio M1 Max 64GB at 57.6 tok/s generation in the 8K workload.

  • r/LocalLLaMA

    2026-03-26 · Mac Mini M4 16GB · llama.cpp

    A huge 16GB Mac Mini M4 sweep says Qwen3.5-35B-A3B is worth trying only as a constrained MoE experiment, not as a clean 16GB recommendation.

  • SharpAI HomeSec-Bench

    2026-03-26 · MacBook Pro M5 Pro 64GB · llama.cpp

    On an M5 Pro 64GB MacBook Pro, Qwen3.5-35B-A3B looks like a premium Apple Silicon sweet spot: strong HomeSec-Bench accuracy with local response latency that stays comfortably interactive.

  • r/LocalLLaMA

    2026-03-25 · Mac Mini M4 16GB

    One Apple Silicon builder reports Qwen3.5-35B-A3B powering a real Mac mini M4 16GB agent setup at around 30 tok/s, despite relying on SSD paging.

5 more Apple Silicon field sources tracked in the research queue.

Runtime mentions in the field

llama.cpp · LM Studio · MLX · oMLX · vllm-mlx

Hardware mentioned in reports

16GB · 48GB · 64GB · 128GB · M1 Max · M4 · Mac · Mac Mini

What would improve confidence

  • Reproduce the field performance signal
  • Upgrade to a first-party measurement

Published chip coverage includes M5 Max (48 GB), M3 Ultra (256 GB), M1 Max (64 GB), M5 Max (64 GB), and M5 Max (128 GB), plus 4 more chip tiers. The fastest published row is 128.0 tok/s on M5 Max (48 GB) at 4bit. The lowest published RAM requirement is 19.6 GB on M3 Ultra (256 GB). The longest published context on this page is 8k.

Related Qwen3.5 models with published pages: Qwen3.5-27B · Qwen3.5-9B · Qwen3.5-122B-A10B · Qwen3.5-397B-A17B · Qwen3.5-4B

Standardized eval scorecards for Qwen3.5-35B-A3B

These are fixed-machine model scorecards from a single Apple Silicon setup. They help explain whether a model is merely fast or actually good at tools, coding, reasoning, and general tasks. They do not replace the main Mac ranking above.

Mac Studio M3 Ultra 256GB · Avg 74%

Tools: 87% · Coding: 90% · Reasoning: 50% · General: 70%

Speed and memory

  • Long decode: 95.2 tok/s
  • Short decode: 31.7 tok/s
  • Cold TTFT: 0.322 s
  • Active RAM: 19.6 GB

Very fast for its size, but reasoning softness is visible in the standardized tasks.

vLLM-MLX SCORECARD.md  ·  discussion · 2026-03-04

Mac Studio M3 Ultra 256GB · Avg 85%

Tools: 90% · Coding: 90% · Reasoning: 80% · General: 80%

Speed and memory

  • Long decode: 80.0 tok/s
  • Short decode: 32.4 tok/s
  • Cold TTFT: 0.456 s
  • Active RAM: 36.9 GB

The stronger version of the 35B MoE story: fast and much more balanced.

vLLM-MLX SCORECARD.md  ·  discussion · 2026-03-04
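The Avg headline on each scorecard is consistent with a plain mean of the four category scores, rounded to a whole percent (the site's exact rounding rule is an assumption):

```python
def scorecard_avg(tools: int, coding: int, reasoning: int, general: int) -> int:
    """Mean of the four category percentages, rounded to the nearest whole percent."""
    return round((tools + coding + reasoning + general) / 4)

print(scorecard_avg(87, 90, 50, 70))  # 74, matching the first card
print(scorecard_avg(90, 90, 80, 80))  # 85, matching the second card
```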

Workflow runtime comparisons for Qwen3.5-35B-A3B

These are same-model runtime comparisons on Apple Silicon that capture effective throughput and prefill-heavy behavior. They help explain runtime choice, but they do not replace canonical decode-speed benchmark rows.

MacBook Pro M1 Max 64GB · Effective tok/s · Interactive

Best runtime observed: oMLX (38.0 tok/s)

Spread to next result: 2.4 tok/s

Runtime results

  • oMLX — 38.0 tok/s · Best reported runtime on this workload.
  • Rapid-MLX — 35.6 tok/s
  • mlx-openai-server — 26.2 tok/s
  • LM Studio (GGUF) — 17.6 tok/s · GGUF reference from the same article.
  • LM Studio (MLX) — 17.0 tok/s · Slowest MLX wrapper in this comparison.

Famstack runtime benchmark writeup · 2026-03-20

These are effective throughput results on an ops-agent workflow. They are best used to compare runtime behavior and caching quality on Apple Silicon, not to replace canonical decode-speed rows.

MacBook Pro M1 Max 64GB · Effective tok/s · 8,000 ctx

Best runtime observed: oMLX (16.4 tok/s)

Spread to next result: 7.7 tok/s

Runtime results

  • oMLX — 16.4 tok/s · Best reported runtime in this scenario.
  • mlx-openai-server — 8.7 tok/s
  • Rapid-MLX — 8.5 tok/s
  • LM Studio (GGUF) — 7.8 tok/s
  • LM Studio (MLX) — 5.9 tok/s

Famstack runtime benchmark writeup · 2026-03-20

This is an 8K prefill-stress comparison. It is useful for understanding caching and long-context behavior, not for headline decode-speed ranking.
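The "spread to next result" figures in both comparisons above are simply best minus runner-up on the same workload, reproduced here from the listed numbers:

```python
def spread(results: dict[str, float]) -> float:
    """Gap between the best and second-best tok/s in one workload."""
    top_two = sorted(results.values(), reverse=True)[:2]
    return round(top_two[0] - top_two[1], 1)

interactive = {"oMLX": 38.0, "Rapid-MLX": 35.6, "mlx-openai-server": 26.2,
               "LM Studio (GGUF)": 17.6, "LM Studio (MLX)": 17.0}
ctx8k = {"oMLX": 16.4, "mlx-openai-server": 8.7, "Rapid-MLX": 8.5,
         "LM Studio (GGUF)": 7.8, "LM Studio (MLX)": 5.9}
print(spread(interactive))  # 2.4
print(spread(ctx8k))        # 7.7
```

The larger 8k spread is the notable part: oMLX's caching advantage nearly doubles the gap to the next runtime once prefill pressure dominates.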

Raw benchmark rows for Qwen3.5-35B-A3B

Rows stay below the ranking because this page is answer-first. Use them to inspect exact chips, quantizations, runtimes, and sources.

| Chip | Quant | RAM req. | Context | Avg tok/s | Prompt tok/s | Runtime | Source |
|---|---|---|---|---|---|---|---|
| M5 Max (48 GB) | 4bit | | 8k | 128.0 | 3235.0 | MLX | ref |
| M3 Ultra (256 GB) | 4bit | 19.6 GB | | 95.0 | | MLX | ref |
| M5 Max (48 GB) | Q4_K - Medium | | 8k | 89.4 | 783.0 | llama.cpp | ref |
| M3 Ultra (256 GB) | 8bit | 36.9 GB | | 80.0 | | MLX | ref |
| M1 Max (64 GB) | 4bit | | 8k | 57.6 | 431.0 | MLX | ref |
| M1 Max (64 GB) | 4bit | | | 57.0 | | LM Studio (MLX) | ref |
| M5 Max (64 GB) | Q4_K - Medium | | | 52.0 | | MLX | ref |
| M5 Max (128 GB) | Q4_K - Medium | | | 48.0 | | Ollama | ref |
| M5 Pro (64 GB) | Q4_K_L | 27.2 GB | | 41.9 | | llama.cpp | ref |
| M4 Max (48 GB) | Q4_K - Medium | | | 34.0 | | Ollama | ref |
| M1 Max (64 GB) | Q4_K - Medium | | | 29.0 | | LM Studio (llama.cpp) | ref |
| M3 Max (96 GB) | Q4_K - Medium | | | 22.0 | | Ollama | ref |
| M4 (16 GB) | Q4_K - Medium | | | 1.3 | | llama.cpp | ref |

Ordered by fastest published tok/s on the chip family in each Mac. Click through for the full machine page.
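The raw rows above can also be audited programmatically, for example by keeping the fastest published tok/s per chip tier. A sketch with the rows transcribed inline; the real benchmarks.json / benchmarks.csv exports may use different field names:

```python
# (chip, quant, avg tok/s) triples transcribed from the raw benchmark table.
rows = [
    ("M5 Max (48 GB)", "4bit", 128.0), ("M3 Ultra (256 GB)", "4bit", 95.0),
    ("M5 Max (48 GB)", "Q4_K - Medium", 89.4), ("M3 Ultra (256 GB)", "8bit", 80.0),
    ("M1 Max (64 GB)", "4bit", 57.6), ("M1 Max (64 GB)", "4bit", 57.0),
    ("M5 Max (64 GB)", "Q4_K - Medium", 52.0), ("M5 Max (128 GB)", "Q4_K - Medium", 48.0),
    ("M5 Pro (64 GB)", "Q4_K_L", 41.9), ("M4 Max (48 GB)", "Q4_K - Medium", 34.0),
    ("M1 Max (64 GB)", "Q4_K - Medium", 29.0), ("M3 Max (96 GB)", "Q4_K - Medium", 22.0),
    ("M4 (16 GB)", "Q4_K - Medium", 1.3),
]

# Fastest published tok/s per chip tier.
fastest: dict[str, float] = {}
for chip, _quant, tok_s in rows:
    fastest[chip] = max(fastest.get(chip, 0.0), tok_s)

best_chip = max(fastest, key=fastest.get)
print(best_chip, fastest[best_chip])   # M5 Max (48 GB) 128.0
print(len(fastest), "chip tiers")      # 9 chip tiers
```

This reproduces the page's own headline stats: 128.0 tok/s fastest on M5 Max (48 GB), across 9 chip tiers.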

benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv — CSV export

See all models →