Canonical Rankings

Best Macs for this model

Qwen3.5-35B-A3B ranked across the Mac lineup at the best practical quantization, using the best available runtime evidence.

28 ranked Macs. Rankings use the strongest current runtime evidence for each row. Static paths cover only canonical model pages; sort and quantization stay as query state.
| Rank | Mac | Score | Quant | Tok/s | Runtime | Fits | Evidence | Price | Headroom |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Mac Studio M3 Ultra 256GB | 616 | 8bit | 80.0 | MLX | Fits | Measured | $7,499 | 222.3 GB |
| 2 | Mac Pro M2 Ultra 192GB | 544 | 8bit | 80.0 | MLX | Fits | Estimated | $6,999 | 158.3 GB |
| 3 | Mac Studio M4 Max 128GB | 480 | 8bit | 80.0 | MLX | Fits | Estimated | $4,499 | 94.3 GB |
| 4 | MacBook Pro M4 Max 128GB 16-inch | 480 | 8bit | 80.0 | MLX | Fits | Estimated | $5,999 | 94.3 GB |
| 5 | Mac Studio M3 Ultra 96GB | 448 | 8bit | 80.0 | MLX | Fits | Estimated | $3,999 | 62.3 GB |
| 6 | Mac Studio M4 Max 64GB | 416 | 8bit | 80.0 | MLX | Fits | Estimated | $2,999 | 30.3 GB |
| 7 | MacBook Pro M4 Max 64GB 16-inch | 416 | 8bit | 80.0 | MLX | Fits | Estimated | $4,499 | 30.3 GB |
| 8 | Mac Mini M4 Pro 48GB | 400 | 8bit | 80.0 | MLX | Fits | Estimated | $1,599 | 14.3 GB |
| 9 | MacBook Pro M4 Pro 48GB 14-inch | 400 | 8bit | 80.0 | MLX | Fits | Estimated | $2,499 | 14.3 GB |
| 10 | Mac Studio M4 Max 48GB | 400 | 8bit | 80.0 | MLX | Fits | Estimated | $2,499 | 14.3 GB |
| 11 | MacBook Pro M4 Pro 48GB 16-inch | 400 | 8bit | 80.0 | MLX | Fits | Estimated | $2,999 | 14.3 GB |
| 12 | MacBook Pro M4 Max 48GB 14-inch | 400 | 8bit | 80.0 | MLX | Fits | Estimated | $3,499 | 14.3 GB |
| 13 | MacBook Pro M4 Max 48GB 16-inch | 400 | 8bit | 80.0 | MLX | Fits | Estimated | $3,999 | 14.3 GB |
| 14 | Mac Studio M4 Max 36GB | 388 | Q6_K | 80.0 | MLX | Fits | Estimated | $1,999 | 8.1 GB |
| 15 | MacBook Pro M4 Max 36GB 14-inch | 388 | Q6_K | 80.0 | MLX | Fits | Estimated | $2,999 | 8.1 GB |
| 16 | MacBook Pro M4 Max 36GB 16-inch | 388 | Q6_K | 80.0 | MLX | Fits | Estimated | $3,499 | 8.1 GB |
| 17 | Mac Mini M4 32GB | 386 | 6bit | 80.0 | MLX | Fits | Estimated | $799 | 6.4 GB |
| 18 | MacBook Air M4 32GB 13-inch | 386 | 6bit | 80.0 | MLX | Fits | Estimated | $1,499 | 6.4 GB |
| 19 | MacBook Air M4 32GB 15-inch | 386 | 6bit | 80.0 | MLX | Fits | Estimated | $1,699 | 6.4 GB |
| 20 | Mac Mini M4 24GB | 378 | Q4_K_M | 80.0 | MLX | Fits | Estimated | $599 | 4.2 GB |
| 21 | MacBook Air M4 24GB 13-inch | 378 | Q4_K_M | 80.0 | MLX | Fits | Estimated | $1,299 | 4.2 GB |
| 22 | Mac Mini M4 Pro 24GB | 378 | Q4_K_M | 80.0 | MLX | Fits | Estimated | $1,399 | 4.2 GB |
| 23 | MacBook Air M4 24GB 15-inch | 378 | Q4_K_M | 80.0 | MLX | Fits | Estimated | $1,499 | 4.2 GB |
| 24 | MacBook Pro M4 Pro 24GB 14-inch | 378 | Q4_K_M | 80.0 | MLX | Fits | Estimated | $1,999 | 4.2 GB |
| 25 | MacBook Pro M4 Pro 24GB 16-inch | 378 | Q4_K_M | 80.0 | MLX | Fits | Estimated | $2,499 | 4.2 GB |
| 26 | Mac Mini M4 16GB | 33 | Q2_K | 1.3 | llama.cpp | Fits | Estimated | $499 | 4.2 GB |
| 27 | MacBook Air M4 16GB 13-inch | 33 | Q2_K | 1.3 | llama.cpp | Fits | Estimated | $1,099 | 4.2 GB |
| 28 | MacBook Air M4 16GB 15-inch | 33 | Q2_K | 1.3 | llama.cpp | Fits | Estimated | $1,299 | 4.2 GB |

In each row, the listed quant is the current best practical quantization for that machine, Tok/s is either directly measured (Measured) or estimated from nearby benchmark coverage (Estimated), and Headroom is the RAM remaining at that quantization.
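The quant-and-headroom pattern in the ranking can be sketched in a few lines. The per-quant footprints and the OS reserve below are back-derived assumptions from the headroom column, not the site's actual data:

```python
# Sketch of a "best practical quantization" pick like the one in the table.
# Footprints (GB of weights for a ~35B model at each quant) and the ~4.2 GB
# OS reserve are illustrative assumptions back-derived from the headroom column.
QUANT_FOOTPRINT_GB = {
    "8bit": 33.7,
    "Q6_K": 27.9,
    "6bit": 25.6,
    "Q4_K_M": 19.8,
    "Q2_K": 11.8,
}
PREFERENCE = ["8bit", "Q6_K", "6bit", "Q4_K_M", "Q2_K"]  # highest fidelity first

def best_practical_quant(ram_gb: float, reserve_gb: float = 4.2):
    """Return (quant, headroom_gb) for the highest-fidelity quant whose
    weights plus the OS reserve fit in RAM, or None if nothing fits."""
    for quant in PREFERENCE:
        size = QUANT_FOOTPRINT_GB[quant]
        # round() avoids float noise at exact-fit boundaries (e.g. 24.0 GB)
        if round(size + reserve_gb, 1) <= ram_gb:
            return quant, round(ram_gb - size, 1)
    return None

print(best_practical_quant(48))  # ('8bit', 14.3)
print(best_practical_quant(24))  # ('Q4_K_M', 4.2)
```

With these assumed footprints the sketch reproduces the table's picks for the 16 GB through 48 GB tiers; the real site logic may differ.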

Qwen3.5-35B-A3B — ranking first, raw rows below

Start with the ranked Mac table above. Use the rest of this page to inspect raw Apple Silicon coverage and model metadata.

Quantizations observed: 4bit, Q4_K - Medium, 8bit, Q4_K_L

  • 7 benchmark rows
  • 5 chip tiers covered
  • 128.0 tok/s fastest avg (M5 Max, 48 GB)
  • 19.6 GB minimum RAM observed

Fastest published result is 128.0 tok/s on M5 Max (48 GB) at 4bit. Smallest published fit is 19.6 GB on M3 Ultra (256 GB). Longest published context on this page is 8k. Published runtimes include llama.cpp, MLX. Start with Rankings for the decision, then use the raw rows below to audit the evidence.

Evidence state: 7 linked reference rows and no Silicon Score Lab rows yet.

Published runtimes here: llama.cpp, MLX.

  • Total params: 35B
  • Active params: 3B
  • Context window: 262,144
  • Release date: 2026-02-24
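The 3B-active MoE layout is why a 35B model decodes this fast on Apple Silicon: decode is roughly memory-bandwidth bound, and each token only has to stream the active expert weights. A crude roofline sketch (bandwidth figure is Apple's published spec for M3 Ultra; the model ignores KV cache, attention reads, and runtime overhead, so real numbers land well below it):

```python
def decode_upper_bound_tok_s(active_params_b: float, bytes_per_param: float,
                             bandwidth_gb_s: float) -> float:
    """Crude roofline: each decoded token must stream the active weights once,
    so tok/s <= memory bandwidth / active-weight bytes. This is an upper
    bound only; it ignores KV cache, attention, and runtime overhead."""
    active_gb = active_params_b * bytes_per_param
    return bandwidth_gb_s / active_gb

# 3B active params at 8-bit (~1 byte/param) on an M3 Ultra (~819 GB/s):
print(round(decode_upper_bound_tok_s(3.0, 1.0, 819.0)))  # 273
```

The measured 80 tok/s at 8bit on M3 Ultra sits at roughly 30% of that ceiling, which is in the usual range for real runtimes.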

What this model is, and what Apple Silicon users are actually seeing

Official model cards tell you what the model is for and which software stacks it targets. Field reality below shows how much Apple Silicon evidence we have so far.

Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.

Official source  ·  Raw model card

agents · coding · reasoning · visual-understanding

Runtime support mentioned

vLLM · SGLang · Transformers · KTransformers

Official takeaways

  • Type: Causal Language Model with Vision Encoder.
  • Scale: 35B total params, 3B activated.
  • Context: 262,144 tokens natively, extensible up to 1,010,000 tokens.
  • Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.

Official model cards describe intent, capabilities, and supported stacks. They do not prove Apple Silicon speed by themselves.

Qwen3.5-35B-A3B: 11 Apple Silicon field reports; best reported generation ~89.4 tok/s; seen on MacBook Pro M1 Max 64GB, Mac Mini M4 Pro 64GB, and other Apple Silicon machines; via llama.cpp, MLX, Ollama.

  • 7 benchmark rows
  • 11 field reports
  • 8 practitioner signals
  • Evidence status: sparse benchmarks

What practitioners keep saying

  • MLX is reported as dramatically improving Apple Silicon usability versus weaker default runtime paths.
  • This model should be treated as a real local contender, not just a spec-sheet curiosity.
  • The reported upgrade is operational, not academic: fewer model swaps and better end-to-end usefulness.

Runtime mentions in the field

Claude Code · Continue · llama.cpp · LM Studio · MLX · Ollama

Hardware mentioned in reports

16GB · 32GB · 48GB · 64GB · 128GB · M1 Max · M1 Ultra · M4

What would improve confidence

  • Capture practitioner runtime notes
  • Queue lab verification if hardware is available
  • Reproduce the field performance signal
  • Upgrade to first-party measurement

Published chip coverage includes M5 Max (48 GB), M3 Ultra (256 GB), M1 Max (64 GB), M5 Pro (64 GB), M4 (16 GB). Fastest published row is 128.0 tok/s on M5 Max (48 GB) at 4bit. Lowest published RAM requirement is 19.6 GB on M3 Ultra (256 GB). Longest published benchmark context on this page is 8k; the model's native context window is 262,144 tokens.

Related Qwen3.5 models with published pages: Qwen3.5-27B · Qwen3.5-9B · Qwen3.5-122B-A10B · Qwen3.5-397B-A17B

Standardized eval scorecards for Qwen3.5-35B-A3B

These are fixed-machine model scorecards from a single Apple Silicon setup. They help explain whether a model is merely fast or actually good at tools, coding, reasoning, and general tasks. They do not replace the main Mac ranking above.

Mac Studio M3 Ultra 256GB · Avg 74%

  • Tools: 87%
  • Coding: 90%
  • Reasoning: 50%
  • General: 70%

Speed and memory

  • Long decode: 95.2 tok/s
  • Short decode: 31.7 tok/s
  • Cold TTFT: 0.322 s
  • Active RAM: 19.6 GB

Very fast for its size, but reasoning softness is visible in the standardized tasks.

vLLM-MLX SCORECARD.md  ·  discussion · 2026-03-04
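The "Avg" badge on each scorecard appears to be a plain mean of the four category percentages; a one-line sketch under that assumption:

```python
def scorecard_avg(scores: list[int]) -> int:
    """Average the category percentages and round to a whole percent.
    Assumes the badge is a simple unweighted mean of the four categories."""
    return round(sum(scores) / len(scores))

print(scorecard_avg([87, 90, 50, 70]))  # 74
print(scorecard_avg([90, 90, 80, 80]))  # 85
```

Both scorecards on this page are consistent with an unweighted mean.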

Mac Studio M3 Ultra 256GB · Avg 85%

  • Tools: 90%
  • Coding: 90%
  • Reasoning: 80%
  • General: 80%

Speed and memory

  • Long decode: 80.0 tok/s
  • Short decode: 32.4 tok/s
  • Cold TTFT: 0.456 s
  • Active RAM: 36.9 GB

The stronger version of the 35B MoE story: fast and much more balanced.

vLLM-MLX SCORECARD.md  ·  discussion · 2026-03-04

Workflow runtime comparisons for Qwen3.5-35B-A3B

These are same-model runtime comparisons on Apple Silicon that capture effective throughput and prefill-heavy behavior. They help explain runtime choice, but they do not replace canonical decode-speed benchmark rows.

MacBook Pro M1 Max 64GB · Effective tok/s · Interactive

Best runtime observed: oMLX (38.0)

Spread to next result: 2.4 tok/s

Runtime results

  • oMLX — 38.0 tok/s · Best reported runtime on this workload.
  • Rapid-MLX — 35.6 tok/s
  • mlx-openai-server — 26.2 tok/s
  • LM Studio (GGUF) — 17.6 tok/s · GGUF reference from the same article.
  • LM Studio (MLX) — 17.0 tok/s · Slowest MLX wrapper in this comparison.

Famstack runtime benchmark writeup · 2026-03-20
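The "best runtime observed" and "spread to next result" figures above can be reproduced with a short helper; the numbers are the page's own, the helper is a sketch:

```python
def summarize(results: dict[str, float]) -> tuple[str, float, float]:
    """Rank runtimes by tok/s and report the winner plus its lead
    over the runner-up (the page's 'spread to next result')."""
    ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_tps), (_, next_tps) = ranked[0], ranked[1]
    return best, best_tps, round(best_tps - next_tps, 1)

# Interactive workload results from the comparison above:
interactive = {
    "oMLX": 38.0, "Rapid-MLX": 35.6, "mlx-openai-server": 26.2,
    "LM Studio (GGUF)": 17.6, "LM Studio (MLX)": 17.0,
}
print(summarize(interactive))  # ('oMLX', 38.0, 2.4)
```

The same helper applied to the 8,000-context scenario below reproduces its 7.7 tok/s spread.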

These are effective throughput results on an ops-agent workflow. They are best used to compare runtime behavior and caching quality on Apple Silicon, not to replace canonical decode-speed rows.

MacBook Pro M1 Max 64GB · Effective tok/s · 8,000 ctx

Best runtime observed: oMLX (16.4)

Spread to next result: 7.7 tok/s

Runtime results

  • oMLX — 16.4 tok/s · Best reported runtime in this scenario.
  • mlx-openai-server — 8.7 tok/s
  • Rapid-MLX — 8.5 tok/s
  • LM Studio (GGUF) — 7.8 tok/s
  • LM Studio (MLX) — 5.9 tok/s

Famstack runtime benchmark writeup · 2026-03-20

This is an 8K prefill-stress comparison. It is useful for understanding caching and long-context behavior, not for headline decode-speed ranking.

Raw benchmark rows for Qwen3.5-35B-A3B

Rows stay below the ranking because this page is answer-first. Use them to inspect exact chips, quantizations, runtimes, and sources.

| Chip | Quant | RAM req. | Context | Avg tok/s | Prompt tok/s | Runtime | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| M5 Max (48 GB) | 4bit | | 8k | 128.0 | 3235.0 | MLX | ref |
| M3 Ultra (256 GB) | 4bit | 19.6 GB | | 95.0 | | MLX | ref |
| M5 Max (48 GB) | Q4_K - Medium | | 8k | 89.4 | 783.0 | llama.cpp | ref |
| M3 Ultra (256 GB) | 8bit | 36.9 GB | | 80.0 | | MLX | ref |
| M1 Max (64 GB) | 4bit | | 8k | 57.6 | 431.0 | MLX | ref |
| M5 Pro (64 GB) | Q4_K_L | 27.2 GB | | 41.9 | | llama.cpp | ref |
| M4 (16 GB) | Q4_K - Medium | | | 1.3 | | llama.cpp | ref |

benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv — CSV export
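Auditing the exports programmatically is straightforward. This is a hedged sketch: the column names mirror the table above, but the real benchmarks.csv schema may differ, so the sample data is inlined:

```python
# Hypothetical schema mirroring the raw-rows table; the actual
# benchmarks.csv export may use different column names.
import csv
import io

SAMPLE = """chip,quant,avg_tok_s,runtime
M5 Max (48 GB),4bit,128.0,MLX
M3 Ultra (256 GB),4bit,95.0,MLX
M5 Max (48 GB),Q4_K - Medium,89.4,llama.cpp
M4 (16 GB),Q4_K - Medium,1.3,llama.cpp
"""

def fastest_per_runtime(csv_text: str) -> dict[str, float]:
    """Best average decode speed seen for each runtime in the export."""
    best: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        tps = float(row["avg_tok_s"])
        best[row["runtime"]] = max(best.get(row["runtime"], 0.0), tps)
    return best

print(fastest_per_runtime(SAMPLE))  # {'MLX': 128.0, 'llama.cpp': 89.4}
```

Swapping SAMPLE for the downloaded CSV (with its real column names) gives the same per-runtime audit over the full dataset.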

See all models →