Canonical Rankings

Best Macs for this model

Qwen3.5-27B ranked across the Mac lineup at the best practical quantization, using the best available runtime evidence. The model picker focuses on current-market choices.

29 ranked Macs, using the strongest current runtime evidence for each row. 28 historical models are hidden.
Rank | Mac | Score | Quant | Tok/s | Runtime | Fits | Headroom | Context | Evidence | Price
1 | Mac Studio M3 Ultra 256GB | 446 | 8bit | 38.0 | MLX | Fits | 228.4 GB | 262k | Estimated | $7,499
2 | Mac Pro M2 Ultra 192GB | 295 | 8bit | 16.1 | llama.cpp | Fits | 164.4 GB | 262k | Estimated | $6,999
3 | MacBook Pro M5 Max 128GB 16-inch | 293 | 8bit | 31.6 | MLX | Fits | 100.4 GB | 262k | Estimated | $5,399
4 | Mac Studio M4 Max 128GB | 231 | 8bit | 16.1 | llama.cpp | Fits | 100.4 GB | 262k | Estimated | $4,499
5 | MacBook Pro M4 Max 128GB 16-inch | 231 | 8bit | 16.1 | llama.cpp | Fits | 100.4 GB | 262k | Estimated | $5,999
6 | Mac Studio M3 Ultra 96GB | 199 | 8bit | 16.1 | llama.cpp | Fits | 68.4 GB | 229k | Estimated | $3,999
7 | Mac Studio M4 Max 64GB | 167 | 8bit | 16.1 | llama.cpp | Fits | 36.4 GB | 118k | Estimated | $2,999
8 | MacBook Pro M4 Max 64GB 16-inch | 167 | 8bit | 16.1 | llama.cpp | Fits | 36.4 GB | 118k | Estimated | $4,499
9 | Mac Studio M4 Max 48GB | 151 | 8bit | 16.1 | llama.cpp | Fits | 20.4 GB | 62k | Estimated | $2,499
10 | MacBook Pro M4 Max 48GB 14-inch | 151 | 8bit | 16.1 | llama.cpp | Fits | 20.4 GB | 62k | Estimated | $3,499
11 | MacBook Pro M4 Max 48GB 16-inch | 151 | 8bit | 16.1 | llama.cpp | Fits | 20.4 GB | 62k | Estimated | $3,999
12 | Mac Studio M4 Max 36GB | 139 | 8bit | 16.1 | llama.cpp | Fits | 8.4 GB | 20k | Estimated | $1,999
13 | MacBook Pro M4 Max 36GB 14-inch | 139 | 8bit | 16.1 | llama.cpp | Fits | 8.4 GB | 20k | Estimated | $2,999
14 | MacBook Pro M4 Max 36GB 16-inch | 139 | 8bit | 16.1 | llama.cpp | Fits | 8.4 GB | 20k | Estimated | $3,499
15 | Mac Mini M4 32GB | 133 | Q6_K | 16.1 | llama.cpp | Fits | 8.9 GB | 25k | Estimated | $799
16 | MacBook Air M4 32GB 13-inch | 133 | Q6_K | 16.1 | llama.cpp | Fits | 8.9 GB | 25k | Estimated | $1,499
17 | MacBook Air M4 32GB 15-inch | 133 | Q6_K | 16.1 | llama.cpp | Fits | 8.9 GB | 25k | Estimated | $1,699
18 | Mac Mini M4 24GB | 128 | Q5_K_M | 16.1 | llama.cpp | Fits | 3.6 GB | 8k | Estimated | $599
19 | MacBook Air M4 24GB 13-inch | 128 | Q5_K_M | 16.1 | llama.cpp | Fits | 3.6 GB | 8k | Estimated | $1,299
20 | Mac Mini M4 Pro 24GB | 128 | Q5_K_M | 16.1 | llama.cpp | Fits | 3.6 GB | 8k | Estimated | $1,399
21 | MacBook Air M4 24GB 15-inch | 128 | Q5_K_M | 16.1 | llama.cpp | Fits | 3.6 GB | 8k | Estimated | $1,499
22 | MacBook Pro M4 Pro 24GB 14-inch | 128 | Q5_K_M | 16.1 | llama.cpp | Fits | 3.6 GB | 8k | Estimated | $1,999
23 | MacBook Pro M4 Pro 24GB 16-inch | 128 | Q5_K_M | 16.1 | llama.cpp | Fits | 3.6 GB | 8k | Estimated | $2,499
24 | Mac Mini M4 Pro 48GB | 128 | 8bit | 8.5 | MLX | Fits | 20.4 GB | 62k | Community row | $1,599
25 | MacBook Pro M4 Pro 48GB 14-inch | 128 | 8bit | 8.5 | MLX | Fits | 20.4 GB | 62k | Community row | $2,499
26 | MacBook Pro M4 Pro 48GB 16-inch | 128 | 8bit | 8.5 | MLX | Fits | 20.4 GB | 62k | Community row | $2,999
27 | Mac Mini M4 16GB | 93 | 3bit | 16.1 | llama.cpp | Fits | 4.1 GB | 15k | Estimated | $499
28 | MacBook Air M4 16GB 13-inch | 93 | 3bit | 16.1 | llama.cpp | Fits | 4.1 GB | 15k | Estimated | $1,099
29 | MacBook Air M4 16GB 15-inch | 93 | 3bit | 16.1 | llama.cpp | Fits | 4.1 GB | 15k | Estimated | $1,299

For every row the same reasoning applies: the listed quant is the current best practical quantization for that Mac, the tok/s figure is either estimated from nearby benchmark coverage or backed by a direct community row (see the Evidence column), and the headroom is what remains at that quantization.
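
The headroom column is, in effect, the Mac's unified memory minus an estimated weight footprint at the listed quantization. Below is a minimal sketch of that arithmetic; the bits-per-weight values are assumptions chosen for illustration, not the site's published formula, and they ignore KV cache and runtime overhead.

```python
# Rough reproduction of the headroom column: unified RAM minus an estimated
# weight footprint at the chosen quantization. The bits-per-weight values are
# illustrative assumptions, not the site's actual scoring inputs.
BITS_PER_WEIGHT = {"8bit": 8.2, "Q6_K": 6.8, "Q5_K_M": 6.0, "4bit": 4.5, "3bit": 3.5}

def estimate_headroom_gb(ram_gb: float, params_b: float, quant: str) -> float:
    """RAM left after loading the weights of a params_b-billion model at quant."""
    weight_gb = params_b * BITS_PER_WEIGHT[quant] / 8
    return round(ram_gb - weight_gb, 1)

# Mac Studio M3 Ultra 256GB at 8bit: the table above reports ~228.4 GB headroom.
print(estimate_headroom_gb(256, 27, "8bit"))  # ~228.3 with these assumptions
```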

Qwen3.5-27B — ranking first, raw rows below

Start with the ranked Mac table above. Use the rest of this page to inspect raw Apple Silicon coverage and model metadata.

Quantizations observed: 4bit, 8bit, Q4_K - Medium, Q4_K, Q6_K, Q8_0

Benchmark rows: 18
Chip tiers covered: 10
Fastest avg tok/s: 38.0 (M3 Ultra, 256 GB)
Minimum RAM observed: 15.3 GB

Fastest published result is 38.0 tok/s on M3 Ultra (256 GB) at 4bit. Smallest published fit is 15.3 GB on M3 Ultra (256 GB). Longest published context on this page is 16k. Published runtimes include llama.cpp and MLX. Start with the rankings for the decision, then use the raw rows below to audit the evidence.

Based on 18 external benchmarks; no lab runs yet.

Published runtimes: llama.cpp, MLX.
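
For readers who want to produce their own MLX-side numbers rather than rely on estimated rows, a minimal mlx-lm sketch follows. The repository id is hypothetical (this page does not cite an official MLX conversion), and mlx-lm covers only the text path of a vision-language model.

```python
# Minimal text-generation sketch with mlx-lm (pip install mlx-lm).
# The repo id below is hypothetical; substitute whichever MLX conversion you use.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3.5-27B-8bit")  # hypothetical repo id

prompt = "Summarize the trade-offs of running a 27B model on a 48GB MacBook Pro."
# verbose=True prints the output together with prompt and generation tok/s.
text = generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=True)
```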

Total params: 27B
Active params: dense
Context window: 262,144
Release date: 2026-02-24
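
The per-Mac context caps in the ranking track how quickly the KV cache eats into headroom as context grows. The sketch below uses the standard cache-size estimate; the layer count, KV-head count, and head dimension are placeholders, since this page does not publish Qwen3.5-27B's architecture details.

```python
# Standard KV-cache estimate: 2 (keys + values) * layers * kv_heads * head_dim
# * bytes per element * tokens. The architecture numbers are placeholders, not
# published Qwen3.5-27B values.
def kv_cache_gb(context_tokens: int,
                n_layers: int = 48,       # placeholder
                n_kv_heads: int = 8,      # placeholder (GQA)
                head_dim: int = 128,      # placeholder
                bytes_per_elem: int = 2   # fp16/bf16 cache
                ) -> float:
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return context_tokens * per_token_bytes / 1024**3

# With these placeholders, a 62k-token context needs roughly 11 GB of cache on
# top of the weights, within the ~20 GB headroom the 48GB Macs report.
print(round(kv_cache_gb(62_000), 1))
```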

What this model is, and what Apple Silicon users are actually seeing

Official model cards tell you what the model is for and which software stacks it targets. Field reality below shows how much Apple Silicon evidence we have so far.

Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.

Official source  ·  Raw model card

agents · coding · reasoning · visual-understanding

Runtime support mentioned

vLLM · SGLang · Transformers · KTransformers

Official specs

  • Type: Causal Language Model with Vision Encoder.
  • Scale: 27B.
  • Context: 262,144 natively and extensible up to 1,010,000 tokens.
  • Total parameters: 27B.
  • Max input: 262,144 natively and extensible up to 1,010,000 tokens.

Official takeaways

  • Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.
  • Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.
  • Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.
  • Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.

Official model cards describe intent, capabilities, and supported stacks. They do not prove Apple Silicon speed by themselves.

Qwen3.5-27B: 20 Apple Silicon field reports; best reported generation ~32.8 tok/s; best reported prompt processing ~222.23 tok/s; seen on MacBook Pro M5 Max 128GB, MacBook Pro M5 Max 48GB, and M2 Ultra 128GB; via oMLX, MLX, and llama.cpp.

Benchmark rows: 18
Field reports: 20
Practitioner signals: 15
Evidence status: sparse benchmarks

What practitioners keep saying

  • The post reports Qwen3.5-27B dense on a MacBook Pro M3 Max 128GB at pp1024/tg128 with 23.0 tok/s generation under oMLX v0.2.23.
  • The post reports the same M3 Max 128GB Qwen3.5-27B dense run at a 65,000-token context with 6.8 tok/s generation.
  • The post reports Qwen3.5-27B dense on a MacBook Pro M5 Max 128GB at pp1024/tg128 with 32.8 tok/s generation under oMLX v0.2.23.
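
The pp1024/tg128 shorthand in these reports separates prompt processing from token generation, and both rates shape perceived latency. A small worked estimate follows, using the M4 Max Q4_K (8k) row from the raw table further down (209.3 tok/s prompt, 16.1 tok/s generation); it ignores time-to-first-token overhead and any prompt caching.

```python
# End-to-end time for a pp1024/tg128 workload: prompt tokens are consumed at
# the prompt-processing rate, generated tokens at the decode rate.
# Rates taken from the M4 Max Q4_K (8k) row in the raw table on this page.
prompt_tokens, gen_tokens = 1024, 128
prompt_tps, decode_tps = 209.3, 16.1

prefill_s = prompt_tokens / prompt_tps   # ~4.9 s
decode_s = gen_tokens / decode_tps       # ~8.0 s
print(f"~{prefill_s + decode_s:.1f} s total "
      f"({prefill_s:.1f} s prefill + {decode_s:.1f} s decode)")
```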

Apple Silicon field sources

  • r/LocalLLaMA

    2026-03-28 · MacBook Pro M3 Max 128GB, MacBook Pro M5 Max 128GB · oMLX

    The same oMLX comparison reports Qwen3.5-27B dense at 23.0 tg tok/s on a MacBook Pro M3 Max 128GB at pp1024/tg128.

  • r/LocalLLaMA

    2026-03-26 · Mac Studio M1 Max 64GB, MacBook Pro M5 Max 48GB · llama.cpp, MLX

    The benchmark bundle reports Qwen3.5-27B 4-bit MLX on a Mac Studio M1 Max 64GB at 15.0 tok/s generation in the 8K workload.

  • r/LocalLLaMA

    2026-03-26 · Mac Mini M4 16GB · llama.cpp

    A large 16GB M4 sweep places Qwen3.5-27B in the trap tier on constrained Apple Silicon: it may fit on paper, but it often collapses into memory-thrashing behavior.

  • SharpAI HomeSec-Bench

    2026-03-26 · MacBook Pro M5 Pro 64GB · llama.cpp

    The same M5 Pro 64GB Apple Silicon benchmark puts Qwen3.5-27B in the usable local-agent tier rather than the barely-fits category.

  • r/LocalLLaMA

    2026-03-22 · MacBook Pro M1 Max 64GB · llama.cpp

    Qwen3.5-27B remains usable on a 64GB M1 Max laptop, but the thread reinforces that older Apple Silicon tops out around the low-teens tok/s range in llama.cpp rather than feeling like a fast dense-model workstation.

6 more Apple Silicon field sources tracked in the research queue.

Runtime mentions in the field

llama.cpp · LM Studio · MLX · oMLX

Hardware mentioned in reports

16GB · 32GB · 48GB · 64GB · 96GB · 128GB · M1 Max · M4

What would improve confidence

  • Reproduce the field performance signal
  • Upgrade to first-party measurement

Published chip coverage includes M3 Ultra (256 GB), M5 Max (128 GB), M5 Max (48 GB), M2 Ultra (GPU count not published, 128 GB), and M4 Max, plus 5 more chip tiers. Fastest published row is 38.0 tok/s on M3 Ultra (256 GB) at 4bit. Lowest published RAM requirement is 15.3 GB on M3 Ultra (256 GB). Longest context in the published rows is 16k.

Models related to Qwen3.5-27B with published pages: Qwen3.5-35B-A3B · Qwen3.5-9B · Qwen3.5-122B-A10B · Qwen3.5-397B-A17B · Qwen3.5-4B

Standardized eval scorecards for Qwen3.5-27B

These are fixed-machine model scorecards from a single Apple Silicon setup. They help explain whether a model is merely fast or actually good at tools, coding, reasoning, and general tasks. They do not replace the main Mac ranking above.

Mac Studio M3 Ultra 256GB · Avg 76%

Tools: 83%
Coding: 90%
Reasoning: 50%
General: 80%
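
The 76% average appears to be the unweighted mean of the four category scores; a quick check:

```python
# Unweighted mean of the four scorecard categories listed above.
scores = {"Tools": 83, "Coding": 90, "Reasoning": 50, "General": 80}
print(round(sum(scores.values()) / len(scores)))  # 76
```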

Speed and memory

  • Long decode: 37.7 tok/s
  • Short decode: 17.7 tok/s
  • Cold TTFT: 0.453 s
  • Active RAM: 15.3 GB

A strong fits-anywhere coding and tool-use compromise.

vLLM-MLX SCORECARD.md  ·  discussion · 2026-03-04

Raw benchmark rows for Qwen3.5-27B

Rows stay below the ranking because this page is answer-first. Use them to inspect exact chips, quantizations, runtimes, and sources.

Chip | Quant | RAM req. | Context | Avg tok/s | Prompt tok/s | Runtime | Source
M3 Ultra (256 GB) | 4bit | 15.3 GB | — | 38.0 | — | MLX | ref
M5 Max (128 GB) | 4bit | — | — | 31.6 | — | MLX | ref
M5 Max (48 GB) | 4bit | — | 8k | 31.3 | 779.0 | MLX | ref
M2 Ultra (GPU count not published, 128 GB) | 8bit | — | — | 27.1 | — | MLX | ref
M5 Max (48 GB) | Q4_K - Medium | — | 8k | 23.7 | 171.0 | llama.cpp | ref
M2 Ultra (GPU count not published, 128 GB) | 8bit | — | — | 20.6 | — | MLX | ref
M4 Max | Q4_K | 16.4 GB | 2k | 16.7 | 222.2 | llama.cpp | ref
M5 Max (128 GB) | Q6_K | — | 8k | 16.5 | — | llama.cpp | ref
M4 Max | Q4_K | 16.4 GB | 8k | 16.1 | 209.3 | llama.cpp | ref
M4 Max | Q4_K | 16.4 GB | 16k | 15.8 | 195.4 | llama.cpp | ref
M1 Max (64 GB) | 4bit | — | 8k | 15.0 | 67.0 | MLX | ref
M1 Max (64 GB) | Q6_K | — | — | 12.0 | — | llama.cpp | ref
M2 Max (38-core GPU, 96 GB) | 8bit | 30.0 GB | — | 12.0 | — | MLX | ref
M1 Max (64 GB) | Q4_K - Medium | — | — | 11.5 | — | llama.cpp | ref
M1 Max (64 GB) | Q8_0 | — | — | 10.5 | — | llama.cpp | ref
M5 Pro (64 GB) | Q4_K - Medium | 24.9 GB | — | 10.0 | — | llama.cpp | ref
M4 Pro (48 GB) | 8bit | — | — | 8.5 | — | MLX | ref
M4 (16 GB) | Q4_K - Medium | — | — | 0.0 | — | llama.cpp | ref

Ordered by fastest published tok/s for the chip family in each Mac. Click through each source for the full machine page.

benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv — CSV export
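
To audit the evidence programmatically, the exports above can be filtered down to this model. A minimal sketch follows, assuming benchmarks.csv has columns along the lines of model, chip, quant, avg_tok_s, and runtime; the actual schema is not documented on this page, so check the header first.

```python
# List the fastest benchmark rows for Qwen3.5-27B from the CSV export.
# Column names are assumptions; adjust them to the real benchmarks.csv header.
import csv

with open("benchmarks.csv", newline="") as f:
    rows = [r for r in csv.DictReader(f) if r.get("model") == "Qwen3.5-27B"]

rows.sort(key=lambda r: float(r.get("avg_tok_s") or 0), reverse=True)
for r in rows[:5]:
    print(r.get("chip"), r.get("quant"), r.get("avg_tok_s"), r.get("runtime"))
```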
