Compatibility route: Run

Which models are practical on MacBook Pro M5 Max 128GB 16-inch?

Current lens: Coding. Models are ranked most-capable first, with throughput, fit, local cost, and evidence kept on the same sheet.

This route now presents the rankings instead of acting like a separate product.

Snapshot

Top model: MiniMax M2.7

50 viable rows · 21 benchmark-backed · no speed row yet at 3bit.

Rows: 432 · Models: 55 · Macs: 29 · Benchmark-backed: 21

Catalog current through April 22, 2026. Benchmark evidence through April 27, 2026.

Query setup


Choose a Mac first, then narrow by lens, quant target, runtime, and ranking preference.

Results

Model matches for MacBook Pro M5 Max 128GB 16-inch

Sorted by most capable · 50 viable rows · 21 benchmark-backed

#1 · Fit-first · Insufficient data

MiniMax M2.7

MacBook Pro M5 Max 128GB 16-inch · 128GB · $5,399 · Portable

Output: —
Prompt: —
Quant: 3bit · source-backed (MLX MiniMax-M2.7-3bit, 112 GB min)
Runtime: Best available
Headroom: 16.0 GB
Context: 116k
Local cost: Hold for speed

Capability

228.704B dense · tuned for coding

Evidence

Fit uses a source-backed memory profile; direct speed coverage is still missing. The MiniMax-M2.7-3bit source profile lists 112 GB minimum memory on MLX; throughput still needs direct benchmark coverage.

Coverage

No direct benchmark rows yet

No speed row yet, so this cost read is held.

Quant ladder and fit detail
Quant | Quality | Headroom | Context | Speed | Runtime | Evidence
3bit (source-backed: MLX MiniMax-M2.7-3bit, 112 GB min) | Compact quality | 16.0 GB | 116k | — | Best available | Fit-first
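The headroom figure in this card follows directly from the fit arithmetic: unified memory minus the source-backed minimum footprint. A minimal sketch (the function name and structure are illustrative, not this tool's code):

```python
def headroom_gb(unified_memory_gb: float, model_min_gb: float) -> float:
    """Unified memory left over after loading the model at this quant."""
    return round(unified_memory_gb - model_min_gb, 1)

# 128 GB of unified memory minus the 112 GB minimum listed
# in the MLX MiniMax-M2.7-3bit source profile.
print(headroom_gb(128.0, 112.0))  # 16.0
```

Headroom is what absorbs KV cache growth as context fills, which is why the context column shrinks as the quant footprint grows.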
#2 · Trusted reference · Low confidence

Qwen 2.5 72B

MacBook Pro M5 Max 128GB 16-inch · 128GB · $5,399 · Portable

Output: 10.0 tok/s
Prompt: —
Quant: Q8_0
Runtime: Ollama
Headroom: 53.3 GB
Context: 131k
Local cost: $57.3

Capability

72.7B dense · tuned for coding

Evidence

Direct trusted-reference benchmark coverage on this hardware class. Speed is backed by trusted-reference benchmark coverage; the most common runtime in the evidence is Ollama.

Coverage

1 direct benchmark row

Default 10% utilization napkin math.

Quant ladder and fit detail
Quant | Quality | Headroom | Context | Speed | Runtime | Evidence
Q8_0 | Reference quality | 53.3 GB | 131k | 10.0 tok/s | Ollama | Trusted reference
Q6_K | High quality | 66.4 GB | 131k | 10.0 tok/s | Ollama | Trusted reference
Q5_K_M | High quality | 74.3 GB | 131k | 10.0 tok/s | Ollama | Trusted reference
Q4_K_M | Balanced quality | 84.4 GB | 131k | 10.0 tok/s | Ollama | Trusted reference
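The "Default 10% utilization napkin math" behind the local cost column is not spelled out on this page, but a plausible reading is: amortize the machine price over some lifespan, assume the machine is generating tokens 10% of the time, and divide by throughput. A sketch under those assumptions (the three-year lifespan is a guess that lands near the listed figures, not a documented parameter):

```python
def local_cost_per_mtok(machine_price_usd: float, output_tok_s: float,
                        utilization: float = 0.10,
                        lifespan_years: float = 3.0) -> float:
    """Napkin cost per million output tokens: amortized machine price
    divided by lifetime token output at the assumed utilization."""
    lifespan_seconds = lifespan_years * 365 * 24 * 3600
    lifetime_tokens = output_tok_s * lifespan_seconds * utilization
    return machine_price_usd / lifetime_tokens * 1e6

# $5,399 machine at 10.0 tok/s comes out to roughly $57 per million
# tokens, in the neighborhood of the $57.3 shown for this card.
print(round(local_cost_per_mtok(5399, 10.0), 1))
```

Whatever the exact constants, the column is inversely proportional to output speed, which is why the slower 72B models read as several times more expensive than the 27B-32B rows.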
#3 · Trusted reference · Low confidence

DeepSeek R1 Distill Llama 70B

MacBook Pro M5 Max 128GB 16-inch · 128GB · $5,399 · Portable

Output: 11.0 tok/s
Prompt: —
Quant: Q8_0
Runtime: Ollama
Headroom: 55.4 GB
Context: 131k
Local cost: $52.1

Capability

70.6B dense · tuned for coding

Evidence

Direct trusted-reference benchmark coverage on this hardware class. Speed is backed by trusted-reference benchmark coverage; the most common runtime in the evidence is Ollama.

Coverage

1 direct benchmark row

Default 10% utilization napkin math.

Quant ladder and fit detail
Quant | Quality | Headroom | Context | Speed | Runtime | Evidence
Q8_0 | Reference quality | 55.4 GB | 131k | 11.0 tok/s | Ollama | Trusted reference
Q6_K | High quality | 68.1 GB | 131k | 11.0 tok/s | Ollama | Trusted reference
Q5_K_M | High quality | 75.8 GB | 131k | 11.0 tok/s | Ollama | Trusted reference
Q4_K_M | Balanced quality | 85.6 GB | 131k | 11.0 tok/s | Ollama | Trusted reference
#4 · Trusted reference · Medium confidence

Llama 3.3 70B

MacBook Pro M5 Max 128GB 16-inch · 128GB · $5,399 · Portable

Output: 15.0 tok/s
Prompt: —
Quant: Q8_0
Runtime: Ollama
Headroom: 55.4 GB
Context: 131k
Local cost: $38.2

Capability

70.6B dense · tuned for coding

Evidence

Direct trusted-reference benchmark coverage on this hardware class. Speed is backed by trusted-reference benchmark coverage; the most common runtime in the evidence is Ollama.

Coverage

2 direct benchmark rows

Default 10% utilization napkin math.

Quant ladder and fit detail
Quant | Quality | Headroom | Context | Speed | Runtime | Evidence
Q8_0 | Reference quality | 55.4 GB | 131k | 15.0 tok/s | Ollama | Trusted reference
Q6_K | High quality | 68.1 GB | 131k | 15.0 tok/s | Ollama | Trusted reference
Q5_K_M | High quality | 75.8 GB | 131k | 15.0 tok/s | Ollama | Trusted reference
Q4_K_M | Balanced quality | 85.6 GB | 131k | 15.0 tok/s | Ollama | Trusted reference
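The footprints implied by a ladder like this one scale with bits per weight. A rough back-of-envelope using commonly cited average bits-per-weight for llama.cpp GGUF K-quants (these bpw figures are general community numbers, not this tool's fit model, which also budgets for context and overhead, so expect a few GB of drift against the headroom column):

```python
# Commonly cited average bits-per-weight for llama.cpp GGUF quants.
BPW = {"Q8_0": 8.5, "Q6_K": 6.56, "Q5_K_M": 5.67, "Q4_K_M": 4.85}

def weight_footprint_gb(params_b: float, quant: str) -> float:
    """Approximate weight size in GB: params (billions) * bits/weight / 8.
    Ignores KV cache, activations, and runtime overhead."""
    return round(params_b * BPW[quant] / 8, 1)

# Llama 3.3 70B (70.6B params) across the ladder above.
for quant in BPW:
    print(quant, weight_footprint_gb(70.6, quant), "GB")
```

For example, the Q4_K_M estimate of about 42.8 GB is close to what 128 GB minus the 85.6 GB headroom row implies, which is a useful sanity check when a ladder entry looks suspicious.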
#5 · Trusted reference · Low confidence

Qwen 3 32B

MacBook Pro M5 Max 128GB 16-inch · 128GB · $5,399 · Portable

Output: 28.0 tok/s
Prompt: —
Quant: Q8_0
Runtime: Ollama
Headroom: 93.2 GB
Context: 131k
Local cost: $20.5

Capability

32.76B dense · tuned for coding

Evidence

Direct trusted-reference benchmark coverage on this hardware class. Speed is backed by trusted-reference benchmark coverage; the most common runtime in the evidence is Ollama.

Coverage

1 direct benchmark row

Default 10% utilization napkin math.

Quant ladder and fit detail
Quant | Quality | Headroom | Context | Speed | Runtime | Evidence
Q8_0 | Reference quality | 93.2 GB | 131k | 28.0 tok/s | Ollama | Trusted reference
Q6_K | High quality | 99.1 GB | 131k | 28.0 tok/s | Ollama | Trusted reference
Q5_K_M | High quality | 102.7 GB | 131k | 28.0 tok/s | Ollama | Trusted reference
Q4_K_M | Balanced quality | 107.2 GB | 131k | 28.0 tok/s | Ollama | Trusted reference
#6 · Trusted reference · Low confidence

Gemma 4 31B

MacBook Pro M5 Max 128GB 16-inch · 128GB · $5,399 · Portable

Output: 26.0 tok/s
Prompt: —
Quant: Q8_0
Runtime: MLX
Headroom: 95.3 GB
Context: 104k
Local cost: $22.0

Capability

30.7B dense · tuned for coding

Evidence

Direct trusted-reference benchmark coverage on this hardware class. Speed is backed by trusted-reference benchmark coverage; the most common runtime in the evidence is MLX.

Coverage

1 direct benchmark row

Default 10% utilization napkin math.

Quant ladder and fit detail
Quant | Quality | Headroom | Context | Speed | Runtime | Evidence
Q8_0 | Reference quality | 95.3 GB | 104k | 26.0 tok/s | MLX | Trusted reference
Q6_K | High quality | 100.8 GB | 110k | 26.0 tok/s | MLX | Trusted reference
Q5_K_M | High quality | 104.2 GB | 114k | 26.0 tok/s | MLX | Trusted reference
Q4_K_M | Balanced quality | 108.4 GB | 118k | 26.0 tok/s | MLX | Trusted reference
#7 · Estimated · Low confidence

DeepSeek R1 Distill Qwen 32B

MacBook Pro M5 Max 128GB 16-inch · 128GB · $5,399 · Portable

Output: 27.0 tok/s
Prompt: —
Quant: Q8_0
Runtime: Ollama
Headroom: 93.2 GB
Context: 131k
Local cost: $21.2

Capability

32.76B dense · tuned for coding

Evidence

Estimated from nearby benchmark coverage, not a direct match. Speed is estimated from nearby benchmark coverage rather than an exact machine-and-quant measurement; best runtime hint: Ollama.

Coverage

No direct benchmark rows yet

Speed is estimated, so this cost read is provisional.

Quant ladder and fit detail
Quant | Quality | Headroom | Context | Speed | Runtime | Evidence
Q8_0 | Reference quality | 93.2 GB | 131k | 27.0 tok/s | Ollama | Estimated
Q6_K | High quality | 99.1 GB | 131k | 27.0 tok/s | Ollama | Estimated
Q5_K_M | High quality | 102.7 GB | 131k | 27.0 tok/s | Ollama | Estimated
Q4_K_M | Balanced quality | 107.2 GB | 131k | 27.0 tok/s | Ollama | Estimated
#8 · Community row · Insufficient data

Qwen3.5-27B

MacBook Pro M5 Max 128GB 16-inch · 128GB · $5,399 · Portable

Output: 31.6 tok/s
Prompt: —
Quant: Q8_0
Runtime: MLX
Headroom: 99.0 GB
Context: 8k
Local cost: $18.1

Capability

27B dense · tuned for coding

Evidence

Direct community benchmark coverage on this hardware class. Speed is backed by community benchmark coverage; the most common runtime in the evidence is MLX.

Coverage

2 direct benchmark rows

Default 10% utilization napkin math.

Quant ladder and fit detail
Quant | Quality | Headroom | Context | Speed | Runtime | Evidence
Q8_0 | Reference quality | 99.0 GB | 262k | 31.6 tok/s | MLX | Community row
Q6_K | High quality | 103.9 GB | 262k | 16.5 tok/s | llama.cpp | Community row
Q5_K_M | High quality | 106.8 GB | 262k | 31.6 tok/s | MLX | Community row
Q4_K_M | Balanced quality | 110.5 GB | 262k | 31.6 tok/s | MLX | Community row

FAQ

Frequently asked questions about Run

These answers stay tied to the live workspace defaults for this compatibility route, so they describe the same sort order and query framing the table uses.

What does the Run route optimize for?
Run starts with a Mac and sorts toward the most capable models that remain practical on that machine. It is the fastest way to answer what your current Mac can run locally without guessing from raw memory numbers alone.
Why does Run show models instead of Macs?
Run flips the workspace into Mac-to-models mode. Instead of asking which Mac to buy for a model, it asks which models remain usable on the Mac you already own or plan to deploy.
How should I interpret the evidence labels on Run?
Benchmark-backed rows have direct speed evidence, with Lab, trusted-reference, and community labels showing provenance. Estimated rows are derived from adjacent evidence, and fit-first rows are memory-feasibility reads.
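The precedence the FAQ describes, direct benchmark evidence over estimates over fit-only reads, can be sketched as a simple classifier. The function and field names here are illustrative, not this tool's actual code:

```python
def evidence_label(direct_rows: int, has_nearby_estimate: bool,
                   has_memory_profile: bool) -> str:
    """Precedence described in the FAQ: direct benchmark evidence,
    then nearby-coverage estimates, then memory-feasibility only."""
    if direct_rows > 0:
        return "benchmark-backed"  # Lab / trusted-reference / community
    if has_nearby_estimate:
        return "estimated"
    if has_memory_profile:
        return "fit-first"
    return "insufficient data"

print(evidence_label(1, False, True))   # benchmark-backed
print(evidence_label(0, True, True))    # estimated
print(evidence_label(0, False, True))   # fit-first
```

Under this reading, a fit-first row like MiniMax M2.7 at #1 is a statement about memory feasibility only, and its speed and cost columns stay blank or held until a direct benchmark row lands.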