M4 Max (128 GB) — benchmark record

M4 Max (128 GB) local LLM benchmarks on Apple Silicon Mac with 19 published rows across 12 models. Peak published speed is 184.4 tok/s on Qwen 3 0.6B. Largest published fit uses 100.0 GB. Published runtimes include llama.cpp, LM Studio, MLX, Ollama.

19Benchmark rows

12Models tested

184.4Fastest avg tok/s (qwen-3-0-6b)

0Lab benchmarks

Quick take

Best published speed here is 184.4 tok/s on Qwen 3 0.6B at Q8_0. Largest published memory footprint is Qwen 3 235B-A22B at 3bit, using 100.0 GB. Longest published context on this page is 131k. This page is evidence, not the full buying answer.

Based on 19 external benchmarks; no lab runs yet.

Published runtimes: llama.cpp, LM Studio, MLX, Ollama.

Need a recommendation? Use Buy Checking a target model? Use Fit Want a setup answer? Use Run Browse Macs by exact hardware Need the full audit trail? Use Bench

Current published coverage

Published model coverage includes Qwen 3 0.6B, Gemma 3 4B, Qwen 3 30B-A3B, Qwen 3 8B plus 8 more published models. Published runtime coverage on this chip includes llama.cpp, LM Studio, MLX, Ollama. Fastest published row is 184.4 tok/s on Qwen 3 0.6B at Q8_0. Largest published footprint on this page is 100.0 GB for Qwen 3 235B-A22B. Longest published context is 131k.

Qwen 3 0.6B Gemma 3 4B Qwen 3 30B-A3B Qwen 3 8B Qwen 2.5 7B Devstral Small 1.1

Other published M4 Max variants: GB · 48 GB · 64 GB

Macs shipping with the M4 Max (128 GB)

MacBook Pro M4 Max 36GB 14-inch MacBook Pro M4 Max 48GB 14-inch MacBook Pro M4 Max 36GB 16-inch MacBook Pro M4 Max 48GB 16-inch MacBook Pro M4 Max 64GB 16-inch MacBook Pro M4 Max 128GB 16-inch Mac Studio M4 Max 36GB Mac Studio M4 Max 48GB Mac Studio M4 Max 64GB Mac Studio M4 Max 128GB

This chip is part of a family. Compare all M4 Max RAM variants →

Raw benchmark rows for M4 Max (128 GB)

Rows sorted by avg tok/s descending. Click source badge to see original measurement page.

Model	Quant	RAM req.	Context	Avg tok/s	Prompt tok/s	Runtime	Source
Qwen 3 0.6B	Q8_0	—	10k	184.4 tok/s	—	LM Studio	ref
Gemma 3 4B	Q4_0	—	4k	100.5 tok/s	—	LM Studio	ref
Qwen 3 30B-A3B	Q4_K - Medium	—	10k	70.2 tok/s	—	LM Studio	ref
Qwen 3 8B	Q4_K - Medium	—	10k	63.1 tok/s	—	LM Studio	ref
Qwen 2.5 7B	Q8_0	—	10k	49.7 tok/s	—	LM Studio	ref
Devstral Small 1.1	4bit	—	—	33.0 tok/s	—	LM Studio	ref
Qwen 3 235B-A22B	3bit	100.0 GB	1k	30.0 tok/s	—	MLX	ref
GLM-4.5-Air	4bit	—	11k	27.8 tok/s	183.3 tok/s	LM Studio	ref
GLM-4.5-Air	4bit	—	21k	18.0 tok/s	120.8 tok/s	LM Studio	ref
Qwen3.6-27B	Q4_K - Medium	29.0 GB	—	16.6 tok/s	114.5 tok/s	Ollama	ref
GLM-4.5-Air	4bit	—	31k	15.1 tok/s	75.5 tok/s	LM Studio	ref
Gemma 3 27B	Q8_0	—	131k	14.5 tok/s	—	LM Studio	ref
Gemma 3 27B	Q8_0	—	10k	13.0 tok/s	—	LM Studio	ref
Llama 3.3 70B	4bit	—	—	11.8 tok/s	—	LM Studio	ref
Qwen 3 32B	Q4_K - Medium	—	10k	11.7 tok/s	—	LM Studio	ref
GLM-4.5-Air	4bit	—	62k	9.6 tok/s	41.5 tok/s	LM Studio	ref
Qwen 3 235B-A22B	Q4_K - Medium	—	10k	8.1 tok/s	—	LM Studio	ref
Llama 3.3 70B	8bit	—	—	6.5 tok/s	—	LM Studio	ref
Llama 3.3 70B	Q8_0	—	36k	4.5 tok/s	58.0 tok/s	llama.cpp	ref

Models tested on this chip

Qwen 3 0.6B Gemma 3 4B Qwen 3 30B-A3B Qwen 3 8B Qwen 2.5 7B Devstral Small 1.1 Qwen 3 235B-A22B GLM-4.5-Air Qwen3.6-27B Gemma 3 27B Llama 3.3 70B Qwen 3 32B

Next step

Use Rankings when you need the best overall answer for a Mac, and keep this page open when you need the evidence behind that recommendation.

If you are comparing local Apple Silicon against rented cloud hardware, use AI Data Center Index for current GPU rental context.

Data

benchmarks.json — full dataset · chips.json — chip summaries · benchmarks.csv — CSV export

Data from in-house lab measurements plus community-published benchmarks. See all chip families →