
M4 Max (40-core GPU, 64 GB) — LLM Benchmarks

Measured LLM inference throughput (tokens per second) for the M4 Max (40-core GPU, 64 GB), across 6 models and multiple quantizations. All figures come from real runs, not estimates.

14 benchmark rows
6 models tested
180.2 tok/s fastest average (Llama 3.2 1B Instruct)
1 factory-lab verified row
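The headline figures (row count, model count, fastest average, factory-lab rows) are simple aggregates over the table rows. A minimal Python sketch, assuming the model, quant, avg tok/s, and source values transcribed from the table; the `Row` type and `summarize` helper are hypothetical and not part of the site's data files:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    quant: str
    avg_tok_s: float
    source: str  # "ref" or "factory lab", as in the Source column


# Rows transcribed from the benchmark table on this page.
ROWS = [
    Row("Llama 3.2 1B Instruct", "Q4_K - Medium", 180.2, "ref"),
    Row("Qwen 3 4B", "Q4_G32", 149.1, "ref"),
    Row("Qwen 3 4B", "Q4", 148.1, "ref"),
    Row("Qwen 3 4B", "Q5", 143.2, "ref"),
    Row("Qwen 3 4B", "Q5_G32", 143.0, "ref"),
    Row("Qwen 3 4B", "Q6", 136.6, "ref"),
    Row("Qwen 3 4B", "Q8", 111.5, "ref"),
    Row("Qwen 3 30B A3B", "Q4", 92.1, "ref"),
    Row("Qwen 3 30B A3B", "Q5", 84.9, "ref"),
    Row("Qwen 3 30B A3B", "Q6", 76.7, "ref"),
    Row("Qwen 3 30B A3B", "Q8", 52.6, "ref"),
    Row("Llama 3.1 8B Instruct", "Q4_K - Medium", 47.1, "ref"),
    Row("Qwen 2.5 14B Instruct", "Q4_K - Medium", 25.9, "ref"),
    Row("Qwen 3 32B", "Q4_K_M", 22.0, "factory lab"),
]


def summarize(rows):
    """Compute the page's summary statistics from the individual rows."""
    fastest = max(rows, key=lambda r: r.avg_tok_s)
    return {
        "rows": len(rows),
        "models": len({r.model for r in rows}),  # distinct model names
        "fastest_avg_tok_s": fastest.avg_tok_s,
        "fastest_model": fastest.model,
        "factory_lab_rows": sum(1 for r in rows if r.source == "factory lab"),
    }


print(summarize(ROWS))
```

Counting distinct model names rather than rows is what makes the six Qwen 3 4B quantizations count as one model.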

This configuration is one of several RAM variants of the M4 Max (40-core GPU).

Benchmark results for M4 Max (40-core GPU, 64 GB)

Rows are sorted by average tok/s, descending. In the Source column, "ref" marks a community reference run and "factory lab" marks a factory-lab measurement.

| Model | Quant | RAM req. | Context | Avg tok/s | Prompt tok/s | Runtime | Source |
|---|---|---|---|---|---|---|---|
| Llama 3.2 1B Instruct | Q4_K - Medium | | | 180.2 | 3857.5 | | ref |
| Qwen 3 4B | Q4_G32 | 2.8 GB | 2k | 149.1 | 2838.4 | MLX | ref |
| Qwen 3 4B | Q4 | 2.5 GB | 2k | 148.1 | 2976.7 | MLX | ref |
| Qwen 3 4B | Q5 | 3.3 GB | 2k | 143.2 | 2736.2 | MLX | ref |
| Qwen 3 4B | Q5_G32 | 3.5 GB | 2k | 143.0 | 2754.5 | MLX | ref |
| Qwen 3 4B | Q6 | 4.0 GB | 2k | 136.6 | 2735.7 | MLX | ref |
| Qwen 3 4B | Q8 | 5.1 GB | 2k | 111.5 | 1780.6 | MLX | ref |
| Qwen 3 30B A3B | Q4 | 16.1 GB | 2k | 92.1 | 822.6 | MLX | ref |
| Qwen 3 30B A3B | Q5 | 18.1 GB | 2k | 84.9 | 819.8 | MLX | ref |
| Qwen 3 30B A3B | Q6 | 21.9 GB | 2k | 76.7 | 817.6 | MLX | ref |
| Qwen 3 30B A3B | Q8 | 29.8 GB | 2k | 52.6 | 772.6 | MLX | ref |
| Llama 3.1 8B Instruct | Q4_K - Medium | | | 47.1 | 557.1 | | ref |
| Qwen 2.5 14B Instruct | Q4_K - Medium | | | 25.9 | 286.7 | | ref |
| Qwen 3 32B | Q4_K_M | 20.0 GB | 128k | 22.0 | | factory harness | factory lab |
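The two throughput columns can be combined into a rough end-to-end time for one request: prompt processing time plus generation time, $t \approx N_\text{prompt}/r_\text{prompt} + N_\text{out}/r_\text{avg}$. A minimal Python sketch, assuming Prompt tok/s is prefill throughput, Avg tok/s is generation throughput, and both stay constant at the chosen lengths (real speeds drift with context length, so treat the result as a ballpark). `estimate_seconds` is a hypothetical helper, and the 2,000/500 token counts are illustrative:

```python
def estimate_seconds(prompt_tokens, output_tokens, prompt_tok_s, avg_tok_s):
    """Prefill time plus generation time, assuming constant throughput."""
    return prompt_tokens / prompt_tok_s + output_tokens / avg_tok_s


# Qwen 3 30B A3B, Q4 row: 822.6 prompt tok/s, 92.1 avg tok/s.
t = estimate_seconds(prompt_tokens=2000, output_tokens=500,
                     prompt_tok_s=822.6, avg_tok_s=92.1)
print(f"{t:.1f} s")  # about 2.4 s of prefill plus 5.4 s of generation
```

Rows without a Prompt tok/s value (such as the Qwen 3 32B factory-lab row) only support the generation half of this estimate.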

benchmarks.json — full dataset  ·  chips.json — chip summaries  ·  benchmarks.csv — CSV export
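The CSV export can be filtered with Python's standard-library `csv` module. This is a sketch only: the page does not document the export's schema, so the column names `model` and `avg_tok_s` below are hypothetical; check the header row of benchmarks.csv and adjust them.

```python
import csv
import io
from typing import TextIO


def top_rows(f: TextIO, n: int = 3, col: str = "avg_tok_s"):
    """Return the n rows with the highest numeric value in `col`
    (a hypothetical column name), skipping rows where it is empty."""
    reader = csv.DictReader(f)
    rows = [r for r in reader if r.get(col)]
    rows.sort(key=lambda r: float(r[col]), reverse=True)
    return rows[:n]


# In-memory sample; for the real file use open("benchmarks.csv", newline="").
sample = io.StringIO(
    "model,quant,avg_tok_s\n"
    "Qwen 3 32B,Q4_K_M,22.0\n"
    "Qwen 3 4B,Q4,148.1\n"
)
for row in top_rows(sample, n=1):
    print(row["model"], row["avg_tok_s"])
```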

Data sourced from factory-lab measurements and community reference runs.