
Qwen 3 4B — Apple Silicon Benchmarks

Measured inference speed for Qwen 3 4B on one Apple Silicon chip, reported in tokens per second at six quantization levels. Real runs, not estimates.

Quantizations measured: Q4_G32, Q4, Q5, Q5_G32, Q6, Q8

6 benchmark rows
1 chip tier covered
149.1 tok/s fastest average (M4 Max (40-core GPU, 64 GB))
2.54 GB minimum RAM observed

Benchmark results for Qwen 3 4B

Rows are sorted by average tok/s, descending. Click a source badge to see the original measurement page.

| Chip | Quant | RAM req. | Context | Avg tok/s | Prompt tok/s | Runtime | Source |
|---|---|---|---|---|---|---|---|
| M4 Max (40-core GPU, 64 GB) | Q4_G32 | 2.8 GB | 2k | 149.1 | 2838.4 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q4 | 2.5 GB | 2k | 148.1 | 2976.7 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q5 | 3.3 GB | 2k | 143.2 | 2736.2 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q5_G32 | 3.5 GB | 2k | 143.0 | 2754.5 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q6 | 4.0 GB | 2k | 136.6 | 2735.7 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Q8 | 5.1 GB | 2k | 111.5 | 1780.6 | MLX | ref |
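
To make the speed and memory trade-off between quantizations easier to read, the rows above can be compared programmatically. The following is a minimal sketch, not part of the published dataset tooling: the numbers are copied by hand from the table, and the names `ROWS`, `relative_speed`, and `fastest_within` are hypothetical helpers introduced only for illustration.

```python
# Values copied by hand from the results table above:
# (quantization, RAM required in GB, avg tok/s, prompt tok/s)
ROWS = [
    ("Q4_G32", 2.8, 149.1, 2838.4),
    ("Q4", 2.5, 148.1, 2976.7),
    ("Q5", 3.3, 143.2, 2736.2),
    ("Q5_G32", 3.5, 143.0, 2754.5),
    ("Q6", 4.0, 136.6, 2735.7),
    ("Q8", 5.1, 111.5, 1780.6),
]


def relative_speed(rows):
    """Return each quantization's avg tok/s as a percentage of the fastest row."""
    fastest = max(row[2] for row in rows)
    return {row[0]: round(100 * row[2] / fastest, 1) for row in rows}


def fastest_within(rows, ram_budget_gb):
    """Return the quantization with the highest avg tok/s that fits the RAM budget.

    Returns None when no row fits.
    """
    fits = [row for row in rows if row[1] <= ram_budget_gb]
    if not fits:
        return None
    return max(fits, key=lambda row: row[2])[0]


if __name__ == "__main__":
    for quant, pct in relative_speed(ROWS).items():
        print(f"{quant:8s} {pct:5.1f}% of fastest")
    print("Fastest under 3 GB:", fastest_within(ROWS, 3.0))
```

On these six rows, the gap between the 4-bit and 6-bit variants is under ten percent of generation speed, while Q8 runs at roughly three quarters of the fastest row. The RAM figures used here are the rounded table values, so a budget close to a row's requirement should be treated with some margin.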

benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv — CSV export
