Qwen 3 4B — Apple Silicon Benchmarks

Measured inference speed for Qwen 3 4B on one Apple Silicon chip, reported in tokens per second at six quantization levels. All figures come from real runs, not estimates.

Quantizations measured: Q4_G32, Q4, Q5, Q5_G32, Q6, Q8

Benchmark rows: 6

Chip tiers covered: 1

Fastest avg tok/s: 149.1 (M4 Max (40-core GPU, 64 GB))

Minimum RAM observed: 2.54 GB

Benchmark results for Qwen 3 4B

Rows are sorted by avg tok/s, descending. Click the source badge to see the original measurement page.

Chip	Quant	RAM req.	Context	Avg tok/s	Prompt tok/s	Runtime	Source
M4 Max (40-core GPU, 64 GB)	Q4_G32	2.8 GB	2k	149.1 tok/s	2838.4 tok/s	MLX	ref
M4 Max (40-core GPU, 64 GB)	Q4	2.5 GB	2k	148.1 tok/s	2976.7 tok/s	MLX	ref
M4 Max (40-core GPU, 64 GB)	Q5	3.3 GB	2k	143.2 tok/s	2736.2 tok/s	MLX	ref
M4 Max (40-core GPU, 64 GB)	Q5_G32	3.5 GB	2k	143.0 tok/s	2754.5 tok/s	MLX	ref
M4 Max (40-core GPU, 64 GB)	Q6	4.0 GB	2k	136.6 tok/s	2735.7 tok/s	MLX	ref
M4 Max (40-core GPU, 64 GB)	Q8	5.1 GB	2k	111.5 tok/s	1780.6 tok/s	MLX	ref
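
The rows above can be parsed and summarized with a few lines of Python. The sketch below is illustrative only: it copies the table as tab-separated text, recomputes the fastest and smallest configurations, and gives a rough latency estimate. It assumes that "Avg tok/s" is generation speed and "Prompt tok/s" is prompt-processing speed, which the page does not state explicitly. The helper names (`number`, `load_rows`, `estimate_seconds`) are hypothetical and not part of any published tool.

```python
import csv
import io

# Rows copied from the table above; \t escapes become real tab characters.
TABLE = (
    "Chip\tQuant\tRAM req.\tContext\tAvg tok/s\tPrompt tok/s\tRuntime\tSource\n"
    "M4 Max (40-core GPU, 64 GB)\tQ4_G32\t2.8 GB\t2k\t149.1 tok/s\t2838.4 tok/s\tMLX\tref\n"
    "M4 Max (40-core GPU, 64 GB)\tQ4\t2.5 GB\t2k\t148.1 tok/s\t2976.7 tok/s\tMLX\tref\n"
    "M4 Max (40-core GPU, 64 GB)\tQ5\t3.3 GB\t2k\t143.2 tok/s\t2736.2 tok/s\tMLX\tref\n"
    "M4 Max (40-core GPU, 64 GB)\tQ5_G32\t3.5 GB\t2k\t143.0 tok/s\t2754.5 tok/s\tMLX\tref\n"
    "M4 Max (40-core GPU, 64 GB)\tQ6\t4.0 GB\t2k\t136.6 tok/s\t2735.7 tok/s\tMLX\tref\n"
    "M4 Max (40-core GPU, 64 GB)\tQ8\t5.1 GB\t2k\t111.5 tok/s\t1780.6 tok/s\tMLX\tref\n"
)


def number(cell):
    """Return the leading numeric part of a cell such as '149.1 tok/s'."""
    return float(cell.split()[0])


def load_rows(text):
    """Parse the tab-separated table into dicts with numeric fields."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "quant": r["Quant"],
            "ram_gb": number(r["RAM req."]),
            "gen_tps": number(r["Avg tok/s"]),        # assumed: generation speed
            "prompt_tps": number(r["Prompt tok/s"]),  # assumed: prompt processing speed
        }
        for r in reader
    ]


def estimate_seconds(row, prompt_tokens, new_tokens):
    """Rough latency: prompt time plus generation time, ignoring all overheads."""
    return prompt_tokens / row["prompt_tps"] + new_tokens / row["gen_tps"]


rows = load_rows(TABLE)
fastest = max(rows, key=lambda r: r["gen_tps"])
smallest = min(rows, key=lambda r: r["ram_gb"])

print(f"rows: {len(rows)}")
print(f"fastest: {fastest['quant']} at {fastest['gen_tps']} tok/s")
print(f"smallest: {smallest['quant']} at {smallest['ram_gb']} GB")
for r in rows:
    secs = estimate_seconds(r, prompt_tokens=1000, new_tokens=500)
    print(f"{r['quant']:7s} ~{secs:.2f} s for 1000 prompt + 500 generated tokens")
```

The latency figure is a back-of-the-envelope estimate that divides token counts by the published rates. Real runs will also include model load time and sampling overhead, and throughput may differ at context lengths other than the 2k used here.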

Chips with published results for Qwen 3 4B

Data

benchmarks.json — full dataset · models.json — model summaries · benchmarks.csv — CSV export