Qwen 3 30B A3B — Apple Silicon Benchmarks

Measured inference speed for Qwen 3 30B A3B on two Apple Silicon configurations (both M4 Max), reported in tokens per second at several quantization levels. All figures come from real runs, not estimates.

Quantizations measured: Q4, Q5, Q6, Q4_K_M, Q8

Benchmark rows: 5

Chip tiers covered: 2

Fastest avg tok/s: 92.1 (M4 Max (40-core GPU, 64 GB))

Minimum RAM observed: 16.12 GB

Benchmark results for Qwen 3 30B A3B

Rows are sorted by average tok/s, descending. The Source column links to the original measurement page for each row.

Chip	Quant	RAM req.	Context	Avg tok/s	Prompt tok/s	Runtime	Source
M4 Max (40-core GPU, 64 GB)	Q4	16.1 GB	2k	92.1 tok/s	822.6 tok/s	MLX	ref
M4 Max (40-core GPU, 64 GB)	Q5	18.1 GB	2k	84.9 tok/s	819.8 tok/s	MLX	ref
M4 Max (40-core GPU, 64 GB)	Q6	21.9 GB	2k	76.7 tok/s	817.6 tok/s	MLX	ref
M4 Max (128 GB)	Q4_K_M	—	10k	70.2 tok/s	—	LM Studio	ref
M4 Max (40-core GPU, 64 GB)	Q8	29.8 GB	2k	52.6 tok/s	772.6 tok/s	MLX	ref
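The summary figures at the top of the page can be rechecked from the table rows. The following is a minimal Python sketch, not the site's own tooling: it copies the rows above into a tab-separated string and recomputes the row count, chip count, fastest configuration, and minimum RAM. The helper names (`parse_number`, `load_rows`) and the dictionary keys are illustrative assumptions, and the Context and Source columns are omitted for brevity.

```python
import csv
import io

# Rows copied from the table above, tab-separated.
# "\u2014" is the dash the page uses for a missing value.
TABLE = (
    "Chip\tQuant\tRAM req.\tAvg tok/s\tPrompt tok/s\tRuntime\n"
    "M4 Max (40-core GPU, 64 GB)\tQ4\t16.1 GB\t92.1 tok/s\t822.6 tok/s\tMLX\n"
    "M4 Max (40-core GPU, 64 GB)\tQ5\t18.1 GB\t84.9 tok/s\t819.8 tok/s\tMLX\n"
    "M4 Max (40-core GPU, 64 GB)\tQ6\t21.9 GB\t76.7 tok/s\t817.6 tok/s\tMLX\n"
    "M4 Max (128 GB)\tQ4_K_M\t\u2014\t70.2 tok/s\t\u2014\tLM Studio\n"
    "M4 Max (40-core GPU, 64 GB)\tQ8\t29.8 GB\t52.6 tok/s\t772.6 tok/s\tMLX\n"
)


def parse_number(cell):
    """Return the leading number of a cell like '92.1 tok/s', or None if missing."""
    parts = cell.strip().split()
    if not parts:
        return None
    try:
        return float(parts[0])
    except ValueError:
        # Cells holding only a dash carry no measurement.
        return None


def load_rows(text):
    """Parse the tab-separated table into a list of dictionaries (hypothetical keys)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "chip": raw["Chip"],
            "quant": raw["Quant"],
            "ram_gb": parse_number(raw["RAM req."]),
            "avg_tps": parse_number(raw["Avg tok/s"]),
            "prompt_tps": parse_number(raw["Prompt tok/s"]),
            "runtime": raw["Runtime"],
        }
        for raw in reader
    ]


rows = load_rows(TABLE)

# Same ordering as the page: average tok/s, descending.
by_speed = sorted(rows, key=lambda r: r["avg_tps"], reverse=True)
fastest = by_speed[0]

# Skip rows whose RAM requirement was not reported.
min_ram = min(r["ram_gb"] for r in rows if r["ram_gb"] is not None)
chips = sorted({r["chip"] for r in rows})

print(f"{len(rows)} rows, {len(chips)} chip tiers")
print(f"fastest: {fastest['chip']} {fastest['quant']} at {fastest['avg_tps']} tok/s")
print(f"minimum RAM in table: {min_ram} GB")
```

The table rounds RAM to one decimal place, so the minimum recovered here is 16.1 GB rather than the 16.12 GB shown in the summary. Because the four MLX rows share one chip and one context length, they are the most directly comparable subset; the LM Studio row uses a different runtime, a different quantization scheme (Q4_K_M), and a 10k context.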

Chips with published results for Qwen 3 30B A3B

M4 Max (40-core GPU, 64 GB)
M4 Max (128 GB)

Data

benchmarks.json — full dataset · models.json — model summaries · benchmarks.csv — CSV export
