
Research & Reasoning

Research tasks benefit from larger models — better reasoning, more nuanced outputs. This means 32B+ models where quality matters more than raw speed. Here is the hardware data.

Typical model size: 30B–70B
Recommended RAM: 48–128 GB
Key models: Qwen 3 32B, Qwen 3 30B
Benchmark rows: 10

Why these models for this use case

Research and reasoning tasks need model quality that only comes at 32B+ parameter counts. Qwen 3 32B at Q4_K_M runs at ~22 tok/s on M4 Max 64 GB — fast enough for interactive research, and the quality gap versus 7B is dramatic. For frontier-scale local inference (Qwen 3 235B A22B at 8 tok/s on M4 Max 128 GB), you need 128 GB+ unified memory. Speed is acceptable for batch tasks even if it feels slow for real-time chat.
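
As a rough sanity check on the RAM requirements quoted here and in the table below, the sketch that follows estimates a quantized model's memory footprint from parameter count and quant level. The bits-per-weight averages and the flat overhead allowance are assumptions (exact file sizes vary by architecture and quant mix), not measured values.

```python
# Rough memory estimate for a quantized model: weights plus headroom for
# KV cache and runtime buffers. Bits-per-weight figures are approximate
# averages for llama.cpp-style quants (assumption, not measured).
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}

def estimate_ram_gb(params_billions: float, quant: str, overhead_gb: float = 2.0) -> float:
    """Weight size in GB plus a flat allowance for KV cache and buffers."""
    weight_gb = params_billions * BITS_PER_WEIGHT[quant] / 8
    return weight_gb + overhead_gb

# Qwen 3 32B at Q4_K_M: ~19 GB of weights, ~21 GB with headroom,
# in line with the ~20 GB RAM requirement listed in the table below.
print(f"{estimate_ram_gb(32, 'Q4_K_M'):.1f} GB")
```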

Benchmark results — fastest rows first

Filtered to models commonly used for research & reasoning. Sorted by avg tok/s descending.

| Chip | Model | Quant | RAM req. | Avg tok/s | Runtime | Source |
|---|---|---|---|---|---|---|
| M4 Max (40-core GPU, 64 GB) | Qwen 3 30B A3B | Q4 | 16.12 GB | 92.1 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Qwen 3 30B A3B | Q5 | 18.09 GB | 84.9 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Qwen 3 30B A3B | Q6 | 21.87 GB | 76.7 | MLX | ref |
| M4 Max (128 GB) | Qwen 3 30B A3B | Q4_K_M | — | 70.2 | LM Studio | ref |
| M4 Max (40-core GPU, 64 GB) | Qwen 3 30B A3B | Q8 | 29.78 GB | 52.6 | MLX | ref |
| M4 Max (40-core GPU, 64 GB) | Qwen 3 32B | Q4_K_M | 20 GB | 22.0 | factory harness | factory lab |
| M4 Max (128 GB) | Gemma 3 27B | Q8_0 | — | 14.5 | LM Studio | ref |
| M4 Max (32-core GPU) | Qwen 3 32B | iQ2_K_S | 11 GB | 13.2 | — | ref |
| M4 Max (128 GB) | Qwen 3 235B A22B | Q4_K_M | — | 8.1 | LM Studio | ref |
| M4 Max (24-core GPU) | Llama 3.3 70B | Q5_K_M | 50 GB | 7.1 | — | ref |
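
To put the tok/s column in context for research workloads, here is a quick conversion from generation speed to wall-clock time for a long answer. Prompt-processing time is ignored, which is an assumption; real end-to-end latency will be somewhat higher.

```python
# Wall-clock generation time for a research-length answer at a given speed.
def generation_minutes(output_tokens: int, tok_per_s: float) -> float:
    return output_tokens / tok_per_s / 60

# A 2,000-token answer: ~1.5 min at 22 tok/s (Qwen 3 32B Q4_K_M),
# ~4 min at 8.1 tok/s (Qwen 3 235B A22B Q4_K_M).
for label, speed in [("Qwen 3 32B @ 22.0 tok/s", 22.0),
                     ("Qwen 3 235B A22B @ 8.1 tok/s", 8.1)]:
    print(f"{label}: {generation_minutes(2000, speed):.1f} min for 2,000 tokens")
```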



benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv
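
If you want to reproduce the table above from the raw data, a minimal sketch is below. The field names (`model`, `chip`, `avg_tok_s`) are assumptions; check the actual schema in benchmarks.json before relying on them.

```python
# Filter benchmarks.json to research-scale models and sort by avg tok/s,
# mirroring the table above. Field names are assumed, not confirmed.
import json

with open("benchmarks.json") as f:
    rows = json.load(f)

research_models = ("Qwen 3 32B", "Qwen 3 30B", "Qwen 3 235B",
                   "Gemma 3 27B", "Llama 3.3 70B")
filtered = [r for r in rows if any(m in r["model"] for m in research_models)]

for r in sorted(filtered, key=lambda r: r["avg_tok_s"], reverse=True):
    print(r["chip"], r["model"], r["avg_tok_s"])
```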

Buying guide: best Mac for local LLMs →