Best local LLMs for MacBook Air M4 32GB 15-inch in 2026

Use this page when your real query is the exact machine, not a generic “best Mac” article. The ranking below is machine-specific, and the supporting links show where the evidence is benchmark-backed, sparse, or still fit-limited.

Current coding-biased answer for MacBook Air M4 32GB 15-inch: Qwen3.6-27B. Use Fit and Bench to verify how much headroom remains once you move past the default answer.

One benchmark on this exact machine, covering one model. Last benchmark: January 26, 2026. Catalog current through April 22, 2026.

Best local LLMs for this Mac

18 current models · Catalog current through April 22, 2026 · Benchmark evidence through April 27, 2026

MacBook Air M4 32GB 15-inch, ranked for coding with a most-capable bias, using the best available runtime evidence and focused on the current market set.

Use the strongest current runtime evidence for each row.
Largest fit: Qwen3.6-35B-A3B at 6bit (3B active / 35B total).
Fastest read: Qwen3.5-4B at 148.0 tok/s on MLX.
Ranking evidence: Gemma 4 31B, Qwen3.6-27B, Qwen3.6-35B-A3B +1 are current candidates; sparse rows stay labeled until first-party evidence lands.
Next featured Mac: Mac Studio M4 Ultra 256GB, planned for June 2026; the current default changes after arrival validation and clean first-party evidence.
22 historical baseline rows hidden.
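The Fit numbers above reduce to simple arithmetic. A minimal sketch of that kind of check, assuming weights cost roughly params × bits ÷ 8 bytes and a flat system reserve (both are illustrative assumptions, not this page's exact formula):

```python
# Rough fit check for a quantized model on a 32 GB unified-memory Mac.
# Assumptions (not this page's exact formula): weights take about
# params * bits_per_weight / 8 bytes, and macOS plus the runtime keep
# a flat reserve of the 32 GB for themselves.

def estimated_headroom_gb(total_params_b: float, bits_per_weight: float,
                          unified_memory_gb: float = 32.0,
                          system_reserve_gb: float = 6.0) -> float:
    """GB left for KV cache and context after loading the weights."""
    weights_gb = total_params_b * bits_per_weight / 8  # 1B params at 8bit ~ 1 GB
    return unified_memory_gb - system_reserve_gb - weights_gb

# Example: a 27B dense model at a 6-bit quant on this 32 GB machine.
print(round(estimated_headroom_gb(27, 6), 1))
```

Negative headroom means the quant does not fit at all; small positive headroom means it loads but leaves little room for context, which is why the table below tracks both columns.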

Current ranking evidence

Fresh releases stay visible, but sparse evidence remains explicit.

Gemma 4 31B

released 2026-04-02 · 5 official specs captured · 4 benchmark rows · 6 Apple Silicon field sources · first-party measurement queued · Mac Studio M4 Ultra 256GB batch planned

Best field report is 28.0 tok/s; keep ranking movement provisional until Bench evidence hardens.

Bench: Mac Studio M4 Ultra 256GB batch planned

Qwen3.6-27B

released 2026-04-22 · 5 official specs captured · 1 benchmark row · 9 Apple Silicon field sources · first-party measurement queued · Mac Studio M4 Ultra 256GB batch planned

Best field report is 85.5 tok/s; keep ranking movement provisional until Bench evidence hardens.

Bench: Mac Studio M4 Ultra 256GB batch planned

Qwen3.6-35B-A3B

released 2026-04-15 · 5 official specs captured · 4 benchmark rows · 15 Apple Silicon field sources · first-party measurement queued · Mac Studio M4 Ultra 256GB batch planned

Best field report is 203.1 tok/s; keep ranking movement provisional until Bench evidence hardens.

Bench: Mac Studio M4 Ultra 256GB batch planned

Gemma 4 26B-A4B

released 2026-04-02 · 6 official specs captured · 4 benchmark rows · 6 Apple Silicon field sources · first-party measurement queued · Mac Studio M4 Ultra 256GB batch planned

Best field report is 75.1 tok/s; keep ranking movement provisional until Bench evidence hardens.

Bench: Mac Studio M4 Ultra 256GB batch planned
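The field-reported tok/s figures in these cards can be reproduced locally by putting a stopwatch around a token stream. A minimal sketch, assuming only a generator that yields tokens one at a time; `fake_stream` is a stand-in for a real backend (MLX, llama.cpp, or Ollama streaming), and a proper benchmark would separate prompt processing from decode:

```python
import time

def decode_tok_per_s(generate, prompt: str) -> float:
    """Time a token stream and return tokens/second.
    `generate` stands in for whatever runtime you use; only the
    iteration protocol (yields one token at a time) is assumed."""
    start = time.perf_counter()
    n_tokens = sum(1 for _ in generate(prompt))
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed if elapsed > 0 else 0.0

def fake_stream(prompt: str):
    """Stub generator so the harness runs without a model installed."""
    for tok in prompt.split():
        time.sleep(0.001)  # pretend each token takes time to decode
        yield tok

print(f"{decode_tok_per_s(fake_stream, 'five words of test text'):.1f} tok/s")
```

Note this measures wall-clock throughput over the whole call; tools like llama.cpp's llama-bench report prefill and generation speeds separately, which matters when comparing against the per-model numbers here.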

| Rank | Model | Params | Score | Quant | Tok/s | Runtime | Evidence | Headroom | Context | Why it ranks here |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Gemma 4 31B | 30.7B | 260 | 5bit | 22.0 (est.) | MLX | Estimated | 6.1 GB | 10k | Recent frontier candidate in the current catalog. |
| 2 | Qwen3.6-27B | 27B | 259 | Q6_K | 16.6 (est.) | Ollama (llama.cpp) | Estimated | 8.9 GB | 25k | Recent frontier candidate in the current catalog. |
| 3 | Devstral Small 2 24B | 24B | 258 | 8bit | 23.4 (est.) | llama.cpp | Estimated | 7.9 GB | 28k | Recent frontier candidate in the current catalog. |
| 4 | Qwen3.5-27B | 27B | 256 | Q6_K | 16.1 (est.) | llama.cpp | Estimated | 8.9 GB | 25k | Recent frontier candidate in the current catalog. |
| 5 | Qwen3.6-35B-A3B | 3B active / 35B total | 228 | 6bit | 48.0 (est.) | MLX | Estimated | 6.4 GB | 30k | Recent frontier candidate in the current catalog. |
| 6 | Gemma 4 26B-A4B | 3.8B active / 25.2B total | 234 | 8bit | 40.0 (est.) | MLX | Estimated | 6.2 GB | 14k | Recent model release in the current catalog. |
| 7 | Gemma 4 E4B | 8B | 229 | 8bit | 78.0 (est.) | Ollama (llama.cpp) | Estimated | 23.4 GB | 131k | Recent model release in the current catalog. |
| 8 | Nemotron Cascade 2 30B-A3B | 3B active / 30B total | 229 | Q6_K | 28.0 (est.) | Ollama (llama.cpp) | Estimated | 8.2 GB | 76k | Recent model release in the current catalog. |
| 9 | Qwen3.5-9B | 9B | 228 | 8bit | 35.0 (est.) | llama.cpp | Estimated | 22.1 GB | 150k | Recent model release in the current catalog. |
| 10 | Magistral Small | 24B | 226 | 8bit | Measure it | Best available | Fit-first | 7.9 GB | 28k | Speed still needs direct coverage. |
| 11 | Qwen3.5-35B-A3B | 3B active / 35B total | 226 | 6bit | 52.0 (est.) | MLX | Estimated | 6.4 GB | 30k | Recent model release in the current catalog. |
| 12 | Ministral 3 14B | 14B | 225 | 8bit | 40.0 (est.) | Ollama (llama.cpp) | Estimated | 17.2 GB | 90k | Recent model release in the current catalog. |
| 13 | Nemotron-3-Nano-30B-A3B | 3.5B active / 30B total | 224 | Q6_K | 43.7 (est.) | llama.cpp | Estimated | 8.2 GB | 76k | Recent model release in the current catalog. |
| 14 | Ministral 3 8B | 8B | 222 | 8bit | 72.0 (est.) | Ollama (llama.cpp) | Estimated | 23.0 GB | 148k | Recent model release in the current catalog. |
| 15 | GLM-4.7-Flash | 3B active / 30B total | 218 | 5bit | 58.0 (est.) | llama.cpp | Estimated | 6.7 GB | 10k | Recent model release in the current catalog. |
| 16 | gpt-oss 20B | 3.6B active / 21B total | 204 | 8bit | Measure it | MLX | Fit-first | 11.6 GB | 131k | Speed still needs direct coverage. |
| 17 | Gemma 4 E2B | 5.1B | 170 | 8bit | 95.0 (est.) | Ollama (llama.cpp) | Estimated | 26.5 GB | 131k | Recent model release in the current catalog. |
| 18 | Qwen3.5-4B | 4B | 167 | 8bit | 148.0 (est.) | MLX | Estimated | 26.8 GB | 188k | Recent model release in the current catalog. |

For every row: the listed quant is the highest practical quality on this machine, estimated speeds come from nearby benchmark coverage on the listed runtime (the fastest evidence path), the stated headroom leaves workable context margin, and a first-party M5 batch is queued.
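The Headroom and Context columns are linked: KV cache grows linearly with context length, so spare gigabytes translate into a token budget. A rough sketch under a hypothetical dense-model config (the layer count, KV-head count, and head dimension below are illustrative, not any listed model's real values):

```python
def kv_cache_gb(tokens: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB: keys + values for every layer and token."""
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return tokens * per_token_bytes / 1024**3

def max_context_tokens(headroom_gb: float, n_layers: int, n_kv_heads: int,
                       head_dim: int, bytes_per_elem: int = 2) -> int:
    """Token budget that fits in the given headroom, KV cache only."""
    per_token_gb = kv_cache_gb(1, n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return int(headroom_gb / per_token_gb)

# Hypothetical 27B-class config: 48 layers, 8 KV heads, head_dim 128, fp16 cache.
print(max_context_tokens(8.9, 48, 8, 128))
```

Real runtimes also spend headroom on activations, scratch buffers, and (for GGUF quants) a quantized KV-cache option, so published context figures like those in the table are usually more conservative than this bound.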
Unified memory: 32GB
MSRP: $1,699
Form factor: MacBook Air
Chip: M4