Best local LLMs for MacBook Air M4 24GB 15-inch in 2026

Use this page when your real query is the exact machine, not a generic “best Mac” article. The ranking below is machine-specific, and the supporting links show where the evidence is benchmark-backed, sparse, or still fit-limited.

Current coding-biased answer for MacBook Air M4 24GB 15-inch: Qwen3.6-27B. Use Fit and Bench to verify how much headroom remains once you move past the default answer.

No benchmarks have been recorded on this exact machine yet. Catalog current through April 22, 2026.

Best local LLMs for this Mac

16 current models · Catalog current through April 22, 2026 · Benchmark evidence through April 27, 2026

MacBook Air M4 24GB 15-inch, ranked for coding with a most-capable bias, using the best available runtime evidence and focused on the current market set.

Use the strongest current runtime evidence for each row.

- Largest fit: Qwen3.6-35B-A3B at Q4_K_M (3B active / 35B total)
- Fastest read: Qwen3.5-4B at 148.0 tok/s on MLX
- Ranking evidence: Qwen3.6-27B, Qwen3.6-35B-A3B, Gemma 4 26B-A4B +1 are current candidates; sparse rows stay labeled until first-party evidence lands.
- Next featured Mac: Mac Studio M4 Ultra 256GB, planned for June 2026; the current default changes after arrival validation and clean first-party evidence.
- 22 historical baseline rows hidden
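The fit and headroom figures above come down to simple arithmetic: quantized weight size is roughly parameters times bits-per-weight over 8, and whatever the 24 GB pool has left after weights and the OS is headroom. A minimal sketch, assuming approximate bits-per-weight values for common llama.cpp-style quants and a hypothetical 3 GB OS/app reserve (neither is a measured number from this page):

```python
# Rough memory math for a quantized model on a 24 GB unified-memory Mac.
# Bits-per-weight figures are approximations for llama.cpp-style quants,
# and the 3 GB OS/app reserve is an assumption, not a measured value.

BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "8bit": 8.5}

def weight_gb(total_params_b: float, quant: str) -> float:
    """Approximate quantized weight size in GB (decimal)."""
    return total_params_b * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

def headroom_gb(total_params_b: float, quant: str, ram_gb: float = 24.0,
                reserve_gb: float = 3.0) -> float:
    """RAM left for KV cache and context after weights and the assumed reserve."""
    return ram_gb - reserve_gb - weight_gb(total_params_b, quant)

# Example: a dense 27B model at Q5_K_M on a 24 GB machine.
print(round(weight_gb(27, "Q5_K_M"), 1))    # weight footprint in GB
print(round(headroom_gb(27, "Q5_K_M"), 1))  # what remains for cache/context
```

Swap in your own reserve estimate; the point is that a dense 27B model at Q5_K_M already consumes most of a 24 GB pool, which is why the top-ranked rows run tight on headroom.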

Current ranking evidence

Fresh releases stay visible, but sparse evidence remains explicit.

Qwen3.6-27B

released 2026-04-22 · 5 official specs captured · 1 benchmark row · 9 Apple Silicon field sources · first-party measurement queued · Mac Studio M4 Ultra 256GB batch planned

Best field report is 85.5 tok/s; keep ranking movement provisional until Bench evidence hardens.

Bench: Mac Studio M4 Ultra 256GB batch planned

Qwen3.6-35B-A3B

released 2026-04-15 · 5 official specs captured · 4 benchmark rows · 15 Apple Silicon field sources · first-party measurement queued · Mac Studio M4 Ultra 256GB batch planned

Best field report is 203.1 tok/s; keep ranking movement provisional until Bench evidence hardens.

Bench: Mac Studio M4 Ultra 256GB batch planned

Gemma 4 26B-A4B

released 2026-04-02 · 6 official specs captured · 4 benchmark rows · 6 Apple Silicon field sources · first-party measurement queued · Mac Studio M4 Ultra 256GB batch planned

Best field report is 75.1 tok/s; keep ranking movement provisional until Bench evidence hardens.

Bench: Mac Studio M4 Ultra 256GB batch planned

Gemma 4 E4B

released 2026-04-02 · 5 official specs captured · 5 benchmark rows · 5 Apple Silicon field sources · first-party measurement queued · Mac Studio M4 Ultra 256GB batch planned

Best field report is 76.8 tok/s; keep ranking movement provisional until Bench evidence hardens.

Bench: Mac Studio M4 Ultra 256GB batch planned
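One way to sanity-check field-report speeds like these is bandwidth arithmetic: decode on Apple Silicon is largely memory-bound, and each generated token streams the active weights through memory roughly once, so bytes read per token tracks active parameters times bits-per-weight. A sketch, assuming representative bits-per-weight values (approximations, not figures from this page):

```python
# Bytes read per generated token scale with ACTIVE parameters, not total.
# Bits-per-weight values below are rough quant averages, for illustration.

def bytes_per_token_gb(active_params_b: float, bits_per_weight: float) -> float:
    """Approximate GB streamed through memory per generated token."""
    return active_params_b * bits_per_weight / 8

dense = bytes_per_token_gb(27, 5.7)  # dense 27B at ~Q5_K_M
moe = bytes_per_token_gb(3, 4.8)     # 3B-active MoE at ~Q4_K_M
print(round(dense / moe, 1))         # rough decode-speed advantage for the MoE
```

The roughly order-of-magnitude gap in bytes per token is why 3B-active MoE rows post far higher tok/s than dense 27B rows in the ranking, despite similar total parameter counts.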

Ranked for this Mac. Each row lists rank, model (parameters), score, quant, tok/s, runtime, evidence status, headroom, context, then why it ranks here.

1. Qwen3.6-27B (27B parameters): score 254 · Q5_K_M · 16.6 tok/s · Ollama · Estimated · 3.6 GB headroom · 8k context · first-party M5 batch queued
   Why: Recent frontier candidate in the current catalog. Q5_K_M is the highest practical quality here. 16.6 tok/s is estimated from nearby benchmark coverage, with the Ollama wrapper on llama.cpp as the best runtime hint. 3.6 GB headroom is tight.

2. Qwen3.5-27B (27B parameters): score 250 · Q5_K_M · 16.1 tok/s · llama.cpp · Estimated · 3.6 GB headroom · 8k context · first-party M5 batch queued
   Why: Recent frontier candidate in the current catalog. Q5_K_M is the highest practical quality here. 16.1 tok/s is estimated from nearby benchmark coverage, with the llama.cpp backend as the best runtime hint. 3.6 GB headroom is tight.

3. Devstral Small 2 24B (24B parameters): score 248 · Q6_K · 23.4 tok/s · llama.cpp · Estimated · 3.9 GB headroom · 10k context · first-party M5 batch queued
   Why: Recent frontier candidate in the current catalog. Q6_K is the highest practical quality here. 23.4 tok/s is estimated from nearby benchmark coverage, with the llama.cpp backend as the best runtime hint. 3.9 GB headroom is tight.

4. Qwen3.6-35B-A3B (3B active / 35B total): score 220 · Q4_K_M · 32.0 tok/s · Ollama · Trusted reference · 4.2 GB headroom · 16k context · first-party M5 batch queued
   Why: Recent frontier candidate in the current catalog. Q4_K_M is the highest practical quality here. 32.0 tok/s is benchmark-backed on the Ollama wrapper on llama.cpp. 4.2 GB headroom leaves workable context margin.

5. Gemma 4 26B-A4B (3.8B active / 25.2B total): score 226 · 6bit · 28.0 tok/s · Ollama · Estimated · 4.0 GB headroom · 10k context · first-party M5 batch queued
   Why: Recent model release in the current catalog. 6bit is the highest practical quality here. 28.0 tok/s is estimated from nearby benchmark coverage, with the Ollama wrapper on llama.cpp as the best runtime hint. 4.0 GB headroom leaves workable context margin.

6. Gemma 4 E4B (8B parameters): score 221 · 8bit · 78.0 tok/s · MLX · Estimated · 15.4 GB headroom · 131k context · first-party M5 batch queued
   Why: Recent model release in the current catalog. 8bit is the highest practical quality here. 78.0 tok/s is estimated from nearby benchmark coverage, with the MLX backend as the best runtime hint. 15.4 GB headroom leaves workable context margin.

7. Nemotron Cascade 2 30B-A3B (3B active / 30B total): score 220 · 5bit · 22.0 tok/s · Ollama · Estimated · 5.6 GB headroom · 49k context · first-party M5 batch queued
   Why: Recent model release in the current catalog. 5bit is the highest practical quality here. 22.0 tok/s is estimated from nearby benchmark coverage, with the Ollama wrapper on llama.cpp as the best runtime hint. 5.6 GB headroom leaves workable context margin.

8. Qwen3.5-9B (9B parameters): score 220 · 8bit · 92.0 tok/s · MLX · Estimated · 14.1 GB headroom · 94k context · first-party M5 batch queued
   Why: Recent model release in the current catalog. 8bit is the highest practical quality here. 92.0 tok/s is estimated from nearby benchmark coverage, with the MLX backend as the best runtime hint. 14.1 GB headroom leaves workable context margin.

9. Qwen3.5-35B-A3B (3B active / 35B total): score 217 · Q4_K_M · 52.0 tok/s · MLX · Estimated · 4.2 GB headroom · 16k context · first-party M5 batch queued
   Why: Recent model release in the current catalog. Q4_K_M is the highest practical quality here. 52.0 tok/s is estimated from nearby benchmark coverage, with the MLX backend as the best runtime hint. 4.2 GB headroom leaves workable context margin.

10. Ministral 3 14B (14B parameters): score 217 · 8bit · 40.0 tok/s · Ollama · Estimated · 9.2 GB headroom · 45k context · first-party M5 batch queued
   Why: Recent model release in the current catalog. 8bit is the highest practical quality here. 40.0 tok/s is estimated from nearby benchmark coverage, with the Ollama wrapper on llama.cpp as the best runtime hint. 9.2 GB headroom leaves workable context margin.

11. Magistral Small (24B parameters): score 216 · Q6_K · tok/s: measure it · best available runtime · fit-first evidence · 3.9 GB headroom · 10k context · first-party M5 batch queued
   Why: Q6_K is the highest practical quality here. Speed still needs direct benchmark coverage. 3.9 GB headroom is tight.

12. Nemotron-3-Nano-30B-A3B (3.5B active / 30B total): score 216 · 5bit · 43.7 tok/s · llama.cpp · Estimated · 5.6 GB headroom · 49k context · first-party M5 batch queued
   Why: Recent model release in the current catalog. 5bit is the highest practical quality here. 43.7 tok/s is estimated from nearby benchmark coverage, with the llama.cpp backend as the best runtime hint. 5.6 GB headroom leaves workable context margin.

13. Ministral 3 8B (8B parameters): score 214 · 8bit · 72.0 tok/s · Ollama · Estimated · 15.0 GB headroom · 96k context · first-party M5 batch queued
   Why: Recent model release in the current catalog. 8bit is the highest practical quality here. 72.0 tok/s is estimated from nearby benchmark coverage, with the Ollama wrapper on llama.cpp as the best runtime hint. 15.0 GB headroom leaves workable context margin.

14. gpt-oss 20B (3.6B active / 21B total): score 194 · Q6_K · tok/s: measure it · MLX · fit-first evidence · 7.1 GB headroom · 84k context · first-party M5 batch queued
   Why: Q6_K is the highest practical quality here. Speed still needs direct benchmark coverage. 7.1 GB headroom leaves workable context margin.

15. Gemma 4 E2B (5.1B parameters): score 165 · 8bit · 95.0 tok/s · Ollama · Estimated · 18.5 GB headroom · 131k context · first-party M5 batch queued
   Why: Recent model release in the current catalog. 8bit is the highest practical quality here. 95.0 tok/s is estimated from nearby benchmark coverage, with the Ollama wrapper on llama.cpp as the best runtime hint. 18.5 GB headroom leaves workable context margin.

16. Qwen3.5-4B (4B parameters): score 161 · 8bit · 148.0 tok/s · MLX · Estimated · 18.8 GB headroom · 133k context · first-party M5 batch queued
   Why: Recent model release in the current catalog. 8bit is the highest practical quality here. 148.0 tok/s is estimated from nearby benchmark coverage, with the MLX backend as the best runtime hint. 18.8 GB headroom leaves workable context margin.
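The headroom and context figures move together: whatever memory is left after weights bounds the KV cache, and the KV cache grows linearly with context length. A sketch with hypothetical architecture numbers (the layer count, KV-head count, and head dimension below are illustrative, not any specific model's; grouped-query attention and cache quantization shrink the per-token cost):

```python
# KV-cache size: 2 (K and V) x layers x kv_heads x head_dim x bytes_per_elem
# per token. The defaults below are hypothetical, for illustration only.

def kv_cache_gb(context_tokens: int, layers: int = 48, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """Approximate fp16 KV-cache footprint in GB at a given context length."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return context_tokens * per_token / 1e9

def max_context(headroom_gb: float, **kw) -> int:
    """Largest context whose KV cache fits into the given headroom."""
    per_token_gb = kv_cache_gb(1, **kw)
    return int(headroom_gb / per_token_gb)

print(round(kv_cache_gb(8_000), 2))  # KV-cache GB at an 8k context
print(max_context(3.6))              # context budget for 3.6 GB of headroom
```

Real limits vary per model; the linear relationship between headroom and usable context is the point.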
Unified memory: 24GB
MSRP: $1,499
Form factor: macbook_air
Chip: M4