Canonical Rankings

Best Macs for this model

Devstral Small 2 24B ranked across the Mac lineup at the best practical quantization, using the best available runtime evidence. The model picker focuses on current-market choices.

29 ranked Macs, using the strongest current runtime evidence for each row. 28 historical models are hidden. Static paths cover only canonical model pages; sort and quantization stay as query state.
| Rank | Mac | Score | Quant | Tok/s | Runtime | Fits | Headroom | Context | Evidence | Price |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mac Studio M3 Ultra 256GB | 486 | 8bit | 47.0 | MLX | Fits | 231.9 GB | 262k | Estimated | $7,499 |
| 2 | Mac Pro M2 Ultra 192GB | 327 | 8bit | 23.4 | llama.cpp | Fits | 167.9 GB | 262k | Estimated | $6,999 |
| 3 | Mac Studio M4 Max 128GB | 263 | 8bit | 23.4 | llama.cpp | Fits | 103.9 GB | 262k | Estimated | $4,499 |
| 4 | MacBook Pro M5 Max 128GB 16-inch | 263 | 8bit | 23.4 | llama.cpp | Fits | 103.9 GB | 262k | Estimated | $5,399 |
| 5 | MacBook Pro M4 Max 128GB 16-inch | 263 | 8bit | 23.4 | llama.cpp | Fits | 103.9 GB | 262k | Estimated | $5,999 |
| 6 | Mac Studio M3 Ultra 96GB | 231 | 8bit | 23.4 | llama.cpp | Fits | 71.9 GB | 262k | Estimated | $3,999 |
| 7 | Mac Studio M4 Max 64GB | 199 | 8bit | 23.4 | llama.cpp | Fits | 39.9 GB | 207k | Estimated | $2,999 |
| 8 | MacBook Pro M4 Max 64GB 16-inch | 199 | 8bit | 23.4 | llama.cpp | Fits | 39.9 GB | 207k | Estimated | $4,499 |
| 9 | Mac Mini M4 Pro 48GB | 183 | 8bit | 23.4 | llama.cpp | Fits | 23.9 GB | 118k | Estimated | $1,599 |
| 10 | MacBook Pro M4 Pro 48GB 14-inch | 183 | 8bit | 23.4 | llama.cpp | Fits | 23.9 GB | 118k | Estimated | $2,499 |
| 11 | Mac Studio M4 Max 48GB | 183 | 8bit | 23.4 | llama.cpp | Fits | 23.9 GB | 118k | Estimated | $2,499 |
| 12 | MacBook Pro M4 Pro 48GB 16-inch | 183 | 8bit | 23.4 | llama.cpp | Fits | 23.9 GB | 118k | Estimated | $2,999 |
| 13 | MacBook Pro M4 Max 48GB 14-inch | 183 | 8bit | 23.4 | llama.cpp | Fits | 23.9 GB | 118k | Estimated | $3,499 |
| 14 | MacBook Pro M4 Max 48GB 16-inch | 183 | 8bit | 23.4 | llama.cpp | Fits | 23.9 GB | 118k | Estimated | $3,999 |
| 15 | Mac Studio M4 Max 36GB | 171 | 8bit | 23.4 | llama.cpp | Fits | 11.9 GB | 51k | Estimated | $1,999 |
| 16 | MacBook Pro M4 Max 36GB 14-inch | 171 | 8bit | 23.4 | llama.cpp | Fits | 11.9 GB | 51k | Estimated | $2,999 |
| 17 | MacBook Pro M4 Max 36GB 16-inch | 171 | 8bit | 23.4 | llama.cpp | Fits | 11.9 GB | 51k | Estimated | $3,499 |
| 18 | Mac Mini M4 32GB | 167 | 8bit | 23.4 | llama.cpp | Fits | 7.9 GB | 28k | Estimated | $799 |
| 19 | MacBook Air M4 32GB 13-inch | 167 | 8bit | 23.4 | llama.cpp | Fits | 7.9 GB | 28k | Estimated | $1,499 |
| 20 | MacBook Air M4 32GB 15-inch | 167 | 8bit | 23.4 | llama.cpp | Fits | 7.9 GB | 28k | Estimated | $1,699 |
| 21 | Mac Mini M4 24GB | 157 | Q6_K | 23.4 | llama.cpp | Fits | 3.9 GB | 10k | Estimated | $599 |
| 22 | MacBook Air M4 24GB 13-inch | 157 | Q6_K | 23.4 | llama.cpp | Fits | 3.9 GB | 10k | Estimated | $1,299 |
| 23 | Mac Mini M4 Pro 24GB | 157 | Q6_K | 23.4 | llama.cpp | Fits | 3.9 GB | 10k | Estimated | $1,399 |
| 24 | MacBook Air M4 24GB 15-inch | 157 | Q6_K | 23.4 | llama.cpp | Fits | 3.9 GB | 10k | Estimated | $1,499 |
| 25 | MacBook Pro M4 Pro 24GB 14-inch | 157 | Q6_K | 23.4 | llama.cpp | Fits | 3.9 GB | 10k | Estimated | $1,999 |
| 26 | MacBook Pro M4 Pro 24GB 16-inch | 157 | Q6_K | 23.4 | llama.cpp | Fits | 3.9 GB | 10k | Estimated | $2,499 |
| 27 | Mac Mini M4 16GB | 57 | Q4_1 | 0.1 | llama.cpp | Fits | 2.8 GB | 11k | Estimated | $499 |
| 28 | MacBook Air M4 16GB 13-inch | 57 | Q4_1 | 0.1 | llama.cpp | Fits | 2.8 GB | 11k | Estimated | $1,099 |
| 29 | MacBook Air M4 16GB 15-inch | 57 | Q4_1 | 0.1 | llama.cpp | Fits | 2.8 GB | 11k | Estimated | $1,299 |

Each row's detail text repeats the same three facts already shown in the columns: the listed quantization is the current best practical one, the tok/s figure is estimated from nearby benchmark coverage, and the headroom is what remains at that quantization. The fastest evidence path for each row matches its Quant, Tok/s, and Runtime columns, except the three 16GB rows: they are ranked at Q4_1 (0.1 tok/s estimated), while their fastest evidence path is a Q4_0 community row at 3.4 tok/s via llama.cpp.
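The Headroom column is, to a close approximation, unified RAM minus the model's weight footprint at the listed quantization. A minimal back-of-envelope sketch, assuming footprint is simply parameters times bits-per-weight over 8 (the table's own figures imply a footprint a shade over 24 GB at 8-bit, so real files carry some metadata and overhead on top of this; the helper names below are illustrative, not the site's formula):

```python
def weight_footprint_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: params * bits / 8.

    Ignores file metadata, activation buffers, and KV cache,
    so real usage is somewhat higher.
    """
    return params_b * 1e9 * bits_per_weight / 8 / 1e9


def headroom_gb(ram_gb: float, params_b: float, bits_per_weight: float) -> float:
    """RAM left over after loading the weights at a given quantization."""
    return ram_gb - weight_footprint_gb(params_b, bits_per_weight)


# Devstral Small 2 is 24B parameters.
print(weight_footprint_gb(24, 8))     # 8-bit weights: 24.0 GB
print(headroom_gb(256, 24, 8))        # M3 Ultra 256GB: ~232 GB of headroom
print(headroom_gb(24, 24, 6.56))      # Q6_K (~6.56 bits/weight): a few GB left on a 24GB Mac
```

This is why the 24GB machines drop to Q6_K in the ranking: at 8-bit the weights alone would consume the whole unified memory budget.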

Devstral Small 2 24B — ranking first, raw rows below

Start with the ranked Mac table above. Use the rest of this page to inspect raw Apple Silicon coverage and model metadata.

Quantizations observed: 4bit, Q4_K - Medium, Q8, 8bit, Q4_0, Q4_1

  • Benchmark rows: 8
  • Chip tiers covered: 3
  • Fastest avg tok/s: 47.0 (M3 Ultra (256 GB))
  • Minimum RAM observed: 13.4 GB

Fastest published result is 47.0 tok/s on M3 Ultra (256 GB) at 4bit. Smallest published fit is 13.4 GB on M3 Ultra (256 GB). Published runtimes include llama.cpp, MLX. Start with Rankings for the decision, then use the raw rows below to audit the evidence.

Based on 8 external benchmarks; no lab runs yet.

Published runtimes: llama.cpp, MLX.

  • Total params: 24B
  • Active params: dense (all parameters active)
  • Context window: 262,144 tokens
  • Release date: 2025-11-28
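The Context column in the ranking shrinks with headroom because every cached token costs KV-cache memory on top of the weights. A hedged sketch of the standard fp16 KV-cache estimate follows; the layer count, KV-head count, and head dimension are illustrative placeholders, not published Devstral Small 2 architecture numbers:

```python
def kv_cache_gb(tokens: int, n_layers: int = 40, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elt: int = 2) -> float:
    """Memory for the KV cache: 2 tensors (K and V) * layers * kv_heads
    * head_dim * bytes, per cached token.

    Architecture numbers here are assumed for illustration only.
    """
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elt
    return tokens * per_token_bytes / 1e9


print(kv_cache_gb(10_000))    # a 10k context: under 2 GB with these assumptions
print(kv_cache_gb(262_144))   # the full 262k context: tens of GB
```

Under these assumptions a full 262k context needs tens of gigabytes beyond the weights, which is consistent with only the largest-headroom Macs being listed at the full window.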

What this model is, and what Apple Silicon users are actually seeing

Official model cards tell you what the model is for and which software stacks it targets. Field reality below shows how much Apple Silicon evidence we have so far.

The Devstral Small 2 Instruct model offers the following capabilities: Agentic Coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.

Official source  ·  Raw model card

Tags: agents, coding, visual-understanding

Runtime support mentioned

llama.cpp, Ollama, LM Studio, vLLM, SGLang, Transformers, OpenHands, Claude Code, Cline, Kilo Code, Mistral Vibe

Official specs

  • Total parameters: 24B.
  • Context: 256k tokens.
  • Modalities: Text and image input, text output.
  • License: Apache 2.0.

Official takeaways

  • Agentic Coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
  • Lightweight: with its compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.
  • Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
  • Context Window: A 256k context window.

Deployment notes

  • You can use Devstral 2 either through our API or by running it locally.
  • You can also run Devstral using these (alphabetically ordered) frameworks, including llama.cpp; to use community builds such as Unsloth's or Bartowski's, make sure to use the changes from this PR.
  • If you notice subpar performance with local serving, please submit issues to the relevant framework so it can be fixed; in the meantime, we advise using the Mistral AI API.

Apple Silicon note: with its compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.

Official model cards describe intent, capabilities, and supported stacks. They do not prove Apple Silicon speed by themselves.

Devstral Small 2 24B: 7 Apple Silicon field reports; best reported generation ~47 tok/s; seen on Mac Studio M3 Ultra 256GB, M1 Ultra 64GB, Mac Mini M4 16GB; via MLX, llama.cpp.

  • Benchmark rows: 8
  • Field reports: 7
  • Practitioner signals: 8
  • Evidence status: sparse benchmarks

What practitioners keep saying

  • The benchmark summary flags Devstral Small 2 24B Q4_1 at about 17.5 seconds to first token and only about 0.054 tok/s, which is well below a usable interactive threshold.
  • The same sweep lists Devstral Small 2 24B Q4_K_M at about 65.3 seconds to first token and only about 0.020 tok/s, reinforcing that 16GB buyers should treat this model as non-viable rather than merely slow.
  • The report is based on actual code-assistance work rather than benchmark screenshots.
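The "non-viable" call above is simple arithmetic: time to first token plus decode time dominates wall-clock latency. A quick sketch of how long a reply takes at the reported speeds (the M1 Ultra TTFT below is an assumed placeholder, not a measured figure):

```python
def reply_seconds(ttft_s: float, tok_per_s: float, reply_tokens: int = 200) -> float:
    """Wall-clock time for one reply: time to first token, then decode."""
    return ttft_s + reply_tokens / tok_per_s


# 16GB Mac Mini M4 at Q4_1, using the sweep's figures above.
print(reply_seconds(17.5, 0.054) / 60)   # roughly an hour for a 200-token reply

# M1 Ultra 64GB at Q4_K_M via llama.cpp (~25 tok/s; TTFT of 1 s is assumed).
print(reply_seconds(1.0, 25.3))          # under ten seconds
```

At 0.054 tok/s a 200-token reply alone takes over an hour, which is why the 16GB rows read as a trap tier rather than merely slow.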

Apple Silicon field sources

  • Manojb Hugging Face model cards

    2026-03-26 · Mac Mini M4 16GB · llama.cpp

    A broad 16GB Mac Mini M4 sweep says Devstral Small 2 24B is effectively a trap tier on constrained Apple Silicon: it may load in aggressive GGUFs, but practical latency collapses.

  • r/LocalLLaMA

    2026-03-19 · 16GB GPU local workstation

    Practitioners are calling Devstral Small 2 materially better for real coding than the popular consensus suggested.

  • r/LocalLLaMA

    2026-03-04 · Mac Studio M3 Ultra 256GB · MLX

    A standardized M3 Ultra eval sharpens the Devstral Small 2 story on Mac: strong code model, weak tool-calling defaults.

  • r/LocalLLaMA

    2026-02-02 · Local agentic coding workstation · OpenCode

    Practitioner discussion is framing Devstral Small 2 as more time-efficient for local agentic coding than GLM-4.7-Flash despite lower raw tok/s.

  • r/LocalLLaMA

    2025-12-13 · M1 Ultra 64GB · LM Studio (llama.cpp), LM Studio (MLX)

    The same M1 Ultra macOS report measures Devstral Small 2 GGUF Q4_K_M through LM Studio's llama.cpp path at usable coding speed.

Runtime mentions in the field

llama.cpp, LM Studio, MLX, OpenCode

Hardware mentioned in reports

16GB, 64GB, M1 Ultra, M3 Ultra, M4, Mac, Mac Mini, Mac Studio

What would improve confidence

  • Reproduce the field performance signal.
  • Upgrade to a first-party measurement.

Published chip coverage includes M3 Ultra (256 GB), M1 Ultra (64 GB), M4 (16 GB). Fastest published row is 47.0 tok/s on M3 Ultra (256 GB) at 4bit. Lowest published RAM requirement is 13.4 GB on M3 Ultra (256 GB).

Related Devstral Small 2 models with published pages: Devstral Small 1.1

Standardized eval scorecards for Devstral Small 2 24B

These are fixed-machine model scorecards from a single Apple Silicon setup. They help explain whether a model is merely fast or actually good at tools, coding, reasoning, and general tasks. They do not replace the main Mac ranking above.

Mac Studio M3 Ultra 256GB · Avg 62%

  • Tools: 17%
  • Coding: 90%
  • Reasoning: 70%
  • General: 70%

Speed and memory

  • Long decode: 47.2 tok/s
  • Short decode: 22.5 tok/s
  • Cold TTFT: 0.291 s
  • Active RAM: 13.4 GB

Strong coding score, but tool calling is poor in this standardized setup.

vLLM-MLX SCORECARD.md  ·  discussion · 2026-03-04

Raw benchmark rows for Devstral Small 2 24B

Rows stay below the ranking because this page is answer-first. Use them to inspect exact chips, quantizations, runtimes, and sources.

| Chip | Quant | RAM req. | Context | Avg tok/s | Prompt tok/s | Runtime | Source |
|---|---|---|---|---|---|---|---|
| M3 Ultra (256 GB) | 4bit | 13.4 GB | | 47.0 | | MLX | ref |
| M1 Ultra (64 GB) | 4bit | | | 29.7 | | MLX | ref |
| M1 Ultra (64 GB) | Q4_K - Medium | | | 25.3 | | llama.cpp | ref |
| M1 Ultra (64 GB) | Q8 | | | 23.4 | | llama.cpp | ref |
| M1 Ultra (64 GB) | 8bit | | | 22.3 | | MLX | ref |
| M4 (16 GB) | Q4_0 | | | 3.4 | | llama.cpp | ref |
| M4 (16 GB) | Q4_1 | | | 0.1 | | llama.cpp | ref |
| M4 (16 GB) | Q4_K - Medium | | | 0.0 | | llama.cpp | ref |

Blank cells are values the source rows do not publish.

Ordered by fastest published tok/s on the chip family in each Mac. Click through for the full machine page.
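The ordering rule above amounts to a group-by: for each chip, keep the fastest published row. A small sketch of that selection, with the data transcribed from the raw rows on this page (the function name and logic are illustrative, not the site's actual ranking code):

```python
# (chip, quant, avg tok/s, runtime) -- transcribed from the raw rows above.
ROWS = [
    ("M3 Ultra (256 GB)", "4bit", 47.0, "MLX"),
    ("M1 Ultra (64 GB)", "4bit", 29.7, "MLX"),
    ("M1 Ultra (64 GB)", "Q4_K - Medium", 25.3, "llama.cpp"),
    ("M1 Ultra (64 GB)", "Q8", 23.4, "llama.cpp"),
    ("M1 Ultra (64 GB)", "8bit", 22.3, "MLX"),
    ("M4 (16 GB)", "Q4_0", 3.4, "llama.cpp"),
    ("M4 (16 GB)", "Q4_1", 0.1, "llama.cpp"),
    ("M4 (16 GB)", "Q4_K - Medium", 0.0, "llama.cpp"),
]


def fastest_per_chip(rows):
    """Keep the highest-tok/s row for each chip, fastest chips first."""
    best = {}
    for chip, quant, tps, runtime in rows:
        if chip not in best or tps > best[chip][2]:
            best[chip] = (chip, quant, tps, runtime)
    return sorted(best.values(), key=lambda r: -r[2])


for chip, quant, tps, runtime in fastest_per_chip(ROWS):
    print(f"{chip}: {tps} tok/s at {quant} via {runtime}")
```

Run against these rows, the winners are the M3 Ultra at 4bit via MLX, the M1 Ultra at 4bit via MLX, and the M4 at Q4_0 via llama.cpp, matching the evidence paths cited in the ranking.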

benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv — CSV export

See all models →