Canonical Rankings

Best Macs for this model

Devstral Small 1.1, ranked across the Mac lineup at the best practical quantization for each machine, using the strongest available runtime evidence. A historical baseline is selected here; the model picker focuses on current-market choices.

29 ranked Macs, each row using the strongest current runtime evidence; 27 other historical models are hidden.


| Rank | Mac | Score | Quant | Tok/s | Headroom | Context | Price |
|---|---|---|---|---|---|---|---|
| 1 | Mac Studio M3 Ultra 256GB | 430 | 8bit | 33.0 | 231.9 GB | 131k | $7,499 |
| 2 | Mac Pro M2 Ultra 192GB | 366 | 8bit | 33.0 | 167.9 GB | 131k | $6,999 |
| 3 | Mac Studio M4 Max 128GB | 302 | 8bit | 33.0 | 103.9 GB | 131k | $4,499 |
| 4 | MacBook Pro M5 Max 128GB 16-inch | 302 | 8bit | 33.0 | 103.9 GB | 131k | $5,399 |
| 5 | MacBook Pro M4 Max 128GB 16-inch | 302 | 8bit | 33.0 | 103.9 GB | 131k | $5,999 |
| 6 | Mac Studio M3 Ultra 96GB | 270 | 8bit | 33.0 | 71.9 GB | 131k | $3,999 |
| 7 | Mac Studio M4 Max 64GB | 238 | 8bit | 33.0 | 39.9 GB | 131k | $2,999 |
| 8 | MacBook Pro M4 Max 64GB 16-inch | 238 | 8bit | 33.0 | 39.9 GB | 131k | $4,499 |
| 9 | Mac Studio M4 Max 48GB | 222 | 8bit | 33.0 | 23.9 GB | 118k | $2,499 |
| 10 | MacBook Pro M4 Max 48GB 14-inch | 222 | 8bit | 33.0 | 23.9 GB | 118k | $3,499 |
| 11 | MacBook Pro M4 Max 48GB 16-inch | 222 | 8bit | 33.0 | 23.9 GB | 118k | $3,999 |
| 12 | Mac Studio M4 Max 36GB | 210 | 8bit | 33.0 | 11.9 GB | 51k | $1,999 |
| 13 | MacBook Pro M4 Max 36GB 14-inch | 210 | 8bit | 33.0 | 11.9 GB | 51k | $2,999 |
| 14 | MacBook Pro M4 Max 36GB 16-inch | 210 | 8bit | 33.0 | 11.9 GB | 51k | $3,499 |
| 15 | Mac Mini M4 32GB | 206 | 8bit | 33.0 | 7.9 GB | 28k | $799 |
| 16 | MacBook Air M4 32GB 13-inch | 206 | 8bit | 33.0 | 7.9 GB | 28k | $1,499 |
| 17 | MacBook Air M4 32GB 15-inch | 206 | 8bit | 33.0 | 7.9 GB | 28k | $1,699 |
| 18 | Mac Mini M4 24GB | 196 | Q6_K | 33.0 | 3.9 GB | 10k | $599 |
| 19 | MacBook Air M4 24GB 13-inch | 196 | Q6_K | 33.0 | 3.9 GB | 10k | $1,299 |
| 20 | Mac Mini M4 Pro 24GB | 196 | Q6_K | 33.0 | 3.9 GB | 10k | $1,399 |
| 21 | MacBook Air M4 24GB 15-inch | 196 | Q6_K | 33.0 | 3.9 GB | 10k | $1,499 |
| 22 | MacBook Pro M4 Pro 24GB 14-inch | 196 | Q6_K | 33.0 | 3.9 GB | 10k | $1,999 |
| 23 | MacBook Pro M4 Pro 24GB 16-inch | 196 | Q6_K | 33.0 | 3.9 GB | 10k | $2,499 |
| 24 | Mac Mini M4 16GB | 189 | q4.1bit | 33.0 | 2.8 GB | 11k | $499 |
| 25 | MacBook Air M4 16GB 13-inch | 189 | q4.1bit | 33.0 | 2.8 GB | 11k | $1,099 |
| 26 | MacBook Air M4 16GB 15-inch | 189 | q4.1bit | 33.0 | 2.8 GB | 11k | $1,299 |
| 27 | Mac Mini M4 Pro 48GB | 141 | 8bit | 12.9 | 23.9 GB | 118k | $1,599 |
| 28 | MacBook Pro M4 Pro 48GB 14-inch | 141 | 8bit | 12.9 | 23.9 GB | 118k | $2,499 |
| 29 | MacBook Pro M4 Pro 48GB 16-inch | 141 | 8bit | 12.9 | 23.9 GB | 118k | $2,999 |

Every row fits on its machine and runs via LM Studio. The listed quant is the current best practical quantization for that machine, all tok/s figures are estimated from nearby benchmark coverage, and headroom is the unified memory remaining at that quantization.
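The headroom arithmetic above follows a simple pattern: subtract the quantized weight footprint from the machine's unified memory. The page's exact formula isn't published, so this is a simplified sketch; the per-quant bit widths and the omission of KV-cache and runtime overhead are assumptions.

```python
# Simplified sketch of the Fits/Headroom arithmetic (assumed formula:
# headroom = unified RAM minus quantized weight footprint; the site's
# real model also budgets context KV cache, which is omitted here).
BITS_PER_PARAM = {"8bit": 8, "6bit": 6, "Q6_K": 6.56, "4bit": 4}

def weight_footprint_gb(params_b: float, quant: str) -> float:
    """Approximate in-memory size of the weights in GB."""
    return params_b * BITS_PER_PARAM[quant] / 8

def headroom_gb(ram_gb: float, params_b: float, quant: str) -> float:
    """Unified memory left after loading the weights (can be negative)."""
    return ram_gb - weight_footprint_gb(params_b, quant)

# Devstral Small 1.1 is a 24B dense model: at 8bit the weights alone are
# ~24 GB, close to the ~231.9 GB headroom reported for the 256 GB Mac Studio.
print(round(headroom_gb(256, 24, "8bit"), 1))  # 232.0
```

The small gap between this estimate and the reported 231.9 GB is presumably the runtime overhead the sketch ignores.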

Devstral Small 1.1 — ranking first, raw rows below

Start with the ranked Mac table above. Use the rest of this page to inspect raw Apple Silicon coverage and model metadata.

Quantizations observed: 4bit, 6bit

  • Benchmark rows: 4
  • Chip tiers covered: 4
  • Fastest avg tok/s: 43.0 (M3 Ultra, 512 GB)
  • Minimum RAM observed: 18.51 GB

Fastest published result is 43.0 tok/s on M3 Ultra (512 GB) at 4bit. Smallest published fit is 18.5 GB on M4 Pro (48 GB). Longest published context on this page is 131k. Published runtimes include LM Studio. Start with Rankings for the decision, then use the raw rows below to audit the evidence.

Based on 4 external benchmarks; no lab runs yet.

Published runtimes: LM Studio.

  • Total params: 24B
  • Active params: dense (all parameters active)
  • Context window: 131,072 tokens
  • Release date: 2025-07-10

This is a reference-only model record. It remains useful for historical benchmarks, migration checks, and audit context, but it is excluded from current frontier packs.

What this model is, and what Apple Silicon users are actually seeing

Official model cards tell you what the model is for and which software stacks it targets. Field reality below shows how much Apple Silicon evidence we have so far.

Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.

Official source  ·  Raw model card

Tags: agents · coding

Runtime support mentioned

llama.cpp · Ollama · LM Studio · vLLM · Transformers · OpenHands · Cline

Official specs

  • Total parameters: 24B.
  • Context: 128k tokens.
  • Modalities: Text-only.
  • License: Apache 2.0.

Official takeaways

  • Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
  • Lightweight: at a compact 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.
  • Apache 2.0 license: an open license allowing usage and modification for both commercial and non-commercial purposes.
  • Context window: 128k tokens.

Deployment notes

  • We recommend using Devstral with the OpenHands scaffold. You can use it either through our API or by running it locally.
  • Make sure you have launched an OpenAI-compatible server such as vLLM or Ollama as described above. You can then use OpenHands to interact with Devstral Small 1.1.
  • In the tutorial's case, a vLLM server was spun up with the serve command.
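The deployment notes boil down to: run any OpenAI-compatible server (vLLM, Ollama, LM Studio) and point a standard client at it. A minimal sketch of the request shape, using only the standard library; the base URL, port, and model identifier are assumptions that depend on your local setup.

```python
import json
from urllib import request

# Assumed local endpoint: LM Studio defaults to port 1234, Ollama to 11434.
BASE_URL = "http://localhost:1234/v1"

def chat_request(prompt: str, model: str = "devstral-small-2507") -> request.Request:
    """Build an OpenAI-compatible /chat/completions request (not yet sent)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.15,  # low temperature for coding; the value is an assumption
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("Refactor this function to be pure.")
# urllib.request.urlopen(req) would send it once a local server is running.
```

OpenHands and Cline speak this same API, so the only coupling to the runtime is the base URL and model name.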

Apple Silicon note: beyond the takeaways above, the card adds that the model uses a Tekken tokenizer with a 131k vocabulary size.

Official model cards describe intent, capabilities, and supported stacks. They do not prove Apple Silicon speed by themselves.

Devstral Small 1.1: 3 Apple Silicon field reports; best reported generation ~33 tok/s; reported RAM use ~18.51GB; seen on MacBook Pro M4 Max 128GB, MacBook Pro M4 Pro 48GB, MacBook Air M2 24GB; via MLX.

  • Benchmark rows: 4
  • Field reports: 3
  • Practitioner signals: 5
  • Evidence status: sparse benchmarks

What practitioners keep saying

  • The post reports mistralai/devstral-small-2507 at about 12.88 tok/s on an M4 Pro 48GB MacBook Pro in LM Studio with a 6bit MLX quant, about 5.91 seconds TTFT, and about 18.51GB RAM used.
  • That result sharpens the buying curve between the 24GB M2 Air baseline and the 128GB M4 Max ceiling, showing a realistic middle tier for longer-context local coding.
  • The operator reports Devstral Small 2507 DWQ at about 6 tps on an M2 MacBook Air with 24GB unified memory, max context window, LM Studio, and the MLX backend.
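Field numbers like these translate directly into wall-clock feel: total response time is roughly time-to-first-token plus output length divided by generation speed. A quick sketch; the 12.88 tok/s and 5.91 s TTFT come from the M4 Pro report above, while the 500-token response length and reusing the same TTFT for the M2 Air are assumptions.

```python
def response_seconds(ttft_s: float, gen_tok_s: float, out_tokens: int) -> float:
    """Rough wall-clock time: time-to-first-token plus generation time."""
    return ttft_s + out_tokens / gen_tok_s

# M4 Pro 48GB report: ~5.91 s TTFT, ~12.88 tok/s generation.
m4_pro = response_seconds(5.91, 12.88, 500)  # ~44.7 s for a 500-token answer
# M2 Air 24GB report: ~6 tok/s; TTFT assumed equal for comparison only.
m2_air = response_seconds(5.91, 6.0, 500)    # ~89.2 s
print(round(m4_pro, 1), round(m2_air, 1))
```

Roughly a factor of two in perceived latency between the two memory tiers, which is the tradeoff the reports describe.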

Apple Silicon field sources

  • r/LocalLLaMA

    2025-09-18 · MacBook Pro M4 Pro 48GB · LM Studio 6bit MLX

    Devstral Small 1.1 has a concrete mid-tier Apple Silicon result now: it stays usable on an M4 Pro 48GB MacBook Pro without requiring flagship-class memory.

  • r/LocalLLaMA

    2025-07-17 · M2 MacBook Air 24GB, MacBook Pro M4 Max 128GB · LM Studio (MLX)

    Devstral Small 1.1 is not just a flagship-Mac story: it still runs on a 24GB M2 MacBook Air, but at a clearly slower baseline that makes the memory-tier tradeoff visible.

  • r/LocalLLaMA

    2025-07-15 · MacBook Pro M4 Max 128GB · Cline

    Devstral Small 1.1 looks fast enough on M4 Max-class Macs, but prompt-heavy coding interfaces can still overwhelm it.

  • r/MistralAI

    2025-07-10 · 32GB RAM MacBook · vLLM, Transformers, Ollama, or LM Studio

    Devstral Small 1.1 is explicitly being positioned as a locally deployable Apple Silicon coding model rather than a server-only agent release.

Runtime mentions in the field

Cline · LM Studio · MLX · Ollama

Hardware mentioned in reports

24GB · 32GB · 48GB · 128GB · M4 · M4 Pro · Mac · MacBook

What would improve confidence

  • Reproduce the field performance signal
  • Upgrade to a first-party measurement

Published chip coverage includes M3 Ultra (512 GB), M4 Max (128 GB), M4 Pro (48 GB), M2 (24 GB). Fastest published row is 43.0 tok/s on M3 Ultra (512 GB) at 4bit. Lowest published RAM requirement is 18.5 GB on M4 Pro (48 GB). Catalog context window is 131k.

Related Devstral Small 1.1 models with published pages: Devstral Small 2 24B

Raw benchmark rows for Devstral Small 1.1

Rows stay below the ranking because this page is answer-first. Use them to inspect exact chips, quantizations, runtimes, and sources.

| Chip | Quant | RAM req. | Context | Avg tok/s | Prompt tok/s | Runtime | Source |
|---|---|---|---|---|---|---|---|
| M3 Ultra (512 GB) | 4bit | — | — | 43.0 | — | LM Studio | ref |
| M4 Max (128 GB) | 4bit | — | — | 33.0 | — | LM Studio | ref |
| M4 Pro (48 GB) | 6bit | 18.5 GB | 131k | 12.9 | — | LM Studio | ref |
| M2 (24 GB) | 4bit | — | — | 6.0 | — | LM Studio | ref |

Ordered by fastest published tok/s on the chip family in each Mac. Click through for the full machine page.

benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv — CSV export
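The raw rows can be audited programmatically from the benchmarks.csv export. The column names below mirror the table's headers but are assumptions about the actual export schema, and the inline sample stands in for the download; check the real header row before relying on it.

```python
import csv
import io

# Sample mirroring the raw benchmark rows above; the real benchmarks.csv
# schema is an assumption -- inspect the export's header before parsing it.
SAMPLE = """chip,quant,ram_req_gb,context,avg_tok_s,runtime
M3 Ultra (512 GB),4bit,,,43.0,LM Studio
M4 Max (128 GB),4bit,,,33.0,LM Studio
M4 Pro (48 GB),6bit,18.5,131k,12.9,LM Studio
M2 (24 GB),4bit,,,6.0,LM Studio
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# Reproduce the page's "fastest published row" claim from the data itself.
fastest = max(rows, key=lambda r: float(r["avg_tok_s"]))
print(fastest["chip"], fastest["avg_tok_s"])  # M3 Ultra (512 GB) 43.0
```

The same pattern applies to benchmarks.json with `json.load` in place of `csv.DictReader`.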
