Canonical Rankings

Best Macs for this model

Devstral Small 1.1, ranked across the Mac lineup at the best practical quantization for each machine, using the strongest available runtime evidence. A historical baseline is selected here; the model picker focuses on current-market choices.

29 ranked Macs, each row using the strongest current runtime evidence; 27 other historical models are hidden.


| Rank | Mac | Score | Quant | Tok/s | Headroom | Context | Price |
|---|---|---|---|---|---|---|---|
| 1 | Mac Studio M3 Ultra 256GB | 430 | 8bit | 33.0 | 231.9 GB | 131k | $7,499 |
| 2 | Mac Pro M2 Ultra 192GB | 366 | 8bit | 33.0 | 167.9 GB | 131k | $6,999 |
| 3 | Mac Studio M4 Max 128GB | 302 | 8bit | 33.0 | 103.9 GB | 131k | $4,499 |
| 4 | MacBook Pro M5 Max 128GB 16-inch | 302 | 8bit | 33.0 | 103.9 GB | 131k | $5,399 |
| 5 | MacBook Pro M4 Max 128GB 16-inch | 302 | 8bit | 33.0 | 103.9 GB | 131k | $5,999 |
| 6 | Mac Studio M3 Ultra 96GB | 270 | 8bit | 33.0 | 71.9 GB | 131k | $3,999 |
| 7 | Mac Studio M4 Max 64GB | 238 | 8bit | 33.0 | 39.9 GB | 131k | $2,999 |
| 8 | MacBook Pro M4 Max 64GB 16-inch | 238 | 8bit | 33.0 | 39.9 GB | 131k | $4,499 |
| 9 | Mac Studio M4 Max 48GB | 222 | 8bit | 33.0 | 23.9 GB | 118k | $2,499 |
| 10 | MacBook Pro M4 Max 48GB 14-inch | 222 | 8bit | 33.0 | 23.9 GB | 118k | $3,499 |
| 11 | MacBook Pro M4 Max 48GB 16-inch | 222 | 8bit | 33.0 | 23.9 GB | 118k | $3,999 |
| 12 | Mac Studio M4 Max 36GB | 210 | 8bit | 33.0 | 11.9 GB | 51k | $1,999 |
| 13 | MacBook Pro M4 Max 36GB 14-inch | 210 | 8bit | 33.0 | 11.9 GB | 51k | $2,999 |
| 14 | MacBook Pro M4 Max 36GB 16-inch | 210 | 8bit | 33.0 | 11.9 GB | 51k | $3,499 |
| 15 | Mac Mini M4 32GB | 206 | 8bit | 33.0 | 7.9 GB | 28k | $799 |
| 16 | MacBook Air M4 32GB 13-inch | 206 | 8bit | 33.0 | 7.9 GB | 28k | $1,499 |
| 17 | MacBook Air M4 32GB 15-inch | 206 | 8bit | 33.0 | 7.9 GB | 28k | $1,699 |
| 18 | Mac Mini M4 24GB | 196 | Q6_K | 33.0 | 3.9 GB | 10k | $599 |
| 19 | MacBook Air M4 24GB 13-inch | 196 | Q6_K | 33.0 | 3.9 GB | 10k | $1,299 |
| 20 | Mac Mini M4 Pro 24GB | 196 | Q6_K | 33.0 | 3.9 GB | 10k | $1,399 |
| 21 | MacBook Air M4 24GB 15-inch | 196 | Q6_K | 33.0 | 3.9 GB | 10k | $1,499 |
| 22 | MacBook Pro M4 Pro 24GB 14-inch | 196 | Q6_K | 33.0 | 3.9 GB | 10k | $1,999 |
| 23 | MacBook Pro M4 Pro 24GB 16-inch | 196 | Q6_K | 33.0 | 3.9 GB | 10k | $2,499 |
| 24 | Mac Mini M4 16GB | 189 | q4.1bit | 33.0 | 2.8 GB | 11k | $499 |
| 25 | MacBook Air M4 16GB 13-inch | 189 | q4.1bit | 33.0 | 2.8 GB | 11k | $1,099 |
| 26 | MacBook Air M4 16GB 15-inch | 189 | q4.1bit | 33.0 | 2.8 GB | 11k | $1,299 |
| 27 | Mac Mini M4 Pro 48GB | 141 | 8bit | 12.9 | 23.9 GB | 118k | $1,599 |
| 28 | MacBook Pro M4 Pro 48GB 14-inch | 141 | 8bit | 12.9 | 23.9 GB | 118k | $2,499 |
| 29 | MacBook Pro M4 Pro 48GB 16-inch | 141 | 8bit | 12.9 | 23.9 GB | 118k | $2,999 |

Every row fits on its machine and runs via LM Studio. The listed quant is the current best practical quantization for that machine, all tok/s figures are estimated from nearby benchmark coverage, and headroom is the unified memory remaining at that quantization.
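The headroom arithmetic above follows a simple pattern: subtract the quantized weight footprint from the machine's unified memory. The page's exact formula isn't published, so this is a simplified sketch; the per-quant bit widths and the omission of KV-cache and runtime overhead are assumptions.

```python
# Simplified sketch of the Fits/Headroom arithmetic (assumed formula:
# headroom = unified RAM minus quantized weight footprint; the site's
# real model also budgets context KV cache, which is omitted here).
BITS_PER_PARAM = {"8bit": 8, "6bit": 6, "Q6_K": 6.56, "4bit": 4}

def weight_footprint_gb(params_b: float, quant: str) -> float:
    """Approximate in-memory size of the weights in GB."""
    return params_b * BITS_PER_PARAM[quant] / 8

def headroom_gb(ram_gb: float, params_b: float, quant: str) -> float:
    """Unified memory left after loading the weights (can be negative)."""
    return ram_gb - weight_footprint_gb(params_b, quant)

# Devstral Small 1.1 is a 24B dense model: at 8bit the weights alone are
# ~24 GB, close to the ~231.9 GB headroom reported for the 256 GB Mac Studio.
print(round(headroom_gb(256, 24, "8bit"), 1))  # 232.0
```

The small gap between this estimate and the reported 231.9 GB is presumably the runtime overhead the sketch ignores.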

Devstral Small 1.1 — ranking first, raw rows below

Start with the ranked Mac table above. Use the rest of this page to inspect raw Apple Silicon coverage and model metadata.

Quantizations observed: 4bit, 6bit

  • Benchmark rows: 4
  • Chip tiers covered: 4
  • Fastest avg tok/s: 43.0 (M3 Ultra, 512 GB)
  • Minimum RAM observed: 18.51 GB

Fastest published result is 43.0 tok/s on M3 Ultra (512 GB) at 4bit. Smallest published fit is 18.5 GB on M4 Pro (48 GB). Longest published context on this page is 131k. Published runtimes include LM Studio. Start with Rankings for the decision, then use the raw rows below to audit the evidence.

Based on 4 external benchmarks; no lab runs yet.

Published runtimes: LM Studio.

  • Total params: 24B
  • Active params: dense (all parameters active)
  • Context window: 131,072 tokens
  • Release date: 2025-07-10

This is a reference-only model record. It remains useful for historical benchmarks, migration checks, and audit context, but it is excluded from current frontier packs.

What this model is, and what Apple Silicon users are actually seeing

Official model cards tell you what the model is for and which software stacks it targets. Field reality below shows how much Apple Silicon evidence we have so far.

Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.

Official source  ·  Raw model card

Tags: agents · coding

Runtime support mentioned

llama.cpp · Ollama · LM Studio · vLLM · Transformers · OpenHands · Cline

Official specs

  • Total parameters: 24B.
  • Context: 128k tokens.
  • Modalities: Text-only.
  • License: Apache 2.0.

Official takeaways

  • Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
  • Lightweight: at a compact 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.
  • Apache 2.0 license: an open license allowing usage and modification for both commercial and non-commercial purposes.
  • Context window: 128k tokens.

Deployment notes

  • We recommend using Devstral with the OpenHands scaffold. You can use it either through our API or by running it locally.
  • Make sure you have launched an OpenAI-compatible server such as vLLM or Ollama as described above. You can then use OpenHands to interact with Devstral Small 1.1.
  • In the tutorial's case, a vLLM server was spun up with the serve command.
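The deployment notes boil down to: run any OpenAI-compatible server (vLLM, Ollama, LM Studio) and point a standard client at it. A minimal sketch of the request shape, using only the standard library; the base URL, port, and model identifier are assumptions that depend on your local setup.

```python
import json
from urllib import request

# Assumed local endpoint: LM Studio defaults to port 1234, Ollama to 11434.
BASE_URL = "http://localhost:1234/v1"

def chat_request(prompt: str, model: str = "devstral-small-2507") -> request.Request:
    """Build an OpenAI-compatible /chat/completions request (not yet sent)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.15,  # low temperature for coding; the value is an assumption
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("Refactor this function to be pure.")
# urllib.request.urlopen(req) would send it once a local server is running.
```

OpenHands and Cline speak this same API, so the only coupling to the runtime is the base URL and model name.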

Apple Silicon note: beyond the takeaways above, the card adds that the model uses a Tekken tokenizer with a 131k vocabulary size.

Official model cards describe intent, capabilities, and supported stacks. They do not prove Apple Silicon speed by themselves.

Devstral Small 1.1: 3 Apple Silicon field reports; best reported generation ~33 tok/s; reported RAM use ~18.51GB; seen on MacBook Pro M4 Max 128GB, MacBook Pro M4 Pro 48GB, MacBook Air M2 24GB; via MLX.

  • Benchmark rows: 4
  • Field reports: 3
  • Practitioner signals: 5
  • Evidence status: sparse benchmarks

What practitioners keep saying

  • The post reports mistralai/devstral-small-2507 at about 12.88 tok/s on an M4 Pro 48GB MacBook Pro in LM Studio with a 6bit MLX quant, about 5.91 seconds TTFT, and about 18.51GB RAM used.
  • That result sharpens the buying curve between the 24GB M2 Air baseline and the 128GB M4 Max ceiling, showing a realistic middle tier for longer-context local coding.
  • The operator reports Devstral Small 2507 DWQ at about 6 tps on an M2 MacBook Air with 24GB unified memory, max context window, LM Studio, and the MLX backend.
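Field numbers like these translate directly into wall-clock feel: total response time is roughly time-to-first-token plus output length divided by generation speed. A quick sketch; the 12.88 tok/s and 5.91 s TTFT come from the M4 Pro report above, while the 500-token response length and reusing the same TTFT for the M2 Air are assumptions.

```python
def response_seconds(ttft_s: float, gen_tok_s: float, out_tokens: int) -> float:
    """Rough wall-clock time: time-to-first-token plus generation time."""
    return ttft_s + out_tokens / gen_tok_s

# M4 Pro 48GB report: ~5.91 s TTFT, ~12.88 tok/s generation.
m4_pro = response_seconds(5.91, 12.88, 500)  # ~44.7 s for a 500-token answer
# M2 Air 24GB report: ~6 tok/s; TTFT assumed equal for comparison only.
m2_air = response_seconds(5.91, 6.0, 500)    # ~89.2 s
print(round(m4_pro, 1), round(m2_air, 1))
```

Roughly a factor of two in perceived latency between the two memory tiers, which is the tradeoff the reports describe.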

Apple Silicon field sources

  • r/LocalLLaMA

    2025-09-18 · MacBook Pro M4 Pro 48GB · LM Studio 6bit MLX

    Devstral Small 1.1 has a concrete mid-tier Apple Silicon result now: it stays usable on an M4 Pro 48GB MacBook Pro without requiring flagship-class memory.

  • r/LocalLLaMA

    2025-07-17 · M2 MacBook Air 24GB, MacBook Pro M4 Max 128GB · LM Studio (MLX)

    Devstral Small 1.1 is not just a flagship-Mac story: it still runs on a 24GB M2 MacBook Air, but at a clearly slower baseline that makes the memory-tier tradeoff visible.

  • r/LocalLLaMA

    2025-07-15 · MacBook Pro M4 Max 128GB · Cline

    Devstral Small 1.1 looks fast enough on M4 Max-class Macs, but prompt-heavy coding interfaces can still overwhelm it.

  • r/MistralAI

    2025-07-10 · 32GB RAM MacBook · vLLM, Transformers, Ollama, or LM Studio

    Devstral Small 1.1 is explicitly being positioned as a locally deployable Apple Silicon coding model rather than a server-only agent release.

Runtime mentions in the field

Cline · LM Studio · MLX · Ollama

Hardware mentioned in reports

24GB · 32GB · 48GB · 128GB · M4 · M4 Pro · Mac · MacBook

What would improve confidence

  • Reproduce the field performance signal
  • Upgrade to a first-party measurement

Published chip coverage includes M3 Ultra (512 GB), M4 Max (128 GB), M4 Pro (48 GB), M2 (24 GB). Fastest published row is 43.0 tok/s on M3 Ultra (512 GB) at 4bit. Lowest published RAM requirement is 18.5 GB on M4 Pro (48 GB). Catalog context window is 131k.

Related Devstral Small 1.1 models with published pages: Devstral Small 2 24B

Raw benchmark rows for Devstral Small 1.1

Rows stay below the ranking because this page is answer-first. Use them to inspect exact chips, quantizations, runtimes, and sources.

| Chip | Quant | RAM req. | Context | Avg tok/s | Prompt tok/s | Runtime | Source |
|---|---|---|---|---|---|---|---|
| M3 Ultra (512 GB) | 4bit | — | — | 43.0 | — | LM Studio | ref |
| M4 Max (128 GB) | 4bit | — | — | 33.0 | — | LM Studio | ref |
| M4 Pro (48 GB) | 6bit | 18.5 GB | 131k | 12.9 | — | LM Studio | ref |
| M2 (24 GB) | 4bit | — | — | 6.0 | — | LM Studio | ref |

Ordered by fastest published tok/s on the chip family in each Mac. Click through for the full machine page.

benchmarks.json — full dataset  ·  models.json — model summaries  ·  benchmarks.csv — CSV export
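The raw rows can be audited programmatically from the benchmarks.csv export. The column names below mirror the table's headers but are assumptions about the actual export schema, and the inline sample stands in for the download; check the real header row before relying on it.

```python
import csv
import io

# Sample mirroring the raw benchmark rows above; the real benchmarks.csv
# schema is an assumption -- inspect the export's header before parsing it.
SAMPLE = """chip,quant,ram_req_gb,context,avg_tok_s,runtime
M3 Ultra (512 GB),4bit,,,43.0,LM Studio
M4 Max (128 GB),4bit,,,33.0,LM Studio
M4 Pro (48 GB),6bit,18.5,131k,12.9,LM Studio
M2 (24 GB),4bit,,,6.0,LM Studio
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# Reproduce the page's "fastest published row" claim from the data itself.
fastest = max(rows, key=lambda r: float(r["avg_tok_s"]))
print(fastest["chip"], fastest["avg_tok_s"])  # M3 Ultra (512 GB) 43.0
```

The same pattern applies to benchmarks.json with `json.load` in place of `csv.DictReader`.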
