Canonical Rankings

Best Macs for this model

GLM-5 ranked across the Mac lineup at the best practical quantization, using the best available runtime evidence. Historical baseline selected; model picker is focused on current-market choices.

29 ranked Macs. Use the strongest current runtime evidence for each row. 27 other historical models hidden. Baselines: static paths cover only canonical model pages; sort and quantization stay as query state.

Historical baseline selected: GLM-5. Default model choices remain current-market; other historical models stay hidden.

Rank	Mac	Score	Quant	Tok/s	Runtime	Fits	Headroom	Context	Evidence	Price	Why it ranks here
1	Mac Studio M3 Ultra 256GB	120	IQ2_XS	13.2 tok/s	MLX	Fits	43.3 GB	9k	Estimated	$7,499	IQ2_XS is the current best practical quantization. 13.2 tok/s is estimated from nearby benchmark coverage. 43.3 GB headroom remains at this quantization.
2	Mac Mini M4 16GB	0	F32	—	MLX	No	-2795.1 GB	—	Estimated	$499	GLM-5 does not fit on Mac Mini M4 16GB at the current practical quantization.
3	Mac Mini M4 24GB	0	F32	—	MLX	No	-2787.1 GB	—	Estimated	$599	GLM-5 does not fit on Mac Mini M4 24GB at the current practical quantization.
4	Mac Mini M4 32GB	0	F32	—	MLX	No	-2779.1 GB	—	Estimated	$799	GLM-5 does not fit on Mac Mini M4 32GB at the current practical quantization.
5	MacBook Air M4 16GB 13-inch	0	F32	—	MLX	No	-2795.1 GB	—	Estimated	$1,099	GLM-5 does not fit on MacBook Air M4 16GB 13-inch at the current practical quantization.
6	MacBook Air M4 24GB 13-inch	0	F32	—	MLX	No	-2787.1 GB	—	Estimated	$1,299	GLM-5 does not fit on MacBook Air M4 24GB 13-inch at the current practical quantization.
7	MacBook Air M4 16GB 15-inch	0	F32	—	MLX	No	-2795.1 GB	—	Estimated	$1,299	GLM-5 does not fit on MacBook Air M4 16GB 15-inch at the current practical quantization.
8	Mac Mini M4 Pro 24GB	0	F32	—	MLX	No	-2787.1 GB	—	Estimated	$1,399	GLM-5 does not fit on Mac Mini M4 Pro 24GB at the current practical quantization.
9	MacBook Air M4 32GB 13-inch	0	F32	—	MLX	No	-2779.1 GB	—	Estimated	$1,499	GLM-5 does not fit on MacBook Air M4 32GB 13-inch at the current practical quantization.
10	MacBook Air M4 24GB 15-inch	0	F32	—	MLX	No	-2787.1 GB	—	Estimated	$1,499	GLM-5 does not fit on MacBook Air M4 24GB 15-inch at the current practical quantization.
11	Mac Mini M4 Pro 48GB	0	F32	—	MLX	No	-2763.1 GB	—	Estimated	$1,599	GLM-5 does not fit on Mac Mini M4 Pro 48GB at the current practical quantization.
12	MacBook Air M4 32GB 15-inch	0	F32	—	MLX	No	-2779.1 GB	—	Estimated	$1,699	GLM-5 does not fit on MacBook Air M4 32GB 15-inch at the current practical quantization.
13	MacBook Pro M4 Pro 24GB 14-inch	0	F32	—	MLX	No	-2787.1 GB	—	Estimated	$1,999	GLM-5 does not fit on MacBook Pro M4 Pro 24GB 14-inch at the current practical quantization.
14	Mac Studio M4 Max 36GB	0	F32	—	MLX	No	-2775.1 GB	—	Estimated	$1,999	GLM-5 does not fit on Mac Studio M4 Max 36GB at the current practical quantization.
15	MacBook Pro M4 Pro 48GB 14-inch	0	F32	—	MLX	No	-2763.1 GB	—	Estimated	$2,499	GLM-5 does not fit on MacBook Pro M4 Pro 48GB 14-inch at the current practical quantization.
16	MacBook Pro M4 Pro 24GB 16-inch	0	F32	—	MLX	No	-2787.1 GB	—	Estimated	$2,499	GLM-5 does not fit on MacBook Pro M4 Pro 24GB 16-inch at the current practical quantization.
17	Mac Studio M4 Max 48GB	0	F32	—	MLX	No	-2763.1 GB	—	Estimated	$2,499	GLM-5 does not fit on Mac Studio M4 Max 48GB at the current practical quantization.
18	MacBook Pro M4 Max 36GB 14-inch	0	F32	—	MLX	No	-2775.1 GB	—	Estimated	$2,999	GLM-5 does not fit on MacBook Pro M4 Max 36GB 14-inch at the current practical quantization.
19	MacBook Pro M4 Pro 48GB 16-inch	0	F32	—	MLX	No	-2763.1 GB	—	Estimated	$2,999	GLM-5 does not fit on MacBook Pro M4 Pro 48GB 16-inch at the current practical quantization.
20	Mac Studio M4 Max 64GB	0	F32	—	MLX	No	-2747.1 GB	—	Estimated	$2,999	GLM-5 does not fit on Mac Studio M4 Max 64GB at the current practical quantization.
21	MacBook Pro M4 Max 48GB 14-inch	0	F32	—	MLX	No	-2763.1 GB	—	Estimated	$3,499	GLM-5 does not fit on MacBook Pro M4 Max 48GB 14-inch at the current practical quantization.
22	MacBook Pro M4 Max 36GB 16-inch	0	F32	—	MLX	No	-2775.1 GB	—	Estimated	$3,499	GLM-5 does not fit on MacBook Pro M4 Max 36GB 16-inch at the current practical quantization.
23	MacBook Pro M4 Max 48GB 16-inch	0	F32	—	MLX	No	-2763.1 GB	—	Estimated	$3,999	GLM-5 does not fit on MacBook Pro M4 Max 48GB 16-inch at the current practical quantization.
24	Mac Studio M3 Ultra 96GB	0	F32	—	MLX	No	-2715.1 GB	—	Estimated	$3,999	GLM-5 does not fit on Mac Studio M3 Ultra 96GB at the current practical quantization.
25	MacBook Pro M4 Max 64GB 16-inch	0	F32	—	MLX	No	-2747.1 GB	—	Estimated	$4,499	GLM-5 does not fit on MacBook Pro M4 Max 64GB 16-inch at the current practical quantization.
26	Mac Studio M4 Max 128GB	0	F32	—	MLX	No	-2683.1 GB	—	Estimated	$4,499	GLM-5 does not fit on Mac Studio M4 Max 128GB at the current practical quantization.
27	MacBook Pro M5 Max 128GB 16-inch	0	F32	—	MLX	No	-2683.1 GB	—	Estimated	$5,399	GLM-5 does not fit on MacBook Pro M5 Max 128GB 16-inch at the current practical quantization.
28	MacBook Pro M4 Max 128GB 16-inch	0	F32	—	MLX	No	-2683.1 GB	—	Estimated	$5,999	GLM-5 does not fit on MacBook Pro M4 Max 128GB 16-inch at the current practical quantization.
29	Mac Pro M2 Ultra 192GB	0	F32	—	MLX	No	-2619.1 GB	—	Estimated	$6,999	GLM-5 does not fit on Mac Pro M2 Ultra 192GB at the current practical quantization.
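
The Headroom column reads as installed memory minus the model's estimated footprint at the listed quantization. The page does not state that formula, so the following is only a minimal sketch under that assumption; the function name `implied_footprint_gb` is hypothetical. It backs out the footprint each row implies.

```python
# Assumption (not stated on the page): headroom = installed RAM - model footprint,
# so footprint = installed RAM - headroom.

def implied_footprint_gb(ram_gb: float, headroom_gb: float) -> float:
    """Back out the footprint implied by a row's RAM size and Headroom value."""
    return round(ram_gb - headroom_gb, 1)

# (Mac, installed RAM in GB, Headroom in GB, Quant) copied from the ranked table.
rows = [
    ("Mac Studio M3 Ultra 256GB", 256, 43.3, "IQ2_XS"),
    ("Mac Mini M4 16GB", 16, -2795.1, "F32"),
    ("Mac Pro M2 Ultra 192GB", 192, -2619.1, "F32"),
]

footprints = {name: implied_footprint_gb(ram, headroom) for name, ram, headroom, _ in rows}
for name, ram, headroom, quant in rows:
    print(f"{name}: {quant} footprint ~ {footprints[name]} GB")
```

If the assumption holds, every row marked "No" is being measured against the same F32 footprint, while row 1 is measured against a much smaller IQ2_XS footprint. That is worth keeping in mind when reading the "does not fit at the current practical quantization" notes.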

GLM-5 — ranking first, raw rows below

Start with the ranked Mac table above. Use the rest of this page to inspect raw Apple Silicon coverage and model metadata.

Quantizations observed: 4bit

Benchmark rows: 5

Chip tiers covered: 1

Fastest avg tok/s: 16.7 (M3 Ultra (512 GB))

Minimum RAM observed: 391.82 GB

Quick take

Fastest published result is 16.7 tok/s on M3 Ultra (512 GB) at 4bit. Smallest published fit is 391.8 GB on M3 Ultra (512 GB). Longest published context on this page is 33k. Published runtimes include MLX. Start with Rankings for the decision, then use the raw rows below to audit the evidence.

Based on 5 external benchmarks; no lab runs yet.

Published runtimes: MLX.


Catalog record

Total params: 744B

Active params: 40B

Context window: 202,752

Release date: 2026-02-11

This is a reference-only model record. It remains useful for historical benchmarks, migration checks, and audit context, but it is excluded from current frontier packs.

What this model is, and what Apple Silicon users are actually seeing

Official model cards tell you what the model is for and which software stacks it targets. Field reality below shows how much Apple Silicon evidence we have so far.

Official brief

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI).

Official source · Raw model card

Tags: agents, coding, reasoning

Runtime support mentioned

vLLM, SGLang, Transformers, KTransformers, OpenHands, Claude Code, xLLM

Official specs

Total parameters: 744B.
Active parameters: 40B.
Attention: DeepSeek Sparse Attention.

Official takeaways

Humanity’s Last Exam (HLE) & other reasoning tasks: We evaluate with a maximum generation length of 131,072 tokens (temperature=1.0, top_p=0.95, max_new_tokens=131072).
BrowseComp: Without context management, we retain details from the most recent 5 turns. With context management, we use the same discard-all strategy as DeepSeek-v3.2 and Kimi K2.5.
Terminal-Bench 2.0 (Terminus 2): We evaluate with the Terminus framework using timeout=2h, temperature=0.7, top_p=1.0, max_new_tokens=8192, with a 128K context window. Resource limits are capped at 16 CPUs and 32 GB RAM.
Terminal-Bench 2.0 (Claude Code): We evaluate in Claude Code 2.1.14 (think mode, default effort) with temperature=1.0, top_p=0.95, max_new_tokens=65536.

Official model cards describe intent, capabilities, and supported stacks. They do not prove Apple Silicon speed by themselves.

Field reality on Apple Silicon

GLM-5: 6 Apple Silicon field reports; best reported generation ~20 tok/s; best reported prompt processing ~187 tok/s; reported RAM use ~391.82 to 415.41 GB; seen on M3 Ultra and Mac Studio M3 Ultra 512GB; via oMLX.

Benchmark rows: 5

Field reports: 6

Practitioner signals: 4

Evidence status: Sparse Benchmarks

What practitioners keep saying

Use the official launch post to ground GLM-5 currentness and local-serving planning.
Do not treat the launch post as Apple Silicon performance evidence without hardware, runtime build, quantization, context, and measured throughput.
The oMLX single-request table reports GLM-5-4bit on an M3 Ultra 512GB Mac at pp1024/tg128 with 187.0 tok/s prompt processing, 16.7 tok/s generation, 5.477s TTFT, 13.156s end-to-end time, and peak memory at 391.82GB.
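
The oMLX figures quoted above can be sanity-checked against each other. The sketch below assumes that time to first token is roughly prompt tokens divided by prompt throughput, and that end-to-end time is roughly TTFT plus generated tokens divided by generation throughput. Those relations are simplifying assumptions, not something the source states, and the helper name `expected_latency` is hypothetical.

```python
# Reported values from the oMLX single-request row (pp1024 / tg128).
PROMPT_TOKENS = 1024
GENERATED_TOKENS = 128
PROMPT_TOK_S = 187.0
GEN_TOK_S = 16.7
REPORTED_TTFT_S = 5.477
REPORTED_E2E_S = 13.156

def expected_latency(prompt_tokens, gen_tokens, prompt_tok_s, gen_tok_s):
    """Estimate (ttft, end_to_end) in seconds from token counts and throughputs."""
    ttft = prompt_tokens / prompt_tok_s      # assumed: prefill dominates TTFT
    decode = gen_tokens / gen_tok_s          # assumed: steady decode rate
    return ttft, ttft + decode

ttft, e2e = expected_latency(PROMPT_TOKENS, GENERATED_TOKENS, PROMPT_TOK_S, GEN_TOK_S)
print(f"estimated TTFT {ttft:.3f}s vs reported {REPORTED_TTFT_S}s")
print(f"estimated end-to-end {e2e:.3f}s vs reported {REPORTED_E2E_S}s")
```

Under these assumptions the estimates land within a few hundredths of a second of the reported TTFT and end-to-end time, so the row is internally consistent.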

Apple Silicon field sources

r/LocalLLaMA
2026-02-24 · Mac Studio M3 Ultra 512GB · oMLX
An accessible M3 Ultra 512GB oMLX report shows GLM-5 running on Apple Silicon with slow single-request latency but materially better throughput under continuous batching and persistent KV cache.
r/LocalLLaMA
2026-02-16 · M3 Ultra
A LocalLLaMA operator reports Unsloth GLM-5 low-bit quants running on M3 Ultra, including Q2 around 20 tok/s, while the source leaves key setup details unspecified.
r/LocalLLaMA
Mac Studio M3 Ultra 512GB
GLM-5 is no longer theoretical on Apple Silicon; operators are already running it on M3 Ultra-class desktops and comparing the experience against frontier hosted models.

Runtime/source notes to verify

Z.ai
2026-04-30 · vLLM / SGLang
The official GLM-5 launch post says the model weights are available for local deployment and names vLLM and SGLang as supported serving frameworks, but it does not establish Mac throughput or fit.

Runtime mentions in the field

oMLX

Hardware mentioned in reports

M3 Ultra, Mac, Mac Studio

What would improve confidence

Reproduce Field Performance Signal
Resolve Blocked Source Capture

Current published coverage

Published chip coverage includes M3 Ultra (512 GB). Fastest published row is 16.7 tok/s on M3 Ultra (512 GB) at 4bit. Lowest published RAM requirement is 391.8 GB on M3 Ultra (512 GB). Longest published context is 33k; the catalog context window is 202,752.

M3 Ultra (512 GB)

Raw benchmark rows for GLM-5

Rows stay below the ranking because this page is answer-first. Use them to inspect exact chips, quantizations, runtimes, and sources.

Chip	Quant	RAM req.	Context	Avg tok/s	Prompt tok/s	Runtime	Source
M3 Ultra (512 GB)	4bit	391.8 GB	1k	16.7 tok/s	187.0 tok/s	MLX	ref
M3 Ultra (512 GB)	4bit	394.1 GB	4k	13.7 tok/s	180.1 tok/s	MLX	ref
M3 Ultra (512 GB)	4bit	396.7 GB	8k	13.2 tok/s	154.1 tok/s	MLX	ref
M3 Ultra (512 GB)	4bit	402.7 GB	16k	12.0 tok/s	117.4 tok/s	MLX	ref
M3 Ultra (512 GB)	4bit	415.4 GB	33k	10.7 tok/s	77.7 tok/s	MLX	ref
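
The rows above show throughput falling and memory rising as context grows. The sketch below quantifies that trend from the published numbers only. It treats the Context labels (1k, 4k, 8k, 16k, 33k) as thousands of tokens and uses a straight-line RAM estimate between the first and last row; both are assumptions made for illustration, and `summarize` is a hypothetical helper.

```python
# Raw rows copied from the table above:
# (context in thousands of tokens, RAM in GB, generation tok/s, prompt tok/s)
ROWS = [
    (1, 391.8, 16.7, 187.0),
    (4, 394.1, 13.7, 180.1),
    (8, 396.7, 13.2, 154.1),
    (16, 402.7, 12.0, 117.4),
    (33, 415.4, 10.7, 77.7),
]

def summarize(rows):
    """Compare the shortest-context row with the longest-context row."""
    first, last = rows[0], rows[-1]
    ram_growth = last[1] - first[1]
    return {
        "gen_retained": last[2] / first[2],        # share of generation speed kept
        "prompt_retained": last[3] / first[3],     # share of prompt speed kept
        "ram_growth_gb": ram_growth,
        "ram_gb_per_1k_ctx": ram_growth / (last[0] - first[0]),  # linear assumption
    }

summary = summarize(ROWS)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

On these rows, generation keeps roughly 64% of its 1k speed at 33k, prompt processing keeps roughly 42%, and RAM grows by about 23.6 GB.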

Best Macs for GLM-5

Ordered by fastest published tok/s on the chip family in each Mac. Click through for the full machine page.

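Mac Studio M3 Ultra 96GB: 16.7 tok/s
Mac Studio M3 Ultra 256GB: 16.7 tok/s

Both figures are the chip-family result published for M3 Ultra (512 GB), not measurements on these configurations. The ranked table above lists the 96GB configuration as not fitting.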

Chips with published results for GLM-5

M3 Ultra (512 GB)

Data

benchmarks.json — full dataset · models.json — model summaries · benchmarks.csv — CSV export
