Evidence pressure
Coverage is still the main weakness.
Metadata debt is now small. The bigger issue is still how much of the catalog depends on community rows and thin runtime coverage.
Macs
28
Models
33
Source captures
89
Bench is where Silicon Score stays honest. It shows what is measured, what is still modeled, how runtimes are classified, which methodologies are comparable, and what the next research pass should attack.
Catalog state
Benchmark rows
250
Open issues
2
Research queue
8
Workflow comparisons
4
Model scorecards
10
Audit now
This board merges unresolved quality issues, frontier heat without enough evidence, and the active operator queue. It is the shortest path from “what is weak?” to “what should we do next?”
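The merge described above can be sketched as a simple priority sort over the three lanes. Everything here — `BoardItem`, `merge_board`, the integer priority field — is an illustrative assumption, not Silicon Score's actual internals.

```python
# Hypothetical sketch of the audit board: merge three lanes into one
# most-urgent-first list. Names and priorities are assumptions.
from dataclasses import dataclass

@dataclass
class BoardItem:
    source: str    # "quality", "frontier", or "operator"
    title: str
    priority: int  # lower = more urgent

def merge_board(quality, frontier, operator):
    """Merge the three audit lanes into one list, most urgent first."""
    items = [BoardItem("quality", t, p) for t, p in quality]
    items += [BoardItem("frontier", t, p) for t, p in frontier]
    items += [BoardItem("operator", t, p) for t, p in operator]
    return sorted(items, key=lambda i: i.priority)

board = merge_board(
    quality=[("Repair m1-pro RAM tier", 1)],
    frontier=[("Capture Qwen3.5-27B runtime notes", 2)],
    operator=[("Verify Qwen 3 30B A3B on M4 Max", 1)],
)
```

The stable sort keeps lane order among equal priorities, which matches the board's "quality issues first" framing.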
Qwen3.5-27B
Qwen3.5-27B appears across 4 lenses and 5 budget slices, with 11 Apple Silicon benchmark rows, 1 official model brief captured, 4 fetched artifacts, and 8 curated practitioner signals (8 Apple Silicon-specific). Field evidence: 8 Apple Silicon field reports; best reported generation ~31.6 tok/s; seen on MacBook Pro M5 MAX 128GB, M2 ULTRA 128GB, MacBook Pro M4 PRO; via MLX, llama.cpp.
Qwen3.5-122B-A10B
Qwen3.5-122B-A10B appears across 4 lenses and 3 budget slices, with 6 Apple Silicon benchmark rows, 1 official model brief captured, 5 fetched artifacts (1 blocked or partial), and 6 curated practitioner signals (6 Apple Silicon-specific). Field evidence: 4 Apple Silicon field reports; best reported generation ~65.9 tok/s; best reported prompt processing ~500 tok/s; seen on MacBook Pro M5 MAX 128GB, Mac Studio M3 ULTRA 256GB, MacBook Pro M5 PRO 64GB; via MLX, llama.cpp.
Devstral Small 2 24B
Devstral Small 2 24B appears across 3 lenses and 5 budget slices, with 6 Apple Silicon benchmark rows, 1 official model brief captured, 5 fetched artifacts, and 4 curated practitioner signals (5 Apple Silicon-specific). Field evidence: 5 Apple Silicon field reports; best reported generation ~47 tok/s; seen on Mac Studio M3 ULTRA 256GB, M1 ULTRA 64GB; via MLX.
Qwen3.5-397B-A17B
Qwen3.5-397B-A17B appears across 4 lenses and 1 budget slice, with 2 Apple Silicon benchmark rows, 1 official model brief captured, 3 fetched artifacts, and 4 curated practitioner signals (5 Apple Silicon-specific). Field evidence: 5 Apple Silicon field reports; best reported generation ~40 tok/s; seen on Mac Studio M3 ULTRA 512GB, MacBook Pro M5 MAX 128GB; via MLX, flash-moe.
Qwen3.5-35B-A3B
Qwen3.5-35B-A3B appears across 3 lenses and 4 budget slices, with 7 Apple Silicon benchmark rows, 1 official model brief captured, 7 fetched artifacts, and 8 curated practitioner signals (11 Apple Silicon-specific). Field evidence: 11 Apple Silicon field reports; best reported generation ~89.4 tok/s; seen on MacBook Pro M1 MAX 64GB, Mac Mini M4 PRO 64GB, Apple Silicon; via llama.cpp, MLX, Ollama.
Qwen 3 32B
Qwen 3 32B appears across 4 lenses and 5 budget slices, with 2 Apple Silicon benchmark rows, 1 official model brief captured, 3 fetched artifacts, and 2 curated practitioner signals (2 Apple Silicon-specific). Field evidence: 1 Apple Silicon field report; best reported generation ~20 tok/s; seen on MacBook Pro M4 MAX 128GB; via MLX.
GLM-4.5-Air
GLM-4.5-Air appears across 4 lenses and 2 budget slices, with 1 Apple Silicon benchmark row, 1 official model brief captured, 4 fetched artifacts, and 3 curated practitioner signals (2 Apple Silicon-specific). Field evidence: 1 Apple Silicon field report; best reported generation ~54 tok/s; seen on Mac Studio M3 ULTRA 256GB; via MLX.
Llama 3.3 70B
Llama 3.3 70B appears across 4 lenses and 4 budget slices, with 2 Apple Silicon benchmark rows, 1 official locator ready for capture, 2 fetched artifacts, and 1 curated practitioner signal (1 Apple Silicon-specific). Field evidence: 1 Apple Silicon field report; best reported generation ~11.8 tok/s; seen on MacBook Pro M4 MAX 128GB; via MLX.
Unresolved issues
Repair m1-pro-16-core-gpu--llama-2-7b--q4-0--llamacpp
Machine RAM is still unknown because the linked source identifies the chip variant but not the exact memory tier.
recover machine RAM tier
Repair m3-pro-18-core-gpu--llama-2-7b--q4-0--llamacpp
Machine RAM is still unknown because the linked source identifies the chip variant but not the exact memory tier.
recover machine RAM tier
Runtime taxonomy
Public runtime locks should stay simple. Backends and wrappers still matter, but they belong here as audit semantics rather than top-level navigation.
Llamafile
Llamafile wrapper on llama.cpp
llama.cpp stack
Public filter: llama.cpp stack.
MLX
MLX backend
MLX
Public filter: MLX.
llama.cpp
llama.cpp backend
llama.cpp stack
Public filter: llama.cpp stack.
LM Studio
LM Studio wrapper on mixed backends
Audit only
Do not expose as a canonical runtime lock.
Ollama
Ollama wrapper on llama.cpp
Ollama · llama.cpp stack
Public filter: Ollama · llama.cpp stack.
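The taxonomy above reduces to a small lookup table: each runtime is a backend or a wrapper, resolves to an underlying backend, and either maps to a public filter lock or stays audit-only. This is a hedged sketch of that scheme; the dictionary shape and field names are assumptions, not Silicon Score's schema.

```python
# Assumed encoding of the runtime taxonomy described above.
# public_filter=None marks audit-only runtimes (e.g. LM Studio on mixed backends).
RUNTIMES = {
    "Llamafile": {"kind": "wrapper", "backend": "llama.cpp", "public_filter": "llama.cpp stack"},
    "MLX":       {"kind": "backend", "backend": "MLX",       "public_filter": "MLX"},
    "llama.cpp": {"kind": "backend", "backend": "llama.cpp", "public_filter": "llama.cpp stack"},
    "LM Studio": {"kind": "wrapper", "backend": "mixed",     "public_filter": None},
    "Ollama":    {"kind": "wrapper", "backend": "llama.cpp", "public_filter": "Ollama · llama.cpp stack"},
}

def public_lock(runtime: str):
    """Return the public filter for a runtime, or None if it is audit-only."""
    return RUNTIMES[runtime]["public_filter"]
```

Keeping the wrapper/backend split in the data but exposing only `public_filter` is what keeps the public navigation simple while preserving audit semantics.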
Methodology comparability
Frontier hotspots
Qwen3.5-27B
Qwen3.5-27B appears across 4 lenses and 5 budget slices, with 11 Apple Silicon benchmark rows, 1 official model brief captured, 4 fetched artifacts, and 8 curated practitioner signals (8 Apple Silicon-specific). Field evidence: 8 Apple Silicon field reports; best reported generation ~31.6 tok/s; seen on MacBook Pro M5 MAX 128GB, M2 ULTRA 128GB, MacBook Pro M4 PRO; via MLX, llama.cpp.
capture practitioner runtime notes
Qwen3.5-122B-A10B
Qwen3.5-122B-A10B appears across 4 lenses and 3 budget slices, with 6 Apple Silicon benchmark rows, 1 official model brief captured, 5 fetched artifacts (1 blocked or partial), and 6 curated practitioner signals (6 Apple Silicon-specific). Field evidence: 4 Apple Silicon field reports; best reported generation ~65.9 tok/s; best reported prompt processing ~500 tok/s; seen on MacBook Pro M5 MAX 128GB, Mac Studio M3 ULTRA 256GB, MacBook Pro M5 PRO 64GB; via MLX, llama.cpp.
capture practitioner runtime notes
Devstral Small 2 24B
Devstral Small 2 24B appears across 3 lenses and 5 budget slices, with 6 Apple Silicon benchmark rows, 1 official model brief captured, 5 fetched artifacts, and 4 curated practitioner signals (5 Apple Silicon-specific). Field evidence: 5 Apple Silicon field reports; best reported generation ~47 tok/s; seen on Mac Studio M3 ULTRA 256GB, M1 ULTRA 64GB; via MLX.
capture practitioner runtime notes
Qwen3.5-397B-A17B
Qwen3.5-397B-A17B appears across 4 lenses and 1 budget slice, with 2 Apple Silicon benchmark rows, 1 official model brief captured, 3 fetched artifacts, and 4 curated practitioner signals (5 Apple Silicon-specific). Field evidence: 5 Apple Silicon field reports; best reported generation ~40 tok/s; seen on Mac Studio M3 ULTRA 512GB, MacBook Pro M5 MAX 128GB; via MLX, flash-moe.
capture practitioner runtime notes
Qwen3.5-35B-A3B
Qwen3.5-35B-A3B appears across 3 lenses and 4 budget slices, with 7 Apple Silicon benchmark rows, 1 official model brief captured, 7 fetched artifacts, and 8 curated practitioner signals (11 Apple Silicon-specific). Field evidence: 11 Apple Silicon field reports; best reported generation ~89.4 tok/s; seen on MacBook Pro M1 MAX 64GB, Mac Mini M4 PRO 64GB, Apple Silicon; via llama.cpp, MLX, Ollama.
capture practitioner runtime notes
Qwen 3 32B
Qwen 3 32B appears across 4 lenses and 5 budget slices, with 2 Apple Silicon benchmark rows, 1 official model brief captured, 3 fetched artifacts, and 2 curated practitioner signals (2 Apple Silicon-specific). Field evidence: 1 Apple Silicon field report; best reported generation ~20 tok/s; seen on MacBook Pro M4 MAX 128GB; via MLX.
capture practitioner runtime notes
Operator queue
Verify Qwen 3 30B A3B on the owned M4 Max 64 GB
4 reference rows already exist on this exact chip across 4 quantizations, with a 52.58-92.09 tok/s signal.
lab verification
Expand Llama 3.3 70B beyond 1 Apple Silicon tier
Llama 3.3 70B is a high-value purchase target but currently has published rows on only 1 chip tier.
coverage expansion
Expand Qwen 3 235B A22B beyond 1 Apple Silicon tier
Qwen 3 235B A22B is a high-value purchase target but currently has published rows on only 1 chip tier.
coverage expansion
Verify Qwen 3 4B on the owned M4 Max 64 GB
6 reference rows already exist on this exact chip across 6 quantizations, with a 111.55-149.07 tok/s signal.
lab verification
Verify Qwen 2.5 14B Instruct on the owned M4 Max 64 GB
1 reference row already exists on this exact chip at 1 quantization, with a 25.87 tok/s signal.
lab verification
Verify Llama 3.1 8B Instruct on the owned M4 Max 64 GB
1 reference row already exists on this exact chip at 1 quantization, with a 47.1 tok/s signal.
lab verification
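A lab-verification run is useful exactly when it can be compared against the reference band the queue already cites. This is a minimal sketch of that check, assuming a simple relative tolerance; the function name and tolerance value are illustrative, not part of the lab methodology.

```python
# Hypothetical helper for the lab-verification queue: does a new lab
# measurement land inside the band of existing reference rows?
def within_reference_band(measured_tok_s, ref_low, ref_high, tolerance=0.10):
    """True if the lab number falls inside the reference range,
    widened by a relative tolerance on each side."""
    low = ref_low * (1 - tolerance)
    high = ref_high * (1 + tolerance)
    return low <= measured_tok_s <= high

# Qwen 3 30B A3B on the owned M4 Max: references span 52.58-92.09 tok/s.
print(within_reference_band(60.0, 52.58, 92.09))  # True
```

A lab number outside the widened band flags either a methodology mismatch or a stale reference row, which is what the queue is meant to surface.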
Model scorecards
These standardized Apple Silicon evals add task-shape truth to the catalog. They come from one fixed high-end Mac and runtime path, so they should not override the main rankings, but they are excellent for understanding which models are fast, balanced, coding-heavy, or tool-soft.
Qwen3.5-122B-A10B
vLLM-MLX
Highest overall quality in this standardized set, but it demands real memory.
vLLM-MLX SCORECARD.md · discussion · 2026-03-04
Qwen3.5-122B-A10B
vLLM-MLX
The best value version in this scorecard: near-frontier quality at roughly half the RAM.
vLLM-MLX SCORECARD.md · discussion · 2026-03-04
Qwen3.5-35B-A3B
vLLM-MLX
The stronger version of the 35B MoE story: fast and much more balanced.
vLLM-MLX SCORECARD.md · discussion · 2026-03-04
Qwen3-Coder-Next
vLLM-MLX
Slightly slower than 4-bit, but reasoning is stronger and coding stays high.
vLLM-MLX SCORECARD.md · discussion · 2026-03-04
Qwen3-Coder-Next
vLLM-MLX
The fast coding-first option in this scorecard, with strong tool behavior.
vLLM-MLX SCORECARD.md · discussion · 2026-03-04
GLM-4.5-Air
vLLM-MLX
More balanced than the flash variant, but materially heavier.
vLLM-MLX SCORECARD.md · discussion · 2026-03-04
Qwen3.5-27B
vLLM-MLX
A strong fits-anywhere coding and tool-use compromise.
vLLM-MLX SCORECARD.md · discussion · 2026-03-04
Qwen3.5-35B-A3B
vLLM-MLX
Very fast for its size, but reasoning softness is visible in the standardized tasks.
vLLM-MLX SCORECARD.md · discussion · 2026-03-04
Qwen3.5-9B
vLLM-MLX
The smallest model in this set that still looks broadly useful for agent-style work.
vLLM-MLX SCORECARD.md · discussion · 2026-03-04
Devstral Small 2 24B
vLLM-MLX
Strong coding score, but tool calling is poor in this standardized setup.
vLLM-MLX SCORECARD.md · discussion · 2026-03-04
Workflow metrics
These records capture effective throughput and prefill-heavy scenarios from the same Mac and model across different runtime paths. They are valuable for teaching runtime choice, but they should stay in the audit lane rather than flatten into headline tokens-per-second rows.
Qwen 3 30B-A3B
Effective tok/s · Interactive
2026-03-20
These are effective throughput figures on a multi-turn ops-agent scenario. They include prefill and wrapper behavior, so they should teach runtime choice, not replace decode-speed benchmark rows.
LM Studio 41.7 · llama.cpp 41.4 · oMLX 38.0 · Ollama 26.0
Qwen3.5-35B-A3B
Effective tok/s · Interactive
2026-03-20
These are effective throughput results on an ops-agent workflow. They are best used to compare runtime behavior and caching quality on Apple Silicon, not to replace canonical decode-speed rows.
oMLX 38.0 · Rapid-MLX 35.6 · mlx-openai-server 26.2 · LM Studio (GGUF) 17.6 · LM Studio (MLX) 17.0
Qwen3.5-35B-A3B
Effective tok/s · 8,000 ctx
2026-03-20
This is an 8K prefill-stress comparison. It is useful for understanding caching and long-context behavior, not for headline decode-speed ranking.
oMLX 16.4 · mlx-openai-server 8.7 · Rapid-MLX 8.5 · LM Studio (GGUF) 7.8 · LM Studio (MLX) 5.9
Qwen 3 30B-A3B
Effective tok/s · 8,000 ctx
2026-03-20
This compares wrappers and backends on an 8K prefill-stress scenario. It is useful for long-context teaching, but it is not a canonical decode-speed row.
MLX fp16 8.6 · GGUF 7.6 · MLX bf16 6.0
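The gap between these effective numbers and headline decode rows follows directly from what "effective tok/s" measures: output tokens over total wall time, prefill included. A minimal sketch, assuming the usual two-phase timing model (the function and its parameters are illustrative):

```python
# Hedged sketch of effective throughput: output tokens divided by total
# wall time (prefill + decode), which is why prefill-heavy scenarios
# score far below a runtime's decode speed.
def effective_tok_s(prompt_tokens, output_tokens, prefill_tok_s, decode_tok_s):
    """Effective throughput = output tokens / (prefill time + decode time)."""
    total_time = prompt_tokens / prefill_tok_s + output_tokens / decode_tok_s
    return output_tokens / total_time

# An 8,000-token prompt drags effective throughput well below decode speed.
print(round(effective_tok_s(8000, 500, 500, 40), 1))  # prints 17.5
```

With no prompt to prefill, the same function returns the decode speed itself, which is why short-context chat numbers and canonical decode rows can look alike while 8K-context rows diverge sharply.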
Research map
Not every source should influence the product equally. This map makes those roles explicit so rankings can say when an answer is measured, estimated, or still fit-first.
First-party
Where measured truth should be upgraded into canon.
Silicon Score Lab
first party lab
Benchmark reference
Where comparable benchmark anchors and runtime-change signals usually appear first.
Awni Hannun benchmark gists
maintainer benchmark gists
Hugging Face model hub
model registry
oMLX repository
runtime repo
vLLM-MLX repository
runtime repo
Practitioner
Where operators reveal workflow reality and caveats before the benchmark layer catches up.
llama.cpp GitHub discussions
maintainer discussion
Reddit /r/LocalLLaMA
operator forum
Discovery
Where release movement starts, but not where performance truth should harden.
LocalScore accelerator runs
community benchmark aggregator
mlx-lm pull requests
maintainer pull requests
SharpAI HomeSec-Bench
published benchmark page
X local-AI chatter
social discovery
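The four source classes above can be made machine-checkable with a small role map, so rankings can assert whether a number is allowed to harden into performance truth. This is an assumed sketch; the class names mirror the map above, but the role labels and gate function are illustrative.

```python
# Assumed encoding of the research map: each source class gets a role
# that bounds how far its numbers may travel into the product.
SOURCE_ROLES = {
    "first-party":         "canon",   # measured truth, upgradeable into rankings
    "benchmark-reference": "anchor",  # comparable anchors, runtime-change signals
    "practitioner":        "caveat",  # workflow reality before benchmarks catch up
    "discovery":           "lead",    # release movement, never hardened truth
}

def can_harden(source_class: str) -> bool:
    """Only measured or anchor-grade sources may harden into performance truth."""
    return SOURCE_ROLES[source_class] in {"canon", "anchor"}
```

Under this gate, an X thread or a LocalScore run can open a research question but can never close one; only lab or reference-grade evidence can.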