Will it fit and run well?

Select a model and a Mac to see which quantizations fit, how much headroom you have, and what generation speed to expect.


How this works

Model weight size is estimated from parameter count and bits per weight. Total RAM adds the model weights, the KV cache at an 8k-token context (FP16: 2 bytes per element, with one K and one V element per layer, per KV head, per head dimension, per token), and a 0.5 GB overhead buffer. A model "fits" if this total stays under 85% of the machine's unified memory, leaving room for macOS and other processes. Expected speed comes from benchmark data where available: measured values from our lab, trusted reference benchmarks, or community reports, each labeled by its evidence class.
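The estimate above can be sketched in a few lines. This is a minimal illustration, not the tool's actual code; the architecture numbers in the example (32 layers, 8 KV heads, head dimension 128, ~4.5 bits per weight) are assumptions for a hypothetical Llama-style 8B model, not values taken from this page.

```python
GB = 1e9  # the GB figures on this page are treated as decimal gigabytes

def estimated_ram_gb(params, bits_per_weight, n_layers, n_kv_heads,
                     head_dim, ctx_tokens=8192, overhead_gb=0.5):
    """Weights + FP16 KV cache at ctx_tokens + fixed overhead, in GB."""
    weight_bytes = params * bits_per_weight / 8
    # KV cache: one K and one V element (hence the leading 2) per layer,
    # per KV head, per head dimension, per token, at 2 bytes each (FP16).
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * 2
    return weight_bytes / GB + kv_bytes / GB + overhead_gb

def fits(total_ram_gb, unified_memory_gb, usable_fraction=0.85):
    """A model "fits" if its total stays under 85% of unified memory."""
    return total_ram_gb < usable_fraction * unified_memory_gb

# Assumed example: an 8B model at ~4.5 bits/weight (a typical 4-bit rate)
total = estimated_ram_gb(params=8e9, bits_per_weight=4.5,
                         n_layers=32, n_kv_heads=8, head_dim=128)
print(f"{total:.1f} GB needed; fits in 16 GB: {fits(total, 16)}")
# → 6.1 GB needed; fits in 16 GB: True
```

Note how the weights dominate: here the 8k-token KV cache adds only about 1 GB, which is why quantization choice usually decides whether a model fits.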

Context windows, speeds, and RAM usage vary with the runtime (llama.cpp, MLX, LM Studio, etc.), prompt length, batch size, and system load. Treat these estimates as a starting point, not a guarantee.