
Llama 2 7B — Apple Silicon Benchmarks

Measured inference speed for Llama 2 7B across five Apple Silicon chips: tokens per second at the Q4_0 quantization level, from real runs rather than estimates.

Quantizations measured: Q4_0

- Benchmark rows: 5
- Chip tiers covered: 5
- Fastest avg tok/s: 94.3 (M2 Ultra, 76-core GPU, 192 GB)
- Minimum RAM observed: 3.56 GB
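The ~3.6 GB figure is consistent with a back-of-envelope Q4_0 size estimate. A sketch; the 6.74 B parameter count and the 18-bytes-per-32-weights block layout are assumptions about llama.cpp's Q4_0 format, not values taken from this page:

```python
# Rough Q4_0 model-size estimate.
# Assumption: llama.cpp Q4_0 packs weights in blocks of 32, each block
# holding 16 bytes of 4-bit quants plus one 2-byte fp16 scale = 18 bytes.
PARAMS = 6.74e9           # Llama 2 7B parameter count (assumed)
WEIGHTS_PER_BLOCK = 32
BYTES_PER_BLOCK = 18

size_bytes = PARAMS / WEIGHTS_PER_BLOCK * BYTES_PER_BLOCK
print(f"{size_bytes / 2**30:.2f} GiB")  # ~3.53 GiB, near the 3.56 GB above
```

The small gap to the observed 3.56 GB is plausibly the KV cache and runtime buffers, which this sketch ignores.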

Benchmark results for Llama 2 7B

Rows are sorted by avg tok/s, descending. The source column links to the original measurement page.

| Chip | Quant | RAM req. | Context | Avg tok/s | Prompt tok/s | Runtime | Source |
|---|---|---|---|---|---|---|---|
| M2 Ultra (76-core GPU, 192 GB) | Q4_0 | 3.6 GB | 512 | 94.3 | 1238.5 | llama.cpp | ref |
| M3 Max (40-core GPU, 48 GB) | Q4_0 | 3.6 GB | 512 | 65.8 | 691.0 | llama.cpp | ref |
| M1 Pro (16-core GPU) | Q4_0 | 3.6 GB | 512 | 36.4 | 266.3 | llama.cpp | ref |
| M3 Pro (18-core GPU) | Q4_0 | 3.6 GB | 512 | 30.7 | 341.7 | llama.cpp | ref |
| M4 (10-core GPU, 16 GB) | Q4_0 | 3.6 GB | 512 | 24.1 | 221.3 | llama.cpp | ref |
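The two throughput columns measure different phases: prompt tok/s is batched prefill, while avg tok/s is sequential token generation. A rough end-to-end latency combines both. A sketch using the M4 row's numbers; real runs add model-load and sampling overhead on top:

```python
def estimated_latency(prompt_tokens: int, gen_tokens: int,
                      prompt_tps: float, gen_tps: float) -> float:
    """Seconds to prefill the prompt, then generate gen_tokens sequentially."""
    return prompt_tokens / prompt_tps + gen_tokens / gen_tps

# M4 (10-core GPU): 221.3 prompt tok/s, 24.1 avg (generation) tok/s
t = estimated_latency(prompt_tokens=512, gen_tokens=256,
                      prompt_tps=221.3, gen_tps=24.1)
print(f"{t:.1f} s")  # ~2.3 s prefill + ~10.6 s generation ≈ 12.9 s
```

Note that generation time dominates even for a full 512-token prompt, which is why the avg tok/s column is the headline number.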

- benchmarks.json — full dataset
- models.json — model summaries
- benchmarks.csv — CSV export
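The CSV export can be consumed directly with the standard library. A minimal sketch, assuming benchmarks.csv has columns named `chip` and `avg_tok_s` (the actual schema may differ):

```python
import csv

def fastest_chips(path: str, n: int = 3) -> list[tuple[str, float]]:
    """Return the n chips with the highest average tok/s from the CSV export."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    # Sort descending by generation throughput, mirroring the table above.
    rows.sort(key=lambda r: float(r["avg_tok_s"]), reverse=True)
    return [(r["chip"], float(r["avg_tok_s"])) for r in rows[:n]]
```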
