llama-3-2-1b-instruct — ranking first, raw rows below
Start with the ranked Mac table above. Use the rest of this page to inspect raw Apple Silicon coverage and model metadata.
Quantizations observed: Q4_K - Medium
Quick take
The fastest published result is 229.0 tok/s on an M5 Max (32-core GPU, 36 GB) at Q4_K - Medium, measured with llamafile. Start with Rankings for the decision, then use the raw rows below to audit the evidence.
Based on 63 external benchmarks; no lab runs yet.
Published runtimes: llamafile.
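For a rough sense of what these decode rates mean in practice, the sketch below converts tok/s into approximate wall-clock time for a fixed-length reply. The 300-token response length is an illustrative assumption, not part of the published data, and prompt-processing time is ignored.

```python
# Rough wall-clock estimate: generation time ~= tokens / (decode tok/s).
# The response length is an illustrative assumption; prompt processing
# is ignored. Rates are taken from the published rows below.
RESPONSE_TOKENS = 300

published_rates = {
    "M5 Max (32-core GPU, 36 GB)": 229.0,  # fastest published row
    "M4 (10-core GPU, 16 GB)": 76.2,
    "M2 (8-core GPU, 8 GB)": 34.5,         # slowest published row
}

for chip, tok_per_s in published_rates.items():
    seconds = RESPONSE_TOKENS / tok_per_s
    print(f"{chip}: ~{seconds:.1f} s for a {RESPONSE_TOKENS}-token reply")
```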
Current published coverage
Published chip coverage includes M5 Max (32-core GPU, 36 GB), M4 Max (40-core GPU, 128 GB), M4 Max (40-core GPU, 64 GB), M4 Max (40-core GPU, 48 GB), and M3 Ultra (80-core GPU, 512 GB), plus 58 more chip tiers. Published throughput spans 229.0 tok/s on the M5 Max (32-core GPU, 36 GB) down to 34.5 tok/s on the M2 (8-core GPU, 8 GB), all at Q4_K - Medium.
Raw benchmark rows for llama-3-2-1b-instruct
Rows stay below the ranking because this page is answer-first. Use them to inspect exact chips, quantizations, runtimes, and sources.
| Chip | Quant | Avg tok/s | Runtime | Source |
|---|---|---|---|---|
| M5 Max (32-core GPU, 36 GB) | Q4_K - Medium | 229.0 tok/s | llamafile | ref |
| M4 Max (40-core GPU, 128 GB) | Q4_K - Medium | 182.6 tok/s | llamafile | ref |
| M4 Max (40-core GPU, 64 GB) | Q4_K - Medium | 180.2 tok/s | llamafile | ref |
| M4 Max (40-core GPU, 48 GB) | Q4_K - Medium | 179.0 tok/s | llamafile | ref |
| M3 Ultra (80-core GPU, 512 GB) | Q4_K - Medium | 178.8 tok/s | llamafile | ref |
| M3 Ultra (80-core GPU, 256 GB) | Q4_K - Medium | 177.9 tok/s | llamafile | ref |
| M2 Ultra (60-core GPU, 128 GB) | Q4_K - Medium | 176.4 tok/s | llamafile | ref |
| M2 Ultra (60-core GPU, 64 GB) | Q4_K - Medium | 174.1 tok/s | llamafile | ref |
| M2 Ultra (60-core GPU, 192 GB) | Q4_K - Medium | 169.8 tok/s | llamafile | ref |
| M4 Max (32-core GPU, 36 GB) | Q4_K - Medium | 166.5 tok/s | llamafile | ref |
| M4 Max (GPU count not published, 128 GB) | Q4_K - Medium | 156.3 tok/s | llamafile | ref |
| M2 Max (38-core GPU, 32 GB) | Q4_K - Medium | 153.0 tok/s | llamafile | ref |
| M1 Ultra (64-core GPU, 128 GB) | Q4_K - Medium | 151.1 tok/s | llamafile | ref |
| M3 Max (40-core GPU, 48 GB) | Q4_K - Medium | 149.0 tok/s | llamafile | ref |
| M3 Max (40-core GPU, 128 GB) | Q4_K - Medium | 146.3 tok/s | llamafile | ref |
| M1 Ultra (48-core GPU, 128 GB) | Q4_K - Medium | 138.0 tok/s | llamafile | ref |
| M3 Max (30-core GPU, 36 GB) | Q4_K - Medium | 133.0 tok/s | llamafile | ref |
| M3 Max (30-core GPU, 96 GB) | Q4_K - Medium | 132.9 tok/s | llamafile | ref |
| M2 Max (30-core GPU, 32 GB) | Q4_K - Medium | 127.6 tok/s | llamafile | ref |
| M1 Max (32-core GPU, 32 GB) | Q4_K - Medium | 125.8 tok/s | llamafile | ref |
| M1 Max (32-core GPU, 64 GB) | Q4_K - Medium | 120.7 tok/s | llamafile | ref |
| M2 Ultra (GPU count not published, 128 GB) | Q4_K - Medium | 120.4 tok/s | llamafile | ref |
| M4 Pro (20-core GPU, 24 GB) | Q4_K - Medium | 119.2 tok/s | llamafile | ref |
| M4 Pro (20-core GPU, 48 GB) | Q4_K - Medium | 118.9 tok/s | llamafile | ref |
| M4 Pro (20-core GPU, 64 GB) | Q4_K - Medium | 118.6 tok/s | llamafile | ref |
| M4 Pro (16-core GPU, 64 GB) | Q4_K - Medium | 111.9 tok/s | llamafile | ref |
| M4 Pro (16-core GPU, 48 GB) | Q4_K - Medium | 111.0 tok/s | llamafile | ref |
| M4 Pro (16-core GPU, 24 GB) | Q4_K - Medium | 110.9 tok/s | llamafile | ref |
| M3 Max (40-core GPU, 64 GB) | Q4_K - Medium | 107.0 tok/s | llamafile | ref |
| M1 Max (24-core GPU, 64 GB) | Q4_K - Medium | 105.7 tok/s | llamafile | ref |
| M2 Pro (19-core GPU, 32 GB) | Q4_K - Medium | 100.3 tok/s | llamafile | ref |
| M2 Pro (19-core GPU, 16 GB) | Q4_K - Medium | 99.5 tok/s | llamafile | ref |
| M5 (10-core GPU, 32 GB) | Q4_K - Medium | 98.4 tok/s | llamafile | ref |
| M5 (10-core GPU, 16 GB) | Q4_K - Medium | 98.1 tok/s | llamafile | ref |
| M1 Max (24-core GPU, 32 GB) | Q4_K - Medium | 93.9 tok/s | llamafile | ref |
| M2 Pro (16-core GPU, 32 GB) | Q4_K - Medium | 91.5 tok/s | llamafile | ref |
| M2 Pro (16-core GPU, 16 GB) | Q4_K - Medium | 91.1 tok/s | llamafile | ref |
| M3 Pro (18-core GPU, 36 GB) | Q4_K - Medium | 89.8 tok/s | llamafile | ref |
| M3 Pro (14-core GPU, 36 GB) | Q4_K - Medium | 88.2 tok/s | llamafile | ref |
| M3 Pro (14-core GPU, 18 GB) | Q4_K - Medium | 88.1 tok/s | llamafile | ref |
| M3 Pro (18-core GPU, 18 GB) | Q4_K - Medium | 85.6 tok/s | llamafile | ref |
| M1 Pro (16-core GPU, 16 GB) | Q4_K - Medium | 78.2 tok/s | llamafile | ref |
| M1 Pro (16-core GPU, 32 GB) | Q4_K - Medium | 77.2 tok/s | llamafile | ref |
| M4 (10-core GPU, 16 GB) | Q4_K - Medium | 76.2 tok/s | llamafile | ref |
| M4 (10-core GPU, 32 GB) | Q4_K - Medium | 75.6 tok/s | llamafile | ref |
| M4 (10-core GPU, 24 GB) | Q4_K - Medium | 75.4 tok/s | llamafile | ref |
| M1 Pro (14-core GPU, 16 GB) | Q4_K - Medium | 71.8 tok/s | llamafile | ref |
| M1 Pro (14-core GPU, 32 GB) | Q4_K - Medium | 71.0 tok/s | llamafile | ref |
| M4 (GPU count not published, 16 GB) | Q4_K - Medium | 68.0 tok/s | llamafile | ref |
| M3 (10-core GPU, 16 GB) | Q4_K - Medium | 67.2 tok/s | llamafile | ref |
| M4 (8-core GPU, 16 GB) | Q4_K - Medium | 65.9 tok/s | llamafile | ref |
| M3 (10-core GPU, 24 GB) | Q4_K - Medium | 64.7 tok/s | llamafile | ref |
| M3 (GPU count not published, 16 GB) | Q4_K - Medium | 61.6 tok/s | llamafile | ref |
| M1 Ultra (GPU count not published, 128 GB) | Q4_K - Medium | 57.1 tok/s | llamafile | ref |
| M2 (10-core GPU, 8 GB) | Q4_K - Medium | 56.5 tok/s | llamafile | ref |
| M2 (10-core GPU, 16 GB) | Q4_K - Medium | 55.6 tok/s | llamafile | ref |
| M2 (10-core GPU, 24 GB) | Q4_K - Medium | 54.8 tok/s | llamafile | ref |
| M1 (8-core GPU, 8 GB) | Q4_K - Medium | 40.4 tok/s | llamafile | ref |
| M1 (8-core GPU, 16 GB) | Q4_K - Medium | 40.2 tok/s | llamafile | ref |
| M1 (7-core GPU, 8 GB) | Q4_K - Medium | 38.5 tok/s | llamafile | ref |
| M1 (7-core GPU, 16 GB) | Q4_K - Medium | 37.9 tok/s | llamafile | ref |
| M2 (8-core GPU, 16 GB) | Q4_K - Medium | 35.3 tok/s | llamafile | ref |
| M2 (8-core GPU, 8 GB) | Q4_K - Medium | 34.5 tok/s | llamafile | ref |
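To audit these rows programmatically rather than by eye, a minimal sketch like the following re-derives the ranking from the benchmarks.csv export linked under Data. The column names (chip, quant, avg_tok_s, runtime) are assumptions mirroring the table headers above; adjust them to the actual file schema.

```python
import csv

# Re-derive the ranking from the CSV export (benchmarks.csv, linked under
# "Data"). Column names are assumptions based on the table headers above.
with open("benchmarks.csv", newline="") as f:
    rows = [r for r in csv.DictReader(f) if r.get("quant") == "Q4_K - Medium"]

rows.sort(key=lambda r: float(r["avg_tok_s"]), reverse=True)

for r in rows[:5]:
    print(f'{r["chip"]}: {float(r["avg_tok_s"]):.1f} tok/s ({r["runtime"]})')
```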
Data
benchmarks.json — full dataset · models.json — model summaries · benchmarks.csv — CSV export
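The exports above can back any custom filter. As a minimal sketch, the snippet below picks the fastest published configuration that fits a given unified-memory budget, assuming benchmarks.json is a list of row objects with chip, memory_gb, and avg_tok_s fields; the real schema may differ.

```python
import json

# Hypothetical helper: pick the fastest published row that fits a
# unified-memory budget. Field names ("chip", "memory_gb", "avg_tok_s")
# are assumptions mirroring the table columns; the real schema may differ.
def fastest_within_memory(path: str, budget_gb: float):
    with open(path) as f:
        rows = json.load(f)  # assumed to be a list of per-row dicts
    eligible = [
        r for r in rows
        if r.get("memory_gb") is not None and r["memory_gb"] <= budget_gb
    ]
    return max(eligible, key=lambda r: r["avg_tok_s"], default=None)

best = fastest_within_memory("benchmarks.json", budget_gb=32)
if best:
    print(f'{best["chip"]}: {best["avg_tok_s"]} tok/s within 32 GB')
```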