This post was submitted on 23 Nov 2023
LocalLLaMA
Community to discuss about Llama, the family of large language models created by Meta AI.
You don't say what quant you're using, if any. But with Q4_K_M I get this on my M1 Max using pure llama.cpp.
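For anyone wanting to reproduce a run like that, here's a minimal sketch via the llama-cpp-python bindings rather than the llama.cpp CLI; the model path is hypothetical:

```python
# Minimal sketch with llama-cpp-python; the CLI run above uses the same engine.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # hypothetical Q4_K_M GGUF
    n_gpu_layers=-1,  # offload all layers to Metal on Apple Silicon
)
out = llm("Explain memory bandwidth in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```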
Your M3 also has lower memory bandwidth than my M1 Max: it's the 300 GB/s version versus my 400 GB/s.
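As a back-of-envelope check, assuming token generation is memory-bandwidth-bound, the throughput ceiling is roughly bandwidth divided by the bytes read per token (about the model file size). The ~4.1 GB figure for a 7B Q4_K_M file below is an assumption:

```python
# Rough decode-speed ceiling: tokens/sec <= bandwidth / bytes read per token.
model_size_gb = 4.1  # assumed size of a 7B Q4_K_M GGUF file
for name, bw_gbps in [("M1 Max, 400 GB/s", 400), ("M3, 300 GB/s", 300)]:
    print(f"{name}: ~{bw_gbps / model_size_gb:.0f} tok/s ceiling")
```

Real throughput lands well under that ceiling, but the ratio between the two machines should hold.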