I'll be interested to see what responses you get, but I'm gonna come out and say that the Mac's power is NOT its speed. Pound for pound, a CUDA video card is going to absolutely leave our machines in the dust.
So, with that said: I actually think your 20 tokens a second is kind of great. I mean, my M2 Ultra is two M2 Max processors stacked on top of each other, and I get the following for Mythomax-l2-13b:
So you're actually doing better than I'd expect an M2 Max to do.
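If you want to compare apples to apples, here's a minimal sketch of how you could measure tokens per second yourself with llama-cpp-python. The GGUF filename and prompt are just placeholders, not a specific recommendation:

```python
# Minimal sketch: measure generation speed with llama-cpp-python.
# The GGUF filename below is a placeholder; point it at whatever quant you have.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="mythomax-l2-13b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to Metal on Apple Silicon
    verbose=False,
)

start = time.perf_counter()
result = llm("Once upon a time,", max_tokens=256)
elapsed = time.perf_counter() - start

generated = result["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s ({generated / elapsed:.1f} tok/s)")
```

Keep in mind that prompt length and quantization level swing these numbers a lot, so it's worth quoting those alongside any tok/s figure.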