Another consideration: I was told by someone with multiple cards that if you split a model's layers across multiple GPUs, the cards don't process the layers simultaneously.
So with 3x cards you don't get a parallel speedup from all of them working at the same time. Inference runs the layers on card 1, then card 2, then card 3; splitting mainly buys you more total VRAM, not more throughput. A rough sketch of that behavior is below.
The slowest card in the chain sets the pace for its share of the layers. I'm also not sure what this does to model load times or your electricity bill, and on top of that you need a case, motherboard, and PSU that can actually fit and power them all.
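To make the "card 1, then card 2" point concrete, here's a minimal PyTorch sketch of that kind of layer split. The model, layer count, and 50/50 split are made up for illustration (this is not how llama.cpp or any specific loader implements it); it just shows why only one GPU is busy at a time when a single request flows through:

```python
# Minimal sketch of splitting a model's layers across two GPUs.
# Hypothetical toy model -- assumes two CUDA devices are visible.
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    def __init__(self, n_layers: int = 8, dim: int = 512):
        super().__init__()
        half = n_layers // 2
        # First half of the layers lives on GPU 0, second half on GPU 1.
        self.part1 = nn.Sequential(*[nn.Linear(dim, dim) for _ in range(half)]).to("cuda:0")
        self.part2 = nn.Sequential(*[nn.Linear(dim, dim) for _ in range(n_layers - half)]).to("cuda:1")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GPU 1 sits idle while GPU 0 runs its layers...
        x = self.part1(x.to("cuda:0"))
        # ...then the activations hop over the bus and GPU 0 sits idle.
        return self.part2(x.to("cuda:1"))

model = SplitModel()
out = model(torch.randn(1, 512))  # runs card 1's layers, then card 2's -- never both at once
```

For a single token generation there's only one activation flowing through the stack, so each card waits on the one before it; that's why adding cards grows capacity rather than speed in this setup.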