this post was submitted on 13 Nov 2023
LocalLLaMA
Transferring the state over the internet so the next card can take over is sloooow. You'd want cards that can take a lot of layers to minimize that.
In other words, you want a few big GPUs in the network, not a bunch of small ones.
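To put rough numbers on that, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration and not measured: an 80-layer model, a hidden size of 8192 in fp16, 50 ms one-way latency and about 100 Mbit/s between machines, and a simple pipeline where each machine hands one token's activations to the next. The function names are made up for this example.

```python
import math


def hops_needed(total_layers: int, layers_per_gpu: int) -> int:
    """Number of network handoffs when layers are split evenly across GPUs."""
    stages = math.ceil(total_layers / layers_per_gpu)
    return stages - 1


def per_token_network_seconds(
    total_layers: int,
    layers_per_gpu: int,
    hidden_size: int = 8192,                # assumed model width
    bytes_per_value: int = 2,               # assumed fp16 activations
    latency_s: float = 0.05,                # assumed one-way internet latency
    bandwidth_bytes_per_s: float = 12.5e6,  # assumed ~100 Mbit/s link
) -> float:
    """Rough network time added to each generated token by the handoffs."""
    payload = hidden_size * bytes_per_value  # one token's activations
    per_hop = latency_s + payload / bandwidth_bytes_per_s
    return hops_needed(total_layers, layers_per_gpu) * per_hop


if __name__ == "__main__":
    for layers_per_gpu in (40, 20, 10, 5):
        cost = per_token_network_seconds(80, layers_per_gpu)
        print(f"{layers_per_gpu:>2} layers per GPU -> {cost:.3f} s of network time per token")
```

Under these assumptions the per-hop cost is mostly latency, so the overhead grows with the number of handoffs. That is why cards that can hold more layers help.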
Yes, for actually dividing models across machines, which was the original idea. I'd shifted to a different (and less technically interesting) question of sharing GPUs without dividing the model.
For dividing training, though, see this paper:
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient