this post was submitted on 30 Oct 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
This isn't an original concept, but the data transfer rates make distributed training like this pretty impractical. Unless you're using some galaxy-brained techniques, you'd need to distribute something like 120 GB of gradients every step (for a 70B model), and training against a stale step is a waste of time. So parallelizing it across internet-connected machines is a horrible option.
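To see why the transfer cost dominates, here's a rough sketch of the math under assumed numbers (70B parameters, fp16 gradients, an optimistic 1 Gbit/s uplink — all illustrative, not from the post):

```python
# Back-of-envelope: time to ship one step's gradients for a 70B model.
# Assumptions (mine, not the commenter's): fp16 grads, 1 Gbit/s uplink.
params = 70e9
bytes_per_grad = 2                              # fp16 = 2 bytes/param
payload_gb = params * bytes_per_grad / 1e9      # ~140 GB per step
link_gbps = 1.0                                 # optimistic home connection
seconds = payload_gb * 8 / link_gbps            # GB -> Gbit, then divide by rate
print(f"~{payload_gb:.0f} GB per step, ~{seconds / 60:.0f} min to transfer at 1 Gbit/s")
```

Minutes per step just for communication, before any compute — and that's per worker, ignoring the aggregation side. Data-center interconnects (NVLink, InfiniBand) are orders of magnitude faster, which is the whole point.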
You don't really need this setup anyway, since you can train a 4-bit 13B LoRA on a gaming PC. This would only be useful for big foundation models, maybe, in which case you'd get way faster results by just renting some cloud GPUs.
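A quick sketch of why the 4-bit 13B LoRA fits on consumer hardware — the adapter size here is a hypothetical figure I'm assuming for illustration, not something from the post:

```python
# Rough VRAM estimate for a 4-bit 13B LoRA run.
# Assumptions (mine): NF4 base weights at ~0.5 bytes/param; LoRA adapter
# size of 0.25B params is a hypothetical round number; Adam keeps ~8
# bytes/param of optimizer state on top of fp16 weights and grads.
base_params = 13e9
base_gb = base_params * 0.5 / 1e9                 # ~6.5 GB frozen base weights
lora_params = 0.25e9                              # hypothetical adapter size
lora_gb = lora_params * (2 + 2 + 8) / 1e9         # weights + grads + optimizer
print(f"base ~{base_gb:.1f} GB + trainable states ~{lora_gb:.1f} GB, plus activations")
```

Call it roughly 10 GB before activations — comfortably inside a 24 GB gaming card, and the frozen base never needs gradients or optimizer state at all.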