this post was submitted on 13 Nov 2023
LocalLLaMA
It actually does exist.
It's called Petals.
I believe it was made to run BLOOM-176B.
Why does no one use it?
It's terribly inefficient in many ways. Data centers with the best GPUs are the most efficient option, both in hardware and in energy use. They are often built in places with access to cheap or green energy and subsidies. Also, for research and development, cash is cheap, so there's little incentive to play with decentralized stuff that adds a layer of technical abstraction and needs a community behind it. For the vast majority of use cases, the opportunity cost of going decentralized way outweighs the cost of just running this in a data center.
Distributed inference IS indeed slower, BUT it's definitely not too slow for production use. I've used it, and with a proper cluster it's still faster than GPT-4.
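For a rough feel of the "slower, but maybe not too slow" tradeoff, here's a back-of-the-envelope sketch. It assumes a simple pipeline where every generated token passes through each stage in turn and pays one network hop per stage. The function name and every number below are made up for illustration; they are not measurements of Petals, GPT-4, or any real cluster.

```python
def tokens_per_second(num_stages, compute_ms_per_stage, hop_latency_ms):
    """Toy single-stream throughput model for pipeline-style inference.

    Each generated token visits every stage in sequence, and each stage
    adds one network hop. All inputs are hypothetical.
    """
    per_token_ms = num_stages * (compute_ms_per_stage + hop_latency_ms)
    return 1000.0 / per_token_ms


# Hypothetical numbers, chosen only to show the shape of the tradeoff.
datacenter = tokens_per_second(num_stages=4, compute_ms_per_stage=10, hop_latency_ms=0.1)
volunteers = tokens_per_second(num_stages=4, compute_ms_per_stage=10, hop_latency_ms=50)

print(f"data center:     {datacenter:.1f} tokens/s")
print(f"volunteer swarm: {volunteers:.1f} tokens/s")
```

In this toy model the network hops dominate once they cross the public internet, so fewer, better-connected peers matter more than raw GPU speed. Whether the result is "fast enough" depends entirely on the real latencies you plug in.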