this post was submitted on 25 Nov 2023
LocalLLaMA
Community for discussing Llama, the family of large language models created by Meta AI.
you are viewing a single comment's thread
Does this use of mixture-of-experts mean that multiple 70B models would perform better than multiple 7B models?
The question was whether multiple small models can beat a single big model while also keeping a speed advantage. The answer is yes, and an example of that is MoE, which is a collection of small models all inside a single big model.
https://huggingface.co/google/switch-c-2048 is one such example.
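If it helps, here is a minimal sketch of the idea, assuming PyTorch and Switch-style top-1 routing; the class name, layer sizes, and expert count are made up for illustration and are not taken from the actual Switch Transformer code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchMoE(nn.Module):
    """Top-1 mixture-of-experts layer: a router picks one small expert per token."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert for each token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to a list of tokens
        tokens = x.reshape(-1, x.shape[-1])
        gate = F.softmax(self.router(tokens), dim=-1)   # routing probabilities
        weight, expert_idx = gate.max(dim=-1)           # top-1: one expert per token
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # only the chosen expert runs for these tokens (the speed advantage)
                out[mask] = weight[mask, None] * expert(tokens[mask])
        return out.reshape_as(x)

# Usage: 8 small experts, but each token only pays for one of them.
layer = SwitchMoE(d_model=512, d_ff=2048, n_experts=8)
y = layer(torch.randn(2, 16, 512))  # -> (2, 16, 512)
```

The point of the routing step is that each token only runs through the one expert the router picks, so you get the capacity of many small models at roughly the compute cost of one.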
Thank you for sharing, I understand now
Big is an understatement. Please do correct me if I got it wildly wrong, but it appears to be a 3.6 TB colossus.
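As a rough sanity check (assuming the weights are stored as 2-byte bf16 values): Switch-C-2048 has on the order of 1.6 trillion parameters, and 1.6e12 parameters × 2 bytes ≈ 3.2 TB, so a checkpoint in the 3-4 TB range sounds about right.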