this post was submitted on 25 Nov 2023

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.

 

If I have multiple 7B models, where each model is trained on one specific topic (e.g. roleplay, math, coding, history, politics...), and an interface that decides, depending on the context, which model to use, could this outperform bigger models while being faster?
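As a rough sketch of the interface described here, a lightweight router could classify the prompt and dispatch it to a topic-specialised 7B model. Every model name, keyword list, and the `generate_with` stub below are hypothetical placeholders, not a real API:

```python
# Minimal sketch of the routing idea from the post: a lightweight classifier
# picks one topic-specialised 7B model per request. All names are hypothetical.

from typing import Dict

# Map each topic to the (hypothetical) specialised model that should serve it.
SPECIALISTS: Dict[str, str] = {
    "coding": "deepseek-coder-7b",
    "math": "math-7b",
    "roleplay": "roleplay-7b",
    "general": "llama-2-7b",  # fallback when no topic matches
}

# Very crude topic detection; in practice this could be a small classifier
# or an embedding-similarity lookup instead of keyword matching.
TOPIC_KEYWORDS = {
    "coding": ("code", "function", "bug", "python", "compile"),
    "math": ("integral", "equation", "prove", "derivative"),
    "roleplay": ("character", "story", "pretend", "roleplay"),
}

def route(prompt: str) -> str:
    """Return the name of the specialist model that should handle `prompt`."""
    lowered = prompt.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return SPECIALISTS[topic]
    return SPECIALISTS["general"]

def generate_with(model_name: str, prompt: str) -> str:
    # Placeholder: here you would call the actual 7B model (e.g. through a
    # local inference server). Only the routing logic is illustrated.
    return f"[{model_name}] would answer: {prompt!r}"

if __name__ == "__main__":
    for prompt in ("Fix this Python function for me", "Who was the first Roman emperor?"):
        print(generate_with(route(prompt), prompt))
```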

[–] vasileer@alien.top 1 points 11 months ago (4 children)

Yes, this is done with Mixture of Experts (MoE),

and we already have examples of this:

coding - deepseek-coder-7B is better at coding than many 70B models

answering from context - llama-2-7B is better than llama-2-13B on the OpenBookQA test

https://preview.redd.it/1gexvwd83i2c1.png?width=1000&format=png&auto=webp&s=cda1ee16000c2e89410091c172bf4756bc8a427b

etc.

[–] jxjq@alien.top 1 points 11 months ago (3 children)

Does this use of mixture-of-experts mean that multiple 70B models would perform better than multiple 7B models?

[–] vasileer@alien.top 1 points 11 months ago (2 children)

The question was whether multiple small models can beat a single big model while also having a speed advantage, and the answer is yes. An example of that is MoE, which is a collection of small models all inside a single big model.

https://huggingface.co/google/switch-c-2048 is one such example.
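To make the "collection of small models inside a single big model" idea concrete, here is a minimal top-1 (Switch-style) gating sketch in PyTorch. The class name and dimensions are made up for illustration; this is a toy, not the actual Switch Transformer code:

```python
# Toy top-1 MoE layer: a gating network scores the experts and each token is
# routed to the single highest-scoring expert, scaled by its gate probability.

import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model: int = 64, d_hidden: int = 128, n_experts: int = 4):
        super().__init__()
        # Each "expert" is just a small feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        # The gate decides which expert handles each token.
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        gate_probs = torch.softmax(self.gate(x), dim=-1)   # (tokens, n_experts)
        top_expert = gate_probs.argmax(dim=-1)             # top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_expert == i
            if mask.any():
                # Scale by the gate probability so routing stays differentiable.
                out[mask] = expert(x[mask]) * gate_probs[mask, i].unsqueeze(-1)
        return out

if __name__ == "__main__":
    moe = TinyMoE()
    tokens = torch.randn(10, 64)
    print(moe(tokens).shape)  # torch.Size([10, 64])
```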

[–] jxjq@alien.top 1 points 11 months ago

Thank you for sharing, I understand now

[–] extopico@alien.top 1 points 11 months ago

Big is an understatement. Please do correct me if I got it wildly wrong, but it appears to be a 3.6 TB colossus.
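For scale, a rough back-of-envelope estimate, assuming the reported ~1.6 trillion parameters and 2 bytes per parameter (bf16), lands in the same multi-terabyte ballpark:

```python
# Back-of-envelope size estimate for google/switch-c-2048
# (assumed figures: ~1.6 trillion parameters, 2 bytes per bf16 weight).
params = 1.6e12
bytes_per_param = 2
size_tb = params * bytes_per_param / 1e12
print(f"~{size_tb:.1f} TB of raw weights")  # ~3.2 TB, same ballpark as the ~3.6 TB checkpoint
```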