this post was submitted on 25 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

If I have multiple 7B models, each trained on one specific topic (e.g. roleplay, math, coding, history, politics...), and an interface that decides, based on the context, which model to use, could this outperform bigger models while being faster?
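For concreteness, here is a minimal sketch of that kind of interface in Python: a toy keyword router picks a topic and dispatches the prompt to a topic-specific checkpoint. The checkpoint names and keyword lists are hypothetical placeholders; a real setup would use a proper classifier (or a small LLM call) for routing and keep the models cached instead of reloading per request.

```python
# Minimal sketch of "route the prompt to a topic-specialized 7B model".
# Checkpoint names and keywords below are placeholders, not real repos.
from transformers import pipeline

EXPERT_CHECKPOINTS = {
    "coding":  "my-org/llama-7b-coding",    # hypothetical fine-tune
    "math":    "my-org/llama-7b-math",      # hypothetical fine-tune
    "general": "my-org/llama-7b-general",   # hypothetical fallback
}

KEYWORDS = {
    "coding": ["python", "bug", "function", "compile"],
    "math":   ["integral", "prove", "equation", "probability"],
}

def route(prompt: str) -> str:
    """Pick an expert by naive keyword matching (stand-in for a real classifier)."""
    lowered = prompt.lower()
    for topic, words in KEYWORDS.items():
        if any(w in lowered for w in words):
            return topic
    return "general"

def answer(prompt: str) -> str:
    topic = route(prompt)
    # Loaded per call for clarity; in practice, cache one pipeline per expert.
    generator = pipeline("text-generation", model=EXPERT_CHECKPOINTS[topic])
    out = generator(prompt, max_new_tokens=128)
    return out[0]["generated_text"]

print(answer("Why does this Python function raise a TypeError?"))
```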

[–] yahma@alien.top 1 year ago

Yes. This is known as a Mixture of Experts (MoE) approach.
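For contrast with the per-topic models in the question, a classic MoE routes per token inside a single network: a learned gate sends each token to its top-k expert sub-networks. A minimal PyTorch sketch with illustrative dimensions, not any particular paper's implementation:

```python
# Toy MoE layer: a learned gate picks the top-k experts for each token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.gate = nn.Linear(d_model, n_experts)   # router over experts
        self.top_k = top_k

    def forward(self, x):                           # x: (n_tokens, d_model)
        scores = self.gate(x)                       # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)        # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e               # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 512)          # 4 tokens
print(MoELayer()(x).shape)       # torch.Size([4, 512])
```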

We already have several promising ways of doing this:

  1. QMoE: A Scalable Algorithm for Sub-1-Bit Compression of Trillion-Parameter Mixture-of-Experts Architectures. Paper - GitHub
  2. S-LoRA: serving thousands of concurrent LoRA adapters.
  3. LoRAX: serving hundreds of concurrent adapters.
  4. LMoE: a simple method for dynamically loading LoRAs (see the sketch after this list).
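The adapter-based approaches above (2-4) share one pattern: keep a single base model resident and swap lightweight LoRA adapters per request. A hedged sketch of that pattern using Hugging Face PEFT; the adapter repository names are hypothetical placeholders, and the base checkpoint is just an example, so substitute whatever you actually have:

```python
# One base model + many LoRA adapters, switched per request.
# Adapter repo names are placeholders; the PEFT calls are the relevant mechanism.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-2-7b-hf"            # example base checkpoint (gated)
tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")

# Load one adapter, then register more under names (placeholder repos below).
model = PeftModel.from_pretrained(base, "my-org/llama2-7b-lora-coding", adapter_name="coding")
model.load_adapter("my-org/llama2-7b-lora-math", adapter_name="math")

def generate_with(adapter: str, prompt: str) -> str:
    model.set_adapter(adapter)               # activate the topic-specific LoRA
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(generate_with("math", "Prove that the sum of two even numbers is even."))
```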

I can't believe I hadn't run into this. Would you indulge me on the implications for agentic systems like AutoGen? I've been working on having experts cooperate that way rather than combining them into a single model.