this post was submitted on 25 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

If I have multiple 7B models, each trained on one specific topic (e.g. roleplay, math, coding, history, politics...), and an interface that decides, based on the context, which model to use, could this outperform bigger models while being faster?
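For concreteness, here is a minimal sketch of that kind of interface in Python: a toy keyword router picks a topic and dispatches the prompt to a topic-specific checkpoint. The checkpoint names and keyword lists are hypothetical placeholders; a real setup would use a proper classifier (or a small LLM call) for routing and keep the models cached instead of reloading per request.

```python
# Minimal sketch of "route the prompt to a topic-specialized 7B model".
# Checkpoint names and keywords below are placeholders, not real repos.
from transformers import pipeline

EXPERT_CHECKPOINTS = {
    "coding":  "my-org/llama-7b-coding",    # hypothetical fine-tune
    "math":    "my-org/llama-7b-math",      # hypothetical fine-tune
    "general": "my-org/llama-7b-general",   # hypothetical fallback
}

KEYWORDS = {
    "coding": ["python", "bug", "function", "compile"],
    "math":   ["integral", "prove", "equation", "probability"],
}

def route(prompt: str) -> str:
    """Pick an expert by naive keyword matching (stand-in for a real classifier)."""
    lowered = prompt.lower()
    for topic, words in KEYWORDS.items():
        if any(w in lowered for w in words):
            return topic
    return "general"

def answer(prompt: str) -> str:
    topic = route(prompt)
    # Loaded per call for clarity; in practice, cache one pipeline per expert.
    generator = pipeline("text-generation", model=EXPERT_CHECKPOINTS[topic])
    out = generator(prompt, max_new_tokens=128)
    return out[0]["generated_text"]

print(answer("Why does this Python function raise a TypeError?"))
```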

[–] yahma@alien.top 1 year ago

Yes. This is known as a Mixture of Experts (MoE) approach.
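For contrast with the per-topic models in the question, a classic MoE routes per token inside a single network: a learned gate sends each token to its top-k expert sub-networks. A minimal PyTorch sketch with illustrative dimensions, not any particular paper's implementation:

```python
# Toy MoE layer: a learned gate picks the top-k experts for each token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.gate = nn.Linear(d_model, n_experts)   # router over experts
        self.top_k = top_k

    def forward(self, x):                           # x: (n_tokens, d_model)
        scores = self.gate(x)                       # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)        # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e               # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 512)          # 4 tokens
print(MoELayer()(x).shape)       # torch.Size([4, 512])
```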

We already have several promising ways of doing this:

  1. QMoE: A Scalable Algorithm for Sub-1-Bit Compression of Trillion-Parameter Mixture-of-Experts Architectures. Paper - GitHub
  2. S-LoRA: serving thousands of concurrent LoRA adapters.
  3. LoRAX: serving hundreds of concurrent adapters.
  4. LMoE: a simple method for dynamically loading LoRAs (see the sketch after this list).
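The adapter-based approaches above (2-4) share one pattern: keep a single base model resident and swap lightweight LoRA adapters per request. A hedged sketch of that pattern using Hugging Face PEFT; the adapter repository names are hypothetical placeholders, and the base checkpoint is just an example, so substitute whatever you actually have:

```python
# One base model + many LoRA adapters, switched per request.
# Adapter repo names are placeholders; the PEFT calls are the relevant mechanism.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-2-7b-hf"            # example base checkpoint (gated)
tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")

# Load one adapter, then register more under names (placeholder repos below).
model = PeftModel.from_pretrained(base, "my-org/llama2-7b-lora-coding", adapter_name="coding")
model.load_adapter("my-org/llama2-7b-lora-math", adapter_name="math")

def generate_with(adapter: str, prompt: str) -> str:
    model.set_adapter(adapter)               # activate the topic-specific LoRA
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(generate_with("math", "Prove that the sum of two even numbers is even."))
```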

I can't believe I hadn't run into this. Would you indulge me on the implications for agentic systems like AutoGen? I've been working on having experts cooperate that way rather than combining them into a single model.