this post was submitted on 25 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
Why can't you just train the "router" LLM to pick which downstream LLM to use and pass the activations along to it? Couldn't the downstream LLMs be "headless" (i.e., without an encoding layer)? Then inference could use a (6.5B + 6.5B)-parameter model with the generalizability of a 70B model.
Hmm, not sure I follow what an encoding layer is here. The encoding (prefill) phase fills the KV cache across the full depth of the model, so I don't think there's a single activation you could just pass across without model surgery plus additional fine-tuning.
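For concreteness, here's a minimal sketch of how router-to-downstream dispatch usually has to work in practice, assuming the Hugging Face transformers API; the model names and the routing prompt are placeholders, not a specific recipe. Note that the chosen downstream model re-runs prefill on the raw prompt and builds its own KV cache, and nothing computed inside the router is reused, which is exactly why handing activations across models would need surgery and extra fine-tuning.

```python
# Sketch only: router picks an expert, then the expert starts from the raw text.
# Model names and the routing prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ROUTER_NAME = "meta-llama/Llama-2-7b-chat-hf"           # placeholder router
EXPERT_NAMES = {
    "code": "codellama/CodeLlama-7b-Instruct-hf",       # placeholder experts
    "general": "meta-llama/Llama-2-7b-chat-hf",
}


def load(name):
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.float16, device_map="auto"
    )
    return tok, model


def route(prompt, router_tok, router_model):
    """Crude prompt-based routing: ask the router to emit a one-word label."""
    routing_prompt = (
        "Classify the request as 'code' or 'general'. Reply with one word.\n"
        f"Request: {prompt}\nLabel:"
    )
    inputs = router_tok(routing_prompt, return_tensors="pt").to(router_model.device)
    out = router_model.generate(**inputs, max_new_tokens=3, do_sample=False)
    label = router_tok.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ).strip().lower()
    return "code" if "code" in label else "general"


def answer(prompt):
    router_tok, router_model = load(ROUTER_NAME)
    label = route(prompt, router_tok, router_model)

    # The expert gets the raw prompt: its own tokenizer, its own prefill,
    # its own KV cache. None of the router's activations are reused, because
    # the expert's layers only accept hidden states produced by its own weights.
    expert_tok, expert_model = load(EXPERT_NAMES[label])
    inputs = expert_tok(prompt, return_tensors="pt").to(expert_model.device)
    out = expert_model.generate(**inputs, max_new_tokens=256)
    return expert_tok.decode(out[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(answer("Write a Python function that reverses a linked list."))
```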