LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

Could multiple 7B models outperform 70B models? (alien.top)

submitted 2 years ago by freehuntx@alien.top to c/localllama@poweruser.forum

If I have multiple 7B models, each trained on one specific topic (e.g. roleplay, math, coding, history, politics...), and an interface that decides, based on the context, which model to use, could this outperform bigger models while being faster?
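A minimal sketch of the routing idea in the question, assuming a simple keyword-overlap router. The expert names, keyword sets, `route`, and `answer` are all hypothetical; a real setup would probably use a small classifier or embedding similarity instead, and each expert would be a separately loaded 7B model rather than the stub functions used here.

```python
import re
from typing import Callable, Dict

# Hypothetical topic experts and trigger words; in the setup described in the
# question, each name would map to a separate fine-tuned 7B model.
EXPERT_KEYWORDS: Dict[str, set] = {
    "math": {"solve", "equation", "integral", "derivative", "sum"},
    "coding": {"python", "function", "bug", "compile", "code"},
    "history": {"war", "empire", "century", "revolution"},
}
DEFAULT_EXPERT = "general"


def route(prompt: str) -> str:
    """Return the expert whose keywords overlap the prompt most, else the default."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    scores = {name: len(words & kws) for name, kws in EXPERT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else DEFAULT_EXPERT


def answer(prompt: str, experts: Dict[str, Callable[[str], str]]) -> str:
    """Dispatch the prompt to the chosen expert; only that one model runs."""
    return experts.get(route(prompt), experts[DEFAULT_EXPERT])(prompt)


if __name__ == "__main__":
    # Stand-in "models" that just report which expert handled the prompt.
    stubs = {name: (lambda p, n=name: f"[{n}] {p}")
             for name in [*EXPERT_KEYWORDS, DEFAULT_EXPERT]}
    print(answer("Please solve this equation for x", stubs))
```

The speed argument in the question comes from the dispatch step: per request only one 7B expert generates tokens, so decoding cost is that of a single 7B model plus the (cheap) routing decision. Whether quality matches a 70B model depends on how well the router picks and how good each expert is, which this sketch does not address.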

FullOf_Bad_Ideas@alien.top 1 point 2 years ago

Jondurbin made something like this with QLoRA.

The explanation that GPT-4 is an MoE model doesn't make sense to me. The GPT-4 API is 30x more expensive than gpt-3.5-turbo. GPT-3.5 Turbo is 175B parameters, right? So if GPT-4 had 8 experts of 220B each, since an MoE model only runs some of its experts for each token, it wouldn't need to cost 30x more; it would be 20-50% more for API use. There was also some speculation that 3.5 Turbo is 22B. In that case it also doesn't make sense to me that GPT-4 would be 30x as expensive.
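The arithmetic in this comment can be sketched with the common rule of thumb that decoding one token costs roughly 2 FLOPs per *active* parameter. This assumes API price tracks per-token compute linearly, ignores shared (non-expert) layers, and treats the number of 220B experts active per token as an unknown (1 or 2); all of these are assumptions, and the function names are hypothetical.

```python
# Rough rule of thumb: decoding one token costs ~2 FLOPs per *active* parameter.
FLOPS_PER_PARAM = 2


def per_token_flops(active_params_billion: float) -> float:
    """Approximate FLOPs to decode one token with the given active parameters."""
    return FLOPS_PER_PARAM * active_params_billion * 1e9


def moe_active_params(expert_size_billion: float, experts_per_token: int) -> float:
    """Active parameters of an MoE layer stack, ignoring shared layers (assumption)."""
    return expert_size_billion * experts_per_token


def cost_ratio(active_a_billion: float, active_b_billion: float) -> float:
    """Per-token compute of model A relative to model B."""
    return per_token_flops(active_a_billion) / per_token_flops(active_b_billion)


if __name__ == "__main__":
    for baseline in (175, 22):  # the two GPT-3.5 Turbo sizes mentioned above
        for k in (1, 2):  # hypothetical number of 220B experts active per token
            r = cost_ratio(moe_active_params(220, k), baseline)
            print(f"baseline {baseline}B, {k} expert(s) active: {r:.2f}x")
```

With a 175B baseline and one active expert the ratio is about 1.26x, which is where a "20-50% more" estimate comes from; with a 22B baseline it is 10x, still short of 30x. Because the factor of 2 cancels, the ratio is simply the ratio of active parameter counts.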

AutomataManifold@alien.top 1 point 2 years ago

Just to note: don't read too much into OpenAI's prices. They're deliberately losing money as a market-capturing strategy, so there's no guarantee of a linear relationship between what they charge for a given service and what it actually costs them.

Cradawx@alien.top 1 point 2 years ago

No, several sources, including Microsoft, have said GPT-3.5 Turbo is 20B. GPT-3 was 175B, and GPT-3.5 Turbo was about 10x cheaper on the API than GPT-3 when it came out, so that makes sense.

FullOf_Bad_Ideas@alien.top 1 point 2 years ago

Yeah, if that's the case, I can see GPT-4 requiring about 220-250B of loaded parameters for token decoding.
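Taking 20B as the GPT-3.5 Turbo size, the figures in this sub-thread translate into the following per-token compute ratios. This is a sketch under two assumptions: decoding cost scales linearly with the parameters used per token, and the 220-250B figure means parameters used per token. `relative_cost` and `BASELINE_B` are hypothetical names.

```python
# Assumption: per-token decoding cost scales linearly with the parameters used
# per token, so relative cost is just the ratio of parameter counts.
BASELINE_B = 20  # GPT-3.5 Turbo size claimed in the thread, in billions


def relative_cost(active_params_billion: float, baseline_billion: float = BASELINE_B) -> float:
    """Per-token compute relative to the baseline model."""
    return active_params_billion / baseline_billion


figures = {
    "GPT-3 (175B)": 175,
    "GPT-4 estimate, low (220B)": 220,
    "GPT-4 estimate, high (250B)": 250,
}

if __name__ == "__main__":
    for label, params in figures.items():
        print(f"{label}: {relative_cost(params):.2f}x the per-token compute of a 20B model")
```

GPT-3 at 175B works out to 8.75x, consistent with the "about 10x cheaper" observation; the 220-250B estimate works out to 11-12.5x.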