this post was submitted on 25 Nov 2023

LocalLLaMA

If I have multiple 7B models, each trained on one specific topic (e.g. roleplay, math, coding, history, politics), and an interface that decides from the context which model to use, could this outperform bigger models while being faster?
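
To make the idea concrete, here is a minimal sketch of such an interface, assuming a plain keyword-overlap router. Everything in it is hypothetical: the `EXPERTS` stubs stand in for real topic-specific 7B models, and the `KEYWORDS` sets and the `route`/`answer` helpers are illustrative names, not part of any existing library. A real router would more likely be a small classifier or an embedding lookup.

```python
import re
from typing import Callable, Dict, Set


def make_stub(name: str) -> Callable[[str], str]:
    """Return a placeholder for a topic-specific model (hypothetical)."""
    return lambda prompt: f"[{name}] {prompt}"


# Hypothetical experts: each stub stands in for one fine-tuned 7B model.
EXPERTS: Dict[str, Callable[[str], str]] = {
    topic: make_stub(topic)
    for topic in ("roleplay", "math", "coding", "history")
}

# Illustrative keyword sets; a real system would learn the routing instead.
KEYWORDS: Dict[str, Set[str]] = {
    "math": {"equation", "integral", "prove", "solve"},
    "coding": {"python", "bug", "function", "compile"},
    "history": {"war", "empire", "century", "revolution"},
}


def route(prompt: str, default: str = "roleplay") -> str:
    """Pick the topic whose keyword set overlaps the prompt the most."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    scores = {topic: len(words & kws) for topic, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default expert when no keyword matches at all.
    return best if scores[best] > 0 else default


def answer(prompt: str) -> str:
    """Send the prompt to whichever expert the router selects."""
    return EXPERTS[route(prompt)](prompt)


if __name__ == "__main__":
    print(answer("Fix this Python bug in my function"))
```

Only one 7B model runs per request in this setup, which is where the speed advantage would come from. Whether answer quality matches a bigger model depends on the router choosing correctly and on each expert being good enough in its own domain.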

Cradawx@alien.top 1 points 9 months ago

No, several sources, including Microsoft, have said GPT 3.5 Turbo is 20B. GPT 3 was 175B, and GPT 3.5 Turbo was about 10x cheaper on the API than GPT 3 when it came out, so it makes sense.
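
As a rough sanity check on that reasoning, the sketch below compares the parameter ratio with the price ratio. It assumes inference cost scales roughly linearly with parameter count, which is a simplification, and it takes the 20B figure as reported rather than confirmed.

```python
# Figures from the comment above; the 20B value is a reported, unconfirmed number.
GPT3_PARAMS_B = 175
GPT35_TURBO_PARAMS_B = 20
API_PRICE_RATIO = 10  # "about 10x cheaper"

# Assumption: cost per token is roughly proportional to parameter count.
param_ratio = GPT3_PARAMS_B / GPT35_TURBO_PARAMS_B

print(f"parameter ratio: {param_ratio:.2f}x, price ratio: ~{API_PRICE_RATIO}x")
```

A parameter ratio of 8.75x is in the same ballpark as a 10x price drop, so the two numbers are at least consistent with each other.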

FullOf_Bad_Ideas@alien.top 1 points 9 months ago

Yeah, if that's the case, I can see GPT-4 requiring about 220-250B of loaded parameters to do token decoding.