LocalLLaMA

Community for discussing Llama, the family of large language models created by Meta AI.

New Microsoft codediffusion paper suggests GPT-3.5 Turbo is only 20B, good news for open source models? (alien.top)

submitted 1 year ago by obvithrowaway34434@alien.top to c/localllama@poweruser.forum

Wondering what everyone thinks if this is true. It seems they're already beating all open-source models, including Llama-2 70B. Is this all due to data quality? Will Mistral be able to beat it next year?

Edit: Link to the paper -> https://arxiv.org/abs/2310.17680

https://preview.redd.it/kdk6fwr7vbxb1.png?width=605&format=png&auto=webp&s=21ac9936581d1376815d53e07e5b0adb739c3b06

[–] xadiant@alien.top 1 points 1 year ago (5 children)

No fucking way. GPT-3 has 175B params. In no way, shape, or form could they have discovered the "secret sauce" to make an ultra-smart 20B model. The TruthfulQA paper suggests that bigger models are more likely to score worse, and ChatGPT's TQA score is impressively bad. I think the papers responsible for the impressive open-source models are 12-20 months old at most. The Turbo version is probably quantized, that's all.
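For a rough sense of scale behind the quantization guess, here is a back-of-the-envelope sketch of weight memory at different precisions. The bit widths are illustrative assumptions, not known details of how any OpenAI model is served, and the helper names are made up for this example. Note that quantization shrinks the bytes per parameter, not the parameter count itself.

```python
# Back-of-the-envelope weight memory: params * bits / 8 bytes.
# Assumption: only the weights are counted (no KV cache, activations, or overhead).

def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

# Parameter counts mentioned in the thread; bit widths are illustrative.
SIZES = {"175B": 175e9, "20B": 20e9}
BIT_WIDTHS = (16, 8, 4)

def footprint_table() -> dict:
    """Map (model size, bit width) to approximate weight memory in GB."""
    return {
        (name, bits): weight_memory_gb(n_params, bits)
        for name, n_params in SIZES.items()
        for bits in BIT_WIDTHS
    }

if __name__ == "__main__":
    for (name, bits), gb in footprint_table().items():
        print(f"{name} @ {bits}-bit: {gb:.1f} GB")
```

Under these assumptions, a 175B model at 4-bit still needs more than twice the weight memory of a 20B model at 16-bit, so quantization and a smaller parameter count are different claims.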

[–] Combinatorilliance@alien.top 1 points 1 year ago

I think it's plausible. GPT-3.5 isn't ultra smart. It's very good most of the time, but it has clear limitations.

Seeing what Mistral achieved with 7B, I'm sure we can get something similar to GPT-3.5 in 20B given state-of-the-art training and data. I'm sure OpenAI is also using some tricks that haven't been released to the public.
