LocalLLaMA

Community for discussing Llama, the family of large language models created by Meta AI.

Clearing up confusion: GPT 3.5-Turbo may not be 20b after all (alien.top)

submitted 2 years ago by SomeOddCodeGuy@alien.top to c/localllama@poweruser.forum

So one thing that had really bothered me was that recent arXiv paper claiming that, despite GPT 3 being 175B and GPT 4 being around 1.7T, 3.5 Turbo was somehow 20b.

This had been on my mind for the past couple of days because it just made no sense to me, so this evening I went back to check the paper again and noticed that I could not download the PDF or PostScript. Then I saw this update comment on the arXiv page, added yesterday:

Contains inappropriately sourced conjecture of OpenAI's ChatGPT parameter count from this http URL, a citation which was omitted. The authors do not have direct knowledge or verification of this information, and relied solely on this article, which may lead to public confusion

That link leads to a Forbes article, from before GPT 4 was even released, that claims that ChatGPT in general is 20b parameters.

It seems like the chatbot application was one of the most popular ones, so ChatGPT came out first. ChatGPT is not just smaller (20 billion vs. 175 billion parameters) and therefore faster than GPT-3, but it is also more accurate than GPT-3 when solving conversational tasks—a perfect business case for a lower cost/better quality AI product.

So it would appear that they sourced that knowledge from Forbes, and after everyone got really confused they realized that it might not actually be correct, and the paper got modified.

So, before some wild urban legend forms that GPT 3.5 is 20b, just thought I'd mention that lol.

[–] Monkey_1505@alien.top 1 points 2 years ago (1 children)

I tend to disagree that it's less optimized. Generally, more data and more compute reduce the need for heavy data refinement, whereas smaller models with less available compute benefit more from it.

[–] Auto_Luke@alien.top 1 points 2 years ago (1 children)

It's very true that a small amount of high-quality data is better than a lot of garbage, but even better would be a large amount of high-quality data optimized in a way that we haven't figured out yet. OpenAI, however, could be as much as a year ahead on that. Unfortunately, it is ClosedAI now.

[–] Monkey_1505@alien.top 1 points 2 years ago

That's true, but they still have less impetus to do that. They are being fairly heavily subsidized by Microsoft, so running costs and compute aren't much of a concern. It's only really when more data and more compute hit a wall that they have to worry much about data refinement.