this post was submitted on 02 Nov 2023
LocalLLaMA

One thing that had really bothered me was that recent arXiv paper claiming that, despite GPT-3 being 175B and GPT-4 being around 1.7T, GPT-3.5 Turbo was somehow 20B.

This had been on my mind for the past couple of days because it just made no sense to me, so this evening I went to check the paper again and noticed that I could not download the PDF or PostScript. Then I saw this update comment on the arXiv page, added yesterday:

Contains inappropriately sourced conjecture of OpenAI's ChatGPT parameter count from this http URL, a citation which was omitted. The authors do not have direct knowledge or verification of this information, and relied solely on this article, which may lead to public confusion

That link leads to a Forbes article, published before GPT-4 was even released, that claims ChatGPT in general is 20B parameters:

It seems like the chatbot application was one of the most popular ones, so ChatGPT came out first. ChatGPT is not just smaller (20 billion vs. 175 billion parameters) and therefore faster than GPT-3, but it is also more accurate than GPT-3 when solving conversational tasks—a perfect business case for a lower cost/better quality AI product.

So it would appear that they sourced that figure from Forbes, and after everyone got really confused they realized it might not actually be correct, and the paper got modified.

So, before some wild urban legend forms that GPT-3.5 is 20B, I just thought I'd mention that lol.

[–] Ilforte@alien.top 1 points 1 year ago (1 children)

I think this is ass-covering. Microsoft Research don't know the scale of ChatGPT? What are the odds?

They have to deny the leak by providing a non-credible attribution instead of saying "lmao we just talked to OpenAI engineers over a dinner", sure. But this doesn't mean that they, or Forbes, or multiple people who tested Turbo speed, compared costs and concluded it's in the 20B range, or others are wrong. I'd rather believe that Forbes got an insider leak about a model as it was getting readied.

We know that Turbo is quantized, at least.

And it really started even with GPT-3. We built this model. And I actually did the initial productionization of it. And so you have this research model that takes all these GPUs. We compressed it down to basically end up running on one machine. So that was effectively a 10 to 20x reduction in footprint. And then over time, we've just been improving inference technology on a lot of angles. And so we do a lot of quantization, we do a lot of, honestly, there's a lot of just like systems work because you have all these requests coming in, you need to batch them together efficiently.

[–] ttkciar@alien.top 1 points 1 year ago (2 children)

Perhaps someone heard "10x reduction in footprint" and didn't realize that meant a reduction in bytes, not a reduction in parameters, and concluded it had a tenth as many parameters?
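To make the two readings concrete, here's a rough sketch. It assumes the research model is the 175B GPT-3 mentioned in the post and that its weights start at 16 bits each (my assumption; the quote doesn't say). `footprint_gb` is just an illustrative helper, and real deployments presumably mix several techniques rather than following either pure reading.

```python
def footprint_gb(params_b: float, bits_per_param: float) -> float:
    """Weight storage in decimal GB for params_b billion parameters."""
    return params_b * 1e9 * bits_per_param / 8 / 1e9


BASE_PARAMS_B = 175   # GPT-3 size quoted in the post
BASE_BITS = 16        # assumed starting precision (not stated in the quote)
REDUCTION = 10        # low end of the "10 to 20x" in the quote

baseline_gb = footprint_gb(BASE_PARAMS_B, BASE_BITS)
target_gb = baseline_gb / REDUCTION

# Reading 1: same parameter count, fewer bytes per parameter.
bits_if_same_params = BASE_BITS / REDUCTION

# Reading 2 (the possible misreading): same precision, fewer parameters.
params_if_same_bits = BASE_PARAMS_B / REDUCTION

print(f"baseline: {baseline_gb:.0f} GB, target: {target_gb:.0f} GB")
print(f"reading 1: {BASE_PARAMS_B}B params at {bits_if_same_params:.1f} bits each")
print(f"reading 2: {params_if_same_bits:.1f}B params at {BASE_BITS} bits each")
```

Under reading 2, a tenth of 175B is 17.5B, which is at least in the neighborhood of the 20B figure.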

[–] Tight_Range_5690@alien.top 1 points 1 year ago

Looking at Hugging Face models, a raw 20B is ~42 GB, which is not a lot of space to fit quants of a big model. A Q4_K_M of 70B Llama fits in that (Q2 is 30 GB), and the smallest Falcon 180B quantization is 74 GB.

That would make more sense while still being really impressive. Not sure if someone wants to math it out, but what's the biggest parameter count that would fit in that at the lowest quants (Q2-Q3)?
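A back-of-the-envelope way to math it out, using only the file sizes quoted above: derive the average bits per weight from each quantized example, then see how many parameters fit in the ~42 GB budget at that rate. Decimal gigabytes and the helper names are my assumptions, and the result is only as good as the rough sizes it starts from.

```python
GB = 1e9  # decimal gigabytes (assumption; the sizes above don't say GB vs GiB)


def bits_per_weight(size_gb: float, params_b: float) -> float:
    """Average bits per parameter for a size_gb file holding params_b billion weights."""
    return size_gb * GB * 8 / (params_b * 1e9)


def max_params_b(budget_gb: float, bpw: float) -> float:
    """Largest parameter count (in billions) that fits in budget_gb at bpw bits per weight."""
    return budget_gb * GB * 8 / bpw / 1e9


BUDGET_GB = 42  # size of a raw 20B model, per the comment

# Quantized examples quoted in the comment: (label, file size in GB, billions of params)
examples = [
    ("Llama 70B Q2", 30, 70),
    ("Falcon 180B smallest quant", 74, 180),
]

for label, size_gb, params_b in examples:
    bpw = bits_per_weight(size_gb, params_b)
    fit = max_params_b(BUDGET_GB, bpw)
    print(f"{label}: ~{bpw:.2f} bits/weight -> ~{fit:.0f}B fits in {BUDGET_GB} GB")
```

With those numbers the answer lands at roughly 100B parameters. This counts weight storage only; KV cache and other runtime memory would eat into the budget.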

disclaimer: bees are not everything, maybe they have great dataset/money/lies

[–] ambient_temp_xeno@alien.top 1 points 1 year ago (1 children)

So they, as big-shot Microsoft scientists, just decided that was good enough to stick in a table in their paper?

[–] 2muchnet42day@alien.top 1 points 1 year ago

"Yes, we made a mistake, we totally don't have direct knowledge about this"