this post was submitted on 22 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.


According to this tweet,

when gpt4 first finished training it didn’t actually work very well and the whole team thought it’s over, scaling is dead…until greg went into a cave for weeks and somehow magically made it work

So GPT-4 was apparently kind of broken at first, and then Greg spent a few weeks on it and somehow got it working.

So why didn't it work at first, and how did they fix it? I think this is an important question for the OSS community.

top 11 comments
[–] maxinator80@alien.top 1 points 11 months ago (1 children)

Sam Altman mentioned that GPT-4 is actually super difficult to work with, so I guess it simply isn't as straightforward as pushing a prompt in at the front and getting tokens out the back. Anything further would be speculation, but there must be something to it.

[–] CosmosisQ@alien.top 1 points 11 months ago

He's just alluding to the fact that most enterprise customers are too stupid to use base models as they expect to be interacting with a human-like dialogue-driven agent or chatbot rather than a supercharged text completion engine. It's a shame given that, used properly, the GPT-4 base model is far superior to the lobotomized version made generally available through the API.

[–] wojtek15@alien.top 1 points 11 months ago (1 children)

According to https://openai.com/research/gpt-4, they were able to predict GPT-4's performance while it was still training, so that contradicts the tweet.

[–] dogesator@alien.top 1 points 11 months ago

Predicting the loss is very different from predicting real-world abilities; they were able to do the former, not the latter.

Predicting the final loss once you're already 10% into training is fairly trivial. Predicting the actual abilities, though, is not.
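For context, the GPT-4 report describes fitting scaling laws to smaller runs and early measurements to predict the final loss. Here's a minimal sketch of what that kind of extrapolation looks like, with made-up numbers; it's not OpenAI's actual fitting code, just an illustration of fitting a power law to early checkpoints and extrapolating to the full budget.

```python
# Toy scaling-law extrapolation: fit L(x) = a * x^(-b) + c to early-training
# loss measurements and extrapolate the final loss. All numbers are invented.
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b, c):
    # Loss falls as a power of (normalized) compute, flattening toward an
    # irreducible floor c.
    return a * x ** (-b) + c

# Hypothetical checkpoints from the first ~10% of a run.
# x = training compute normalized by the smallest measured budget.
x = np.array([1.0, 3.0, 10.0, 30.0, 100.0])
loss = np.array([3.10, 2.76, 2.49, 2.30, 2.14])

params, _ = curve_fit(power_law, x, loss, p0=[1.0, 0.3, 1.0])
a, b, c = params

# Extrapolate to an assumed full training budget, ~2000x the first checkpoint.
print(f"fit: a={a:.2f}, b={b:.2f}, c={c:.2f}")
print(f"predicted final loss: {power_law(2000.0, a, b, c):.2f}")
```

The point being: a curve like this tells you the final loss number, but nothing about which downstream abilities emerge at that loss.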

[–] maizeq@alien.top 1 points 11 months ago

Much more likely: the story is apocryphal, or at least highly exaggerated.

[–] FormerIYI@alien.top 1 points 11 months ago

Maybe papers from Pangu-Sigma or other large-scale MoE models can be helpful: https://arxiv.org/abs/2303.10845

[–] amplifizzle@alien.top 1 points 11 months ago

Also, what is the location and Wi-Fi connectivity of the cave?

[–] ReMeDyIII@alien.top 1 points 11 months ago

It's like when Tony Stark was in a cave and built a prototype Iron Man suit.

[–] Adept-Upstairs-7934@alien.top 1 points 11 months ago

They shot a text to Jensen over at Nvidia and he gave them his contact from a few galaxies over, and the nice beings walked them through it.

[–] troposfer@alien.top 1 points 11 months ago

These are just stories: one man solves it all, and with just the right timing, by the way, given all this OpenAI saga.

[–] PotaroMax@alien.top 1 points 11 months ago

and Greg found it without asking GPT-4