Machine Learning


Bill Gates told a German newspaper that GPT5 wouldn't be much better than GPT4: "there are reasons to believe that we have reached a plateau" [N] (www.handelsblatt.com)

submitted 10 months ago by we_are_mammals@alien.top to c/machinelearning@academy.garden


[–] gebregl@alien.top 1 points 10 months ago (24 children)

We need a name for the fallacy where people call highly nonlinear algorithms with billions of parameters "just statistics", as if all they're doing is linear regression.

ChatGPT isn't AGI yet, but it is a huge leap in modeling natural language. The fact that there's some statistics involved explains neither of those two points.
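A toy sketch of the gap between a linear model and a nonlinear one, using XOR as the target. Everything here is an illustrative assumption: `linear`, `tiny_net`, and the hand-picked weights are hypothetical and have nothing to do with how any real LLM is built.

```python
import random

def linear(w1, w2, b, x1, x2):
    # The model family of linear regression: a weighted sum plus a bias.
    return w1 * x1 + w2 * x2 + b

def relu(z):
    return max(0.0, z)

def tiny_net(x1, x2):
    # Two hidden ReLU units with hand-picked (hypothetical) weights.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    return h1 - 2.0 * h2

xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
net_outputs = [tiny_net(a, b) for a, b in xor_inputs]

# Any linear model satisfies f(0,0) + f(1,1) == f(0,1) + f(1,0),
# but XOR needs 0 + 0 on the left and 1 + 1 on the right,
# so no choice of w1, w2, b can reproduce it exactly.
random.seed(0)
w1, w2, b = (random.uniform(-5, 5) for _ in range(3))
lhs = linear(w1, w2, b, 0, 0) + linear(w1, w2, b, 1, 1)
rhs = linear(w1, w2, b, 0, 1) + linear(w1, w2, b, 1, 0)
```

Two ReLU units are enough to compute XOR exactly, while the constraint on `lhs` and `rhs` holds for every linear model. That is the sense in which "just linear regression" undersells even a tiny nonlinear network.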

[–] venustrapsflies@alien.top 1 points 10 months ago (11 children)

It’s not a fallacy at all. It is just statistics, combined with some very useful inductive biases. The fallacy is trying to smuggle some extra magic into the description of what it is.

Actual AGI would be able to explain something that no human has understood before. We aren’t really close to that at all. Falling back on “___ may not be AGI yet, but…” is a lot like saying “rocket ships may not be FTL yet, but…”

[–] InterstitialLove@alien.top 1 points 10 months ago (8 children)

The fallacy is the part where you imply that humans have magic.

"An LLM is just doing statistics, therefore an LLM can't match human intellect unless you add pixie dust somewhere." Clearly the implication is that human intellect involves pixie dust somehow?

Or maybe, idk, humans are just the result of random evolutionary processes jamming together neurons into a configuration that happens to behave in a way that lets us build steam engines, and there's no fundamental reason that jamming together perceptrons can't accomplish the same thing?

[–] Basic-Low-323@alien.top 1 points 9 months ago (1 children)

I mean, if your hypothesis is that the human brain is the product of one billion years of evolution 'searching' for a configuration of neurons and synapses that is very efficient at sampling the environment, detecting any changes, and acting accordingly to increase the likelihood of survival, and also at communicating with other such configurations in order to devise and execute more complicated plans, then that...doesn't bode very well for current AI architectures, does it? Their training sessions are incredibly weak by comparison, simply learning to predict and interpolate some sparse dataset that some human brains produced.

If by 'there's no fundamental reason we can't jam together perceptrons this way' you mean that we can always throw a bunch of them into an ever-changing virtual world, let them mutate and multiply and after some long time fish out the survivors and have them work for us, sure, but we're talking about A LOT of compute here. Our hope is that we can find some sort of shortcut, because if we truly have to do it like evolution did, it probably won't happen this side of the millennium.

[–] InterstitialLove@alien.top 1 points 9 months ago (1 children)

We don't currently know exactly why gradient descent works to find powerful, generalizing minima

But, like, it does

The minima we can reliably find, in practice, don't just interpolate the training data. I mean, they do that, but they find compressions which seem to actually represent knowledge, in the sense that they can identify true relationships between concepts which reliably hold outside the training distribution.

I want to stress, "predict the next token" is what the models are trained to do, it is not what they learn to do. They learn deep representations and learn to deploy those representations in arbitrary contexts. They learn to predict tokens the same way a high-school student learns to fill in scantrons: the scantron is designed so that filling it out requires other more useful skills.
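For reference, a minimal sketch of what "trained to predict the next token" means as an objective: the loss is the average cross-entropy between the model's predicted distribution at each position and the token that actually comes next. The function names, the tiny vocabulary, and the raw logit lists are assumptions for illustration, not any particular library's API.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, token_ids):
    """Mean cross-entropy of predicting token t+1 from position t.

    logits_per_position[t] holds the (hypothetical) model scores over the
    vocabulary after seeing token_ids[: t + 1]; the target is token_ids[t + 1].
    """
    losses = []
    for logits, target in zip(logits_per_position, token_ids[1:]):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)
```

Nothing in this loss mentions representations or concepts. Whatever internal structure a model ends up with is a side effect of driving this one number down, which is the scantron point above.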

It's unclear if gradient descent will continue to work so unreasonably well as we try to push it farther and farther, but so long as the current paradigm holds I don't see a huge difference between human inference ability and Transformer inference ability. Number of neurons* and amount of training data seem to be the things holding LLMs back. Humans beat LLMs on both counts, but in some ways LLMs seem to outperform biology in terms of what they can learn with a given quantity of neurons/data. As for the "billions of years" issue, that's why we are using human-generated data, so they can catch up instead of starting from scratch.

*By "number of neurons" I really mean something like "expressive power in some universally quantified sense." Obviously you can't directly compare perceptrons to biological neurons.

[–] Basic-Low-323@alien.top 1 points 9 months ago

I have to say, this is completely the *opposite* of what I have gotten by playing around with those models (GPT4). At no point did I get the impression that I'm dealing with something that, had you taught it all humanity knew in the early 1800s about, say, electricity and magnetism, would have learned 'deep representations' of those concepts to a degree that would allow it to synthesize something truly novel, like the prediction of electromagnetic waves.

I mean, the model has already digested most of what's written out there. What's the probability that something that has the ability to 'learn deep representations and learn to deploy those representations in arbitrary contexts' would have made zero contributions, drawn zero new connections that had escaped humans, in something more serious than 'write an Avengers movie in the style of Shakespeare'? I'm not talking about something as big as electromagnetism but...something? Anything? It has 'grokked', as you say, pretty much the entirety of Stack Overflow, and yet I know of zero new programming techniques or design patterns or concepts it has come up with.
