this post was submitted on 25 Nov 2023
Machine Learning
We need a name for the fallacy where people call highly nonlinear algorithms with billions of parameters "just statistics", as if all they're doing is linear regression.
ChatGPT isn't AGI yet, but it is a huge leap in modeling natural language. The fact that there's some statistics involved explains neither of those two points.
Let's ask GPT4!
I dunno. The “fallacy of composition” is just made up of 3 words, and there’s not a lot that you can explain with only three words.
How... did it map oversimplification to... holistic thinking??? Saying that it's "just statistics" is wrong because "just statistics" covers some very complicated models in principle. They weren't saying that simple subsystems are incapable of generating complex behavior.
God, why do people think these things are intelligent? I guess people fall for cons all the time...
Well, thanks to quantum mechanics; pretty much all of existence is probably "just statistics".
Well, practically all interesting statistics are NONlinear regressions. Including ML. And your brain. And physics.
What a lot of people don’t understand is that linear regression can still handle non-linear relationships.
For a statistician, “linear” regression just means the model is linear in its coefficients; it doesn’t mean the fitted relationship itself has to be a straight line.
That’s why linear models are still incredibly powerful and are used so widely across so many fields.
Yet still limited compared to even not-very-deep NNs. If the user wants to fit a parabola with a linear regression, he pretty much has to manually add a quadratic term himself.
I think they're widely used primarily because they're widely taught in school.
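To make the parabola point concrete, here is a minimal sketch using NumPy. The data and the coefficients (2, -1, 0.5) are made up for illustration. The same ordinary least squares solver is used for both fits; the only difference is whether a quadratic column was added to the design matrix by hand.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a noiseless parabola.
x = np.linspace(-3.0, 3.0, 50)
y = 2.0 * x**2 - 1.0 * x + 0.5

# Design matrix with only an intercept and x: a straight-line fit.
X_line = np.column_stack([np.ones_like(x), x])
# Same kind of model, but with a manually added quadratic feature.
# It is still linear in the coefficients, so least squares applies unchanged.
X_quad = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares in both cases.
coef_line, *_ = np.linalg.lstsq(X_line, y, rcond=None)
coef_quad, *_ = np.linalg.lstsq(X_quad, y, rcond=None)

# Mean squared error of each fit on the training points.
mse_line = float(np.mean((X_line @ coef_line - y) ** 2))
mse_quad = float(np.mean((X_quad @ coef_quad - y) ** 2))

print("straight-line MSE:", mse_line)
print("quadratic-feature MSE:", mse_quad)
print("coefficients (intercept, x, x^2):", coef_quad)
```

The straight-line fit leaves a large error, while the fit with the hand-added `x**2` column recovers the curve. That matches both sides of the exchange: linear regression can model a curve, but only because someone chose the right feature in advance.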
We need a name for the illness that is people throwing some shoddy homebrew "L"LM at their misshapen prompts - if you can call them that - and then concluding that it's just bad imitation of speech because they keep asking their models to produce page after page of sonic fanfic. Or thinking everything is equally hallucination-prone.
Really, the moronic takes about all this are out of this world, never even mind how people all of a sudden have a very clear idea of what intelligence, among the most ambiguous and ill-defined notions we debate, entails. Except they struggle to put this knowledge into words; it's more about the feel of it all, y'know.
I think it's a vacuous truth.
The vacuous truth is saying that AI is statistical. It certainly is, but it's also much more.
The fallacy part is to take that fact and claim that, because an AI algorithm is "just statistics", it therefore cannot exhibit "true" intelligence and is somehow only faking or mimicking intelligence.
It’s not a fallacy at all. It is just statistics, combined with some very useful inductive biases. The fallacy is trying to smuggle some extra magic into the description of what it is.
Actual AGI would be able to explain something that no human has understood before. We aren’t really close to that at all. Falling back on “___ may not be AGI yet, but…” is a lot like saying “rocket ships may not be FTL yet, but…”
The fallacy is the part where you imply that humans have magic.
"An LLM is just doing statistics, therefore an LLM can't match human intellect unless you add pixie dust somewhere." Clearly the implication is that human intellect involves pixie dust somehow?
Or maybe, idk, humans are just the result of random evolutionary processes jamming together neurons into a configuration that happens to behave in a way that lets us build steam engines, and there's no fundamental reason that jamming together perceptrons can't accomplish the same thing?
LLMs might still lack something that the human brain has. Internal monologue, for example, which allows us to allocate more than a fixed amount of compute per output token.
You can just give an LLM an internal monologue. It's called a scratchpad.
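As a toy illustration only (not how any real LLM is implemented), the sketch below shows why a scratchpad gets around a fixed compute budget per token. `forward_pass` is a hypothetical stand-in for one model call that can do exactly one addition; the task of summing a list is likewise just an assumption chosen to keep the example small.

```python
def forward_pass(numbers, scratchpad):
    """Hypothetical stand-in for one fixed-cost model call.

    It may perform exactly one addition: the last scratchpad entry
    (or 0 if the scratchpad is empty) plus the next unread input.
    """
    previous = scratchpad[-1] if scratchpad else 0
    return previous + numbers[len(scratchpad)]


def answer_directly(numbers):
    # A single call with no scratchpad: only one addition is possible,
    # so only the first input is ever taken into account.
    return forward_pass(numbers, [])


def answer_with_scratchpad(numbers):
    # Each emitted intermediate result is appended to the scratchpad and
    # fed back in, so total compute grows with the number of steps taken.
    scratchpad = []
    for _ in numbers:
        scratchpad.append(forward_pass(numbers, scratchpad))
    return scratchpad[-1], scratchpad


numbers = [3, 1, 4, 1, 5]
direct = answer_directly(numbers)
final, steps = answer_with_scratchpad(numbers)
print("direct answer:", direct)
print("scratchpad steps:", steps)
print("scratchpad answer:", final)
```

Each call costs the same, but writing intermediate results out and reading them back lets the total work scale with the length of the scratchpad rather than being capped at one step per answer.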
I'm not sure how this applies to the broader discussion, like honestly I can't tell if we're off-topic. But once you have LLMs you can implement basically everything humans can do. The only limitations I'm aware of that aren't trivial from an engineering perspective are
And the network still uses skills that it learned in a fixed-computation-per-token regime.
Sure, future versions will lift many existing limitations, but I was talking about current LLMs.
Real brains aren't perceptrons. They don't learn by back-propagation or by evaluating performance on a training set. They're not mathematical models, or even mathematical functions in any reasonable sense. This is a "god of the gaps" scenario, wherein there are a lot of things we don't understand about how real brains work, and people jump to fill in the gap with something they do understand (e.g. ML models).
Brains are absolutely mathematical functions in a very reasonable sense, and anyone who says otherwise is a crazy person
You think brains aren't Turing machines? Like, you really think that? Every physical process ever studied, every one of them, is a Turing machine. Saying that brains aren't Turing machines is no different from saying that humans have souls. You're positing the existence of extra-special magic outside the realm of science just to justify your belief that humans are too special for science to ever comprehend.
(By "is a Turing machine" I mean that its behavior can be predicted to arbitrary accuracy by a Turing machine, and so observing its behavior is mathematically equivalent to running a Turing machine.)
I mean, if your hypothesis is that the human brain is the product of one billion years of evolution 'searching' for a configuration of neurons and synapses that is very efficient at sampling the environment, detecting any changes, and acting accordingly to increase the likelihood of survival, and also at communicating with other such configurations in order to devise and execute more complicated plans, then that...doesn't bode very well for current AI architectures, does it? Their training sessions are incredibly weak by comparison, simply learning to predict and interpolate some sparse dataset that some human brains produced.
If by 'there's no fundamental reason we can't jam together perceptrons this way' you mean that we can always throw a bunch of them into an ever-changing virtual world, let them mutate and multiply, and after some long time fish out the survivors and have them work for us, sure, but we're talking about A LOT of compute here. Our hope is that we can find some sort of shortcut, because if we truly have to do it like evolution did, it probably won't happen this side of the millennium.
We don't currently know exactly why gradient descent works to find powerful, generalizing minima
But, like, it does
The minima we can reliably find, in practice, don't just interpolate the training data. I mean, they do that, but they find compressions which seem to actually represent knowledge, in the sense that they can identify true relationships between concepts which reliably hold outside the training distribution.
I want to stress, "predict the next token" is what the models are trained to do, it is not what they learn to do. They learn deep representations and learn to deploy those representations in arbitrary contexts. They learn to predict tokens the same way a high-school student learns to fill in scantrons: the scantron is designed so that filling it out requires other more useful skills.
It's unclear if gradient descent will continue to work so unreasonably well as we try to push it farther and farther, but so long as the current paradigm holds I don't see a huge difference between human inference ability and Transformer inference ability. Number of neurons* and amount of training data seem to be the things holding LLMs back. Humans beat LLMs on both counts, but in some ways LLMs seem to outperform biology in terms of what they can learn with a given quantity of neurons/data. As for the "billions of years" issue, that's why we are using human-generated data, so they can catch up instead of starting from scratch.
I have to say, this is completely the *opposite* of what I have gotten by playing around with those models (GPT4). At no point did I get the impression that I'm dealing with something that, had you taught it all humanity knew in the early 1800s about, say, electricity and magnetism, would have learned 'deep representations' of those concepts to a degree that would allow it to synthesize something truly novel, like the prediction of electromagnetic waves.
I mean, the model has already digested most of what's written out there. What's the probability that something that has the ability to 'learn deep representations and learn to deploy those representations in arbitrary contexts' would have made zero contributions, drawn zero new connections that had escaped humans, in something more serious than 'write an Avengers movie in the style of Shakespeare'? I'm not talking about something as big as electromagnetism but...something? Anything? It has 'grokked', as you say, pretty much the entirety of Stack Overflow, and yet I know of zero new programming techniques or design patterns or concepts it has come up with?
Why would a human level AGI need to be able to explain something that no human has understood before? That sounds more like ASI than AGI.
And the human brain is FTL then?
Statisticians use nonlinear models all the time
Embeddings are statistics. They evolved from linear statistical models, but they are now non-linear statistical models. Bengio 03 explains this.
ChatGPT predicts the most probable next token, or the next token that yields the highest probability of a thumbs up, depending on whether you're talking about the self-supervised pretraining stage or the reinforcement learning stage of training. That is the conceptual underpinning of how the parameter updates are calculated. It only achieves the ability to communicate because it was trained on text that successfully communicates.
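For the next-token stage, the objective being described can be sketched in a few lines. This is a minimal illustration with a made-up four-word vocabulary and made-up logits, not ChatGPT's actual code, and it covers only the next-token prediction loss, not the reinforcement learning stage. Real systems also usually sample from the distribution rather than always taking the most probable token.

```python
import numpy as np


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


# Hypothetical vocabulary and model outputs for one position in a text.
vocab = ["the", "cat", "sat", "mat"]
logits = np.array([1.0, 3.0, 0.5, -1.0])

# The model's predicted distribution over the next token.
probs = softmax(logits)

# "Most probable next token": greedy decoding picks the argmax.
greedy_token = vocab[int(np.argmax(probs))]

# Training signal: cross-entropy against the token that actually came next.
target = vocab.index("sat")
loss = float(-np.log(probs[target]))

# Gradient of that loss with respect to the logits: probs - one_hot(target).
# Gradient descent therefore raises the target's logit and lowers the others.
grad = probs.copy()
grad[target] -= 1.0

print("predicted distribution:", probs)
print("greedy next token:", greedy_token)
print("cross-entropy loss:", loss)
print("gradient w.r.t. logits:", grad)
```

The gradient is negative only at the position of the token that actually followed, which is the sense in which the parameter updates push the model toward text that was really written.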