I mean, everyone is just sorta ignoring the fact that no ML technique has been shown to do anything more than just mimic statistical aspects of the training set. Is statistical mimicry AGI? On some performance benchmarks, it appears better statistical mimicry does approach capabilities we associate with AGI.
I personally am quite skeptical that the best lever to pull is just giving it more parameters. Our own brains have such complicated neural/psychological circuitry for executive function, long- and short-term memory, Type 1 and Type 2 thinking, "internal" dialog and visual models, and more importantly, the ability to few-shot learn the logical underpinnings of an example set. Without a fundamental change in how we train NNs, or even in our conception of effective NNs to begin with, we're not going to see the paradigm shift everyone's been waiting for.
Actually, the claim that "all ML models are doing is statistics" has turned out to be a fallacy, and one that dominated the field of AI for a long time.
See this video for instance, where Ilya (probably the #1 AI researcher in the world currently) explains how GPT is much more than statistics: it is more akin to "compression", and that can lead to intelligence: https://www.youtube.com/watch?v=GI4Tpi48DlA (4:30 to 7:30)
See, the trouble with the Turing test is that the linguistic capabilities of the most sophisticated models well exceed those of the dumbest humans.
I think we can just call the Turing test passed in this case.
The Turing test was passed in the '60s by rule-based systems. It's not a great test.
Is ChatGPT Passing the Turing Test Really Important? https://youtu.be/wdCzGwQv4rI
I don't think so. I doubt GPT-4 will be able to convince someone who is actively trying to determine whether or not the thing they are talking to is human.
There's literally been a website that opens a chat with either a human or GPT, without telling you which, and gives you about 30 seconds to figure it out by chatting. Then you have to guess whether you just talked to a human or an AI. People get it wrong all the time.
Edit: link to the research that came from that https://www.ai21.com/blog/human-or-not-results
I think you have to use a reasonably smart human as a baseline, otherwise literally any computer is AGI. Babbage's Analytical Engine from the 1830s was more intelligent than a human in a coma.
Ironically, for robots and the like to truly be accepted, they will have to be coded to make mistakes so they seem more human.
I kinda agree. The Turing test should take accuracy and wisdom into account. GPT-4 is, much like GPT-3.5 was, very confidently wrong sometimes. The code or advice it gives you may be technically correct, but very, very stupid to do in practice.
“Very confidently wrong sometimes” is how I would describe most of humanity.
That's ok when the agent creates its own training set, like AlphaZero. It is learning from feedback as opposed to learning from next token prediction.
How do you think your brain works? Do you think there's magic in there, or is it mostly automatism acquired through human learning, so that now you're simply doing inference on your training (your childhood)?
We need a name for the fallacy where people call highly nonlinear algorithms with billions of parameters "just statistics", as if all they're doing is linear regression.
ChatGPT isn't AGI yet, but it is a huge leap in modeling natural language. The fact that there's some statistics involved explains neither of those two points.
Well, thanks to quantum mechanics, pretty much all of existence is probably "just statistics".
Well, practically all interesting statistics are NONlinear regressions. Including ML. And your brain. And physics.
What a lot of people don’t understand is that linear regression can still handle non-linear relationships.
For a statistician, linear regression just means the coefficients are linear, it doesn’t mean the relationship itself is a straight line.
That’s why linear models are still incredibly powerful and are used so widely across so many fields.
Yet still limited compared to even not-very-deep NNs. If the user wants to fit a parabola with a linear regression, he pretty much has to manually add a quadratic term himself.
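For what it's worth, here's a minimal sketch of exactly that move (plain numpy, purely illustrative): the fit is "linear" because it's linear in the coefficients, even though the quadratic feature has to be added by hand.

```python
import numpy as np

# Toy data from a noisy parabola: y = 2 + 3x + 0.5x^2 + noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = 2 + 3 * x + 0.5 * x**2 + rng.normal(0, 0.3, size=x.shape)

# "Linear" regression: linear in the coefficients, not in x.
# The quadratic term is added manually as an extra feature column.
X = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(coef)  # roughly [2, 3, 0.5]
```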
I think they're widely used primarily because they're widely taught in school.
We need a name for the illness that is people throwing some shoddy homebrew "L"LM at their misshapen prompts, if you can call them that, and then concluding that it's just bad imitation of speech because they keep asking their models to produce page after page of Sonic fanfic. Or thinking everything is equally hallucination-prone.
Really, the moronic takes about all this are out of this world, never mind how people all of a sudden have a very clear idea of what intelligence, among the most ambiguous and ill-defined notions we debate, entails. Except they struggle to put this knowledge into words; it's more about the feel of it all, y'know.
Let's ask GPT4!
I dunno. The “fallacy of composition” is just made up of 3 words, and there’s not a lot that you can explain with only three words.
How... did it map oversimplification to... holistic thinking??? Saying that it's "just statistics" is wrong because "just statistics" covers some very complicated models in principle. They weren't saying that simple subsystems are incapable of generating complex behavior.
God, why do people think these things are intelligent? I guess people fall for cons all the time...
I think it's a vacuous truth.
The vacuous truth is saying that AI is statistical. It certainly is, but it's also much more.
The fallacy part is taking that fact and claiming that, because an AI algorithm is "just statistics", it therefore cannot exhibit "true" intelligence and is somehow only faking or mimicking intelligence.
It’s not a fallacy at all. It is just statistics, combined with some very useful inductive biases. The fallacy is trying to smuggle some extra magic into the description of what it is.
Actual AGI would be able to explain something that no human has understood before. We aren’t really close to that at all. Falling back on “___ may not be AGI yet, but…” is a lot like saying “rocket ships may not be FTL yet, but…”
The fallacy is the part where you imply that humans have magic.
"An LLM is just doing statistics, therefore an LLM can't match human intellect unless you add pixie dust somewhere." Clearly the implication is that human intellect involves pixie dust somehow?
Or maybe, idk, humans are just the result of random evolutionary processes jamming together neurons into a configuration that happens to behave in a way that lets us build steam engines, and there's no fundamental reason that jamming together perceptrons can't accomplish the same thing?
LLMs might still lack something that the human brain has. Internal monologue, for example, which allows us to allocate more than a fixed amount of compute per output token.
You can just give an LLM an internal monologue. It's called a scratchpad.
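In the spirit of that suggestion, here's a minimal sketch of what a scratchpad loop could look like. `call_llm` is a hypothetical stand-in for whatever completion API you happen to use, not a real library function; the point is just that the model gets extra forward passes of "thinking" before it has to commit to an answer.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an actual LLM completion call."""
    raise NotImplementedError

def answer_with_scratchpad(question: str, max_steps: int = 5) -> str:
    # The model appends intermediate reasoning to a growing scratchpad,
    # spending extra compute across several calls before the final answer.
    scratchpad = ""
    for _ in range(max_steps):
        prompt = (
            f"Question: {question}\n"
            f"Scratchpad so far:\n{scratchpad}\n"
            "Write the next reasoning step, or 'FINAL:' followed by the answer."
        )
        step = call_llm(prompt)
        if step.startswith("FINAL:"):
            return step.removeprefix("FINAL:").strip()
        scratchpad += step + "\n"
    return scratchpad  # fall back to whatever reasoning was produced
```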
I'm not sure how this applies to the broader discussion, like honestly I can't tell if we're off-topic. But once you have LLMs you can implement basically everything humans can do. The only limitations I'm aware of that aren't trivial from an engineering perspective are
And the network still uses skills that it learned in a fixed-computation-per-token regime.
Sure, future versions will lift many existing limitations, but I was talking about current LLMs.
Real brains aren't perceptrons. They don't learn by back-propagation or by evaluating performance on a training set. They're not mathematical models, or even mathematical functions in any reasonable sense. This is a "god of the gaps" scenario, wherein there are a lot of things we don't understand about how real brains work, and people jump to fill in the gap with something they do understand (e.g. ML models).
Brains are absolutely mathematical functions in a very reasonable sense, and anyone who says otherwise is a crazy person
You think brains aren't Turing machines? Like, you really think that? Every physical process ever studied is a Turing machine. Every single one. Saying that brains aren't Turing machines is no different from saying that humans have souls. You're positing the existence of extra-special magic outside the realm of science just to justify your belief that humans are too special for science to ever comprehend.
(By "is a Turing machine" I mean that its behavior can be predicted to arbitrary accuracy by a Turing machine, and so observing its behavior is mathematically equivalent to running a Turing machine.)
I mean, if your hypothesis is that the human brain is the product of one billion years of evolution 'searching' for a configuration of neurons and synapses that is very efficient at sampling the environment, detecting any changes, and acting accordingly to increase the likelihood of survival, and also at communicating with other such configurations in order to devise and execute more complicated plans, then that... doesn't bode very well for current AI architectures, does it? Their training sessions are incredibly weak by comparison, simply learning to predict and interpolate some sparse dataset that some human brains produced.
If by 'there's no fundamental reason we can't jam together perceptrons this way' you mean that we can always throw a bunch of them into an ever-changing virtual world, let them mutate and multiply, and after some long time fish out the survivors and have them work for us, sure, but we're talking about A LOT of compute here. Our hope is that we can find some sort of shortcut, because if we truly have to do it the way evolution did, it probably won't happen this side of the millennium.
We don't currently know exactly why gradient descent works to find powerful, generalizing minima
But, like, it does
The minima we can reliably find, in practice, don't just interpolate the training data. I mean, they do that, but they find compressions which seem to actually represent knowledge, in the sense that they can identify true relationships between concepts which reliably hold outside the training distribution.
I want to stress, "predict the next token" is what the models are trained to do, it is not what they learn to do. They learn deep representations and learn to deploy those representations in arbitrary contexts. They learn to predict tokens the same way a high-school student learns to fill in scantrons: the scantron is designed so that filling it out requires other more useful skills.
It's unclear if gradient descent will continue to work so unreasonably well as we try to push it farther and farther, but so long as the current paradigm holds I don't see a huge difference between human inference ability and Transformer inference ability. Number of neurons* and amount of training data seem to be the things holding LLMs back. Humans beat LLMs on both counts, but in some ways LLMs seem to outperform biology in terms of what they can learn with a given quantity of neurons/data. As for the "billions of years" issue, that's why we are using human-generated data, so they can catch up instead of starting from scratch.
I have to say, this is completely the *opposite* of what I have gotten by playing around with these models (GPT-4). At no point did I get the impression that I was dealing with something that, had you taught it everything humanity knew in the early 1800s about, say, electricity and magnetism, would have learned 'deep representations' of those concepts to a degree that would allow it to synthesize something truly novel, like the prediction of electromagnetic waves.
I mean, the model has already digested most of what's written out there. What's the probability that something with the ability to 'learn deep representations and learn to deploy those representations in arbitrary contexts' would have made zero contributions, drawn zero new connections that had escaped humans, in anything more serious than 'write an Avengers movie in the style of Shakespeare'? I'm not talking about something as big as electromagnetism, but... something? Anything? It has 'grokked', as you say, pretty much the entirety of Stack Overflow, and yet I know of zero new programming techniques or design patterns or concepts it has come up with.
Why would a human level AGI need to be able to explain something that no human has understood before? That sounds more like ASI than AGI.
And the human brain is FTL then?
Statisticians use nonlinear models all the time
Embeddings are statistics. They evolved from linear statistical models, but they are now non-linear statistical models. Bengio '03 explains this.
ChatGPT predicts the most probable next token, or the next token that yields the highest probability of a thumbs up, depending on whether you're talking about the self-supervised pretraining stage or the reinforcement learning stage of training. That is the conceptual underpinning of how the parameter updates are calculated. It only achieves the ability to communicate because it was trained on text that successfully communicates.
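Concretely, the pretraining objective is just cross-entropy on the next token. A minimal sketch in PyTorch (the embedding-plus-linear "model" below is a placeholder for the actual transformer, not anyone's real architecture):

```python
import torch
import torch.nn.functional as F

vocab_size, d_model = 100, 32
tokens = torch.randint(0, vocab_size, (1, 16))  # a toy token sequence

# Placeholder "model": embedding -> linear head. A real LLM puts a large
# transformer in between, but the training objective is the same.
embed = torch.nn.Embedding(vocab_size, d_model)
head = torch.nn.Linear(d_model, vocab_size)

logits = head(embed(tokens[:, :-1]))  # predict token t+1 from tokens up to t
loss = F.cross_entropy(               # maximize probability of the true next token
    logits.reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()  # parameter updates follow the gradient of this loss
```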
What? I hope you're talking about LLMs exclusively because otherwise this is just blatantly false. AlphaGo Zero is just one of many such examples.
What? Are you familiar with the field of statistical learning? Formal frameworks for proving generalization have existed for some decades at this point. So when you look at anything pre-deep-learning, you can definitely show that many mainstream ML models do more than just "mimic statistical aspects of the training set". Or, if you want to go on some weird philosophical tangent, you can equivalently say that "mimicking statistical aspects of the training set" is enough to learn distributions, provided you use the right amount of data and the right model.
And even for DL, which at the moment lacks a satisfying theoretical framework for generalization, it's obvious that empirically models can generalize.
From statistical learning theory, there is always some adversarial distribution where the model will fail to generalize... (no free lunch). And isn't generalization about extrapolation beyond the training distribution? So learning the training distribution itself is not generalization.
The no-free-lunch theorem in machine learning refers to the case in which the hypothesis class contains all possible classifiers in your domain (and your training set is either too small, or the domain set is infinite), and learning becomes impossible to guarantee, i.e. you have no useful bounds on generalization. When you restrict your class to something like linear classifiers, for example, you can reason about things like generalization and so on. For finite domain sets, you can even reason about the "every hypothesis" classifier, but that's not very useful in practice.
I'm not sure about your point about the training distribution. In general, you are interested in generalization on your training distribution, as that's where your train/test/validation data is sampled from. Note that overfitting your training set is not the same thing as learning your training distribution. You can think about stuff like domain adaptation, where you reason about your performance on "similar" distributions and how you might improve on them, but that's already something very different.
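For reference, the textbook uniform-convergence bound this reasoning leans on (stated for a finite hypothesis class $\mathcal{H}$ and a bounded loss, from memory): with probability at least $1-\delta$ over an i.i.d. sample $S$ of size $m$,

$$\sup_{h \in \mathcal{H}} \left| L_{\mathcal{D}}(h) - L_{S}(h) \right| \;\le\; \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2m}},$$

which only says something useful when $\mathcal{H}$ is restricted; for the class of all classifiers over an infinite domain the bound is vacuous, which is the no-free-lunch point above.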
But given the diversity of the training data at the scale of hundreds of trillions of tokens, you can expect the model to cover almost all of the tasks we care to do.
Reinforcement learning does far more than mimic.