[–] El_Minadero@alien.top 0 points 10 months ago (43 children)

I mean, everyone is just sort of ignoring the fact that no ML technique has been shown to do anything more than just mimic statistical aspects of the training set. Is statistical mimicry AGI? On some performance benchmarks, better statistical mimicry does appear to approach capabilities we associate with AGI.

I personally am quite skeptical that the best lever to pull is simply adding more parameters. Our own brains have such complicated neural/psychological circuitry for executive function, long- and short-term memory, Type 1 and Type 2 thinking, "internal" dialog and visual models, and, more importantly, the ability to few-shot learn the logical underpinnings of an example set. Without a fundamental change in how we train NNs, or even in our conception of what an effective NN is to begin with, we're not going to see the paradigm shift everyone's been waiting for.

[–] dragosconst@alien.top 1 points 10 months ago (1 children)

> no ML technique has been shown to do anything more than just mimic statistical aspects of the training set

What? Are you familiar with the field of statistical learning? Formal frameworks for proving generalization have existed for decades at this point. So when you look at anything pre-deep-learning, you can definitely show that many mainstream ML models do more than just "mimic statistical aspects of the training set". Or, if you want to go on some weird philosophical tangent, you can equivalently say that "mimicking statistical aspects of the training set" is enough to learn a distribution, provided you use the right amount of data and the right model.
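
For concreteness, here's a standard result from that framework (textbook uniform convergence, e.g. as in Shalev-Shwartz & Ben-David's *Understanding Machine Learning*; the notation is the usual one, nothing specific to this thread): for a finite hypothesis class, the empirical risk provably tracks the true risk.

```latex
% Uniform convergence for a finite hypothesis class H:
% with probability >= 1 - delta over an i.i.d. sample S of size m,
\[
\forall h \in \mathcal{H}:\quad
\left| L_{\mathcal{D}}(h) - L_{S}(h) \right|
\;\le\;
\sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2m}}
\]
% where L_D(h) is the true risk and L_S(h) the empirical risk, so a
% good "statistical fit" on the sample provably transfers to the
% underlying distribution.
```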

And even for DL, which at the moment lacks a satisfying theoretical framework for generalization, it's empirically obvious that models can generalize.
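
A minimal sketch of what "empirically generalize" means here (my own toy example; the dataset and model choices are arbitrary): fit on one sample, then score on held-out data from the same distribution and see well-above-chance accuracy.

```python
# Toy demonstration of empirical generalization: a model fit on a
# training sample scores well above chance (0.5) on held-out data
# drawn from the same distribution.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification problem; all parameters are arbitrary.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"train accuracy: {clf.score(X_tr, y_tr):.3f}")
print(f"test accuracy:  {clf.score(X_te, y_te):.3f}")  # well above 0.5
```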

[–] On_Mt_Vesuvius@alien.top 1 points 9 months ago (1 children)

From statistical learning theory, there is always some adversarial distribution on which a given model will fail to generalize... (no free lunch). And isn't generalization about extrapolating beyond the training distribution? So learning the training distribution itself is not generalization.

[–] dragosconst@alien.top 1 points 9 months ago

The no-free-lunch theorem in machine learning refers to the case in which the hypothesis class contains all possible classifiers over your domain (and your training set is either too small or the domain is infinite); learning then becomes impossible to guarantee, i.e. you have no useful bounds on generalization. When you restrict your class to something like linear classifiers, for example, you can reason about generalization and so on. For finite domain sets, you can even reason about the class of all hypotheses, but that's not very useful in practice.
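
To make the contrast concrete, here's the standard VC-style bound you get once the class is restricted (textbook statistical learning again, not anything specific to this thread; C is an absolute constant):

```latex
% Agnostic-PAC generalization bound for a class H with VC dimension d:
% with probability >= 1 - delta over m i.i.d. samples,
\[
\forall h \in \mathcal{H}:\quad
L_{\mathcal{D}}(h) \;\le\; L_{S}(h)
  + C\,\sqrt{\frac{d + \ln(1/\delta)}{m}}
\]
% Halfspaces (linear classifiers) in R^d have VC dimension d + 1, so
% this bound is meaningful for them. The class of *all* classifiers on
% an infinite domain has infinite VC dimension, so no such bound
% exists -- which is exactly the no-free-lunch regime described above.
```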

I'm not sure about your point about the training distribution. In general, you are interested in generalization on your training distribution, as that's where your train/test/validation data is sampled from. Note that overfitting your training set is not the same thing as learning your training distribution. You can think about things like domain adaptation, where you reason about performance on "similar" distributions and how you might improve it, but that's already something very different.
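
A toy sketch of that set-vs-distribution distinction (my own example; the mixture and noise rate are arbitrary): a 1-nearest-neighbour classifier fits the training *set* perfectly, but fresh samples from the very same distribution reveal that it hasn't learned the *distribution*.

```python
# Overfitting the training set != learning the training distribution.
# A 1-NN model memorizes its sample (train accuracy 1.0), yet scores
# noticeably lower on fresh draws from the same distribution, because
# it has also memorized the label noise.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def sample(n):
    """Draw n points from a fixed two-Gaussian mixture with 20% label noise."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=y[:, None] * 2.0, scale=1.5, size=(n, 5))
    noisy = rng.random(n) < 0.2            # flip 20% of the labels
    return X, np.where(noisy, 1 - y, y)

X_train, y_train = sample(200)
X_fresh, y_fresh = sample(5000)            # fresh draws, same distribution

memorizer = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print(f"training-set accuracy: {memorizer.score(X_train, y_train):.3f}")  # 1.0
print(f"fresh-sample accuracy: {memorizer.score(X_fresh, y_fresh):.3f}")  # lower
```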
