Machine Learning

Community Rules:

Be nice. No offensive behavior, insults or attacks: we encourage a diverse community in which members feel safe and have a voice.
Make your post clear and comprehensive: posts that lack insight or effort will be removed (e.g., questions that are easily googled).
Beginner or career-related questions go elsewhere. This community is focused on discussion of research and new projects that advance the state of the art.
Limit self-promotion. Comments and posts should be first and foremost about topics of interest to ML observers and practitioners. Limited self-promotion is tolerated, but the sub is not here as merely a source for free advertisement. Such posts will be removed at the discretion of the mods.

founded 1 year ago

MODERATORS

communick@academy.garden

[R] Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models (alien.top)

submitted 1 year ago by the_architect_ai@alien.top to c/machinelearning@academy.garden

https://arxiv.org/pdf/2311.00871.pdf

Abstract:
Transformer models, notably large language models (LLMs), have the remarkable ability to perform in-context learning (ICL) – to perform new tasks when prompted with unseen input-output examples without any explicit model training. In this work, we study how effectively transformers can bridge between their pretraining data mixture, comprised of multiple distinct task families, to identify and learn new tasks in-context which are both inside and outside the pretraining distribution. Building on previous work, we investigate this question in a controlled setting, where we study transformer models trained on sequences of (x,f(x)) pairs rather than natural language. Our empirical results show transformers demonstrate near-optimal unsupervised model selection capabilities, in their ability to first in-context identify different task families and in-context learn within them when the task families are well-represented in their pretraining data. However when presented with tasks or functions which are out-of-domain of their pretraining data, we demonstrate various failure modes of transformers and degradation of their generalization for even simple extrapolation tasks. Together our results highlight that the impressive ICL abilities of high-capacity sequence models may be more closely tied to the coverage of their pretraining data mixtures than inductive biases that create fundamental generalization capabilities.
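For anyone unfamiliar with this kind of controlled setup, here is a minimal sketch of what a pretraining data mixture of (x, f(x)) prompts could look like: pick a task family according to mixture weights, draw one function from that family, then emit a sequence of input-output pairs. This is not the authors' code. The two families (dense and sparse linear functions), the Gaussian inputs, the sparsity level, and every name below are my own illustrative assumptions.

```python
import random
from typing import Callable, Dict, List, Tuple

Vector = List[float]

# ASSUMPTION: two hypothetical task families chosen for illustration only;
# the paper's actual families and hyperparameters may differ.
def sample_dense_linear(dim: int, rng: random.Random) -> Callable[[Vector], float]:
    """Draw f(x) = w . x with every weight drawn from N(0, 1)."""
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def sample_sparse_linear(dim: int, rng: random.Random, k: int = 2) -> Callable[[Vector], float]:
    """Draw f(x) = w . x where only k randomly chosen weights are nonzero."""
    w = [0.0] * dim
    for i in rng.sample(range(dim), k):
        w[i] = rng.gauss(0.0, 1.0)
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


FAMILIES: Dict[str, Callable[[int, random.Random], Callable[[Vector], float]]] = {
    "dense_linear": sample_dense_linear,
    "sparse_linear": sample_sparse_linear,
}


def sample_prompt(
    mixture: Dict[str, float], dim: int, n_points: int, rng: random.Random
) -> Tuple[str, List[Tuple[Vector, float]]]:
    """Pick a family by mixture weight, draw one function from it,
    and return the family name plus n_points (x, f(x)) pairs."""
    names = list(mixture)
    family = rng.choices(names, weights=[mixture[n] for n in names], k=1)[0]
    f = FAMILIES[family](dim, rng)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    return family, [(x, f(x)) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    family, prompt = sample_prompt(
        {"dense_linear": 0.5, "sparse_linear": 0.5}, dim=5, n_points=10, rng=rng
    )
    print(family, len(prompt))
```

In this framing, "in-distribution" means a prompt drawn from a family with nonzero mixture weight, while an out-of-domain test would use functions from a family the mixture never covers, which is roughly the regime where the abstract reports degraded generalization.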

TLDR: AGI is not imminent, as Transformers aren't good at generalising beyond their training data.

CatalyzeX_code_bot@alien.top · 1 point · 1 year ago

No relevant code picked up just yet for "Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models".

Request code from the authors or ask a question.

If you have code to share with the community, please add it here 😊🙏

To opt out from receiving code links, DM me.