Machine Learning

1 readers

1 users here now

Community Rules:

Be nice. No offensive behavior, insults or attacks: we encourage a diverse community in which members feel safe and have a voice.
Make your post clear and comprehensive: posts that lack insight or effort will be removed. (ex: questions which are easily googled)
Beginner or career related questions go elsewhere. This community is focused in discussion of research and new projects that advance the state-of-the-art.
Limit self-promotion. Comments and posts should be first and foremost about topics of interest to ML observers and practitioners. Limited self-promotion is tolerated, but the sub is not here as merely a source for free advertisement. Such posts will be removed at the discretion of the mods.

founded 2 years ago

MODERATORS

communick@academy.garden

[Discussion] Let's speak theory. Exploring the Potential of Collaborative Training? (alien.top)

submitted 2 years ago by paryska99@alien.top to c/machinelearning@academy.garden

9 comments fedilink hide all child comments

Hi, you wonderful people!

Here's a thought that came to my mind: Since training LLMs involves a degree of randomness, is there potentially a way to create an architecture for LLMs (or other AI) that would be somewhat deterministic in its training instead?

What I mean is, could a theoretical architecture exist where everyone could train their own separate checkpoints on different datasets, which, after combining, would result in a checkpoint with combined learning from all these different smaller checkpoints?

What this would allow us to do is let thousands of people create their own checkpoints, which when combined would result in something greater than the individual parts themselves. And since the training process is what takes the longest in developing LLMs (or any AI), this approach would allow almost everyone to contribute their share of processing power towards creating something together.

If viable, this could have huge potential implications for Open Source Software.

I'm looking forward to hearing what all of you smart people have to say about it!

you are viewing a single comment's thread
view the rest of the comments

[–] bitemenow999@alien.top 1 points 2 years ago

Not sure if that is possible but even if it is possible it will be very inefficient given each model would have to "learn from scratch" like learning grammar, sentence positioning, etc. You could potentially make a central model that is pre-trained on a large corpus dataset and fine-tuned for a specific task but then it is just a standard GPT-like model, which when aggregated like a mixture of experts or ensemble can potentially do what you are saying.

Combining multiple models that are trained independently into one huge model is not possible because the model learns something different for each task and due to the inherently stochastic nature of general LLM (which is desirable to aggregate information), unless you are just looking to purely "retrieve information" what you say is not possible with the current standard training regime.