this post was submitted on 26 Nov 2023

Machine Learning

Hi, you wonderful people!

Here's a thought that came to my mind: Since training LLMs involves a degree of randomness, is there potentially a way to create an architecture for LLMs (or other AI) that would be somewhat deterministic in its training instead?

What I mean is, could a theoretical architecture exist where everyone could train their own separate checkpoints on different datasets, which, after combining, would result in a checkpoint with combined learning from all these different smaller checkpoints?

This would let thousands of people create their own checkpoints that, when combined, add up to something greater than the individual parts. And since training is the most time-consuming part of developing LLMs (or any AI), this approach would let almost everyone contribute their share of processing power toward building something together.

If viable, this could have huge potential implications for Open Source Software.

I'm looking forward to hearing what all of you smart people have to say about it!

[–] ohmygad45@alien.top 1 points 9 months ago (5 children)

I’m not aware of any way to accomplish what you’re describing besides those you’ve ruled out (federated learning and mixtures of experts). Naively averaging weights of models trained on disjoint datasets won’t work for LLMs or 1+ hidden layer DNNs (though it will for logistic or linear models). This sounds to me like an open research question.
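
To make the "naive averaging" point concrete, here's a minimal PyTorch sketch of what that would mean (the checkpoint file names are placeholders, not anything from the thread). For a logistic or linear model this element-wise mean can be reasonable; for deeper networks the hidden units of independently trained models generally don't correspond to each other, so the average is usually not a useful model.

```python
import torch

# Two hypothetical checkpoints trained independently on disjoint datasets
# (file names are placeholders for illustration).
state_a = torch.load("checkpoint_a.pt")
state_b = torch.load("checkpoint_b.pt")

# Naive parameter averaging: element-wise mean of every parameter tensor.
merged = {name: (state_a[name] + state_b[name]) / 2 for name in state_a}

torch.save(merged, "merged_checkpoint.pt")
```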

[–] paryska99@alien.top 1 points 9 months ago (1 children)

Would it be possible to create a system where every model's training uses a specific, fixed seed and records its exact state, and then share that information along with the dataset it was trained on so the training can be reproduced? That could help manage the randomness in training.

Using a set seed means we can make sure that the way the model starts and how it learns during training is the same every time. Essentially, if we restart the training from a certain point with this seed, the model should learn in the same way it did before. Also, by saving and sharing details like the model's structure, which training stage it's in, and the training step, along with the seed, we're essentially taking a 'snapshot' of where the model is at that moment.

Others could use this snapshot to pick up the training right where it was left off, under the same conditions. For merging different models, this technique could help line up how they learn, making it easier and more predictable to combine their training.
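
As a rough sketch of this "seed + snapshot" idea in PyTorch (the model, optimizer, step count, and file name are placeholders, and full bit-for-bit determinism would also depend on things like deterministic kernels and identical hardware):

```python
import torch
from torch import nn

SEED = 42  # the agreed-upon seed, shared alongside the dataset

torch.manual_seed(SEED)  # makes weight init and shuffling reproducible

# Placeholder model/optimizer; a real setup would use the actual LLM.
model = nn.Linear(128, 128)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# ... train for some steps ...

# "Snapshot": everything needed to resume under identical conditions.
torch.save(
    {
        "seed": SEED,
        "step": 1000,                        # training step reached
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "rng_state": torch.get_rng_state(),  # exact RNG position
    },
    "snapshot_step1000.pt",
)
```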

Am I thinking right about this or am I missing something? This is just theoretical thinking and I am not an expert on the subject.

[–] dlowashere@alien.top 1 points 9 months ago

You could use set seeds and checkpoints to train a single model serially, passing it between different contributors. I don’t know how you could “merge” models that are trained independently. I think the challenge here is the merging, not necessarily the determinism.
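
For what it's worth, that serial hand-off could look roughly like this in PyTorch, assuming the snapshot format from the sketch above (model, file name, and shapes are again placeholders):

```python
import torch
from torch import nn

# The next contributor loads the shared snapshot and continues training
# on their own data; nothing is merged, the same model is passed along.
snapshot = torch.load("snapshot_step1000.pt")

torch.manual_seed(snapshot["seed"])
torch.set_rng_state(snapshot["rng_state"])

model = nn.Linear(128, 128)          # must match the original architecture
model.load_state_dict(snapshot["model_state"])

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
optimizer.load_state_dict(snapshot["optimizer_state"])

step = snapshot["step"]
# ... resume training from `step` on the next contributor's dataset ...
```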
