this post was submitted on 26 Nov 2023

Machine Learning

Hi, you wonderful people!

Here's a thought that came to my mind: Since training LLMs involves a degree of randomness, is there potentially a way to create an architecture for LLMs (or other AI) that would be somewhat deterministic in its training instead?

What I mean is, could a theoretical architecture exist where everyone could train their own separate checkpoints on different datasets, which, after combining, would result in a checkpoint with combined learning from all these different smaller checkpoints?

This would let thousands of people create their own checkpoints that, when combined, add up to something greater than the individual parts. And since training is the most time-consuming part of developing LLMs (or any AI), this approach would let almost everyone contribute their share of processing power toward building something together.

If viable, this could have huge potential implications for Open Source Software.

I'm looking forward to hearing what all of you smart people have to say about it!

[–] ohmygad45@alien.top 1 points 9 months ago (5 children)

I’m not aware of any way to accomplish what you’re describing besides those you’ve ruled out (federated learning and mixtures of experts). Naively averaging weights of models trained on disjoint datasets won’t work for LLMs or 1+ hidden layer DNNs (though it will for logistic or linear models). This sounds to me like an open research question.
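
To make the "naive averaging" point concrete, here's a minimal PyTorch sketch of what that would mean (the checkpoint file names are placeholders, not anything from the thread). For a logistic or linear model this element-wise mean can be reasonable; for deeper networks the hidden units of independently trained models generally don't correspond to each other, so the average is usually not a useful model.

```python
import torch

# Two hypothetical checkpoints trained independently on disjoint datasets
# (file names are placeholders for illustration).
state_a = torch.load("checkpoint_a.pt")
state_b = torch.load("checkpoint_b.pt")

# Naive parameter averaging: element-wise mean of every parameter tensor.
merged = {name: (state_a[name] + state_b[name]) / 2 for name in state_a}

torch.save(merged, "merged_checkpoint.pt")
```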

[–] paryska99@alien.top 1 points 9 months ago (1 children)

Would it be possible to create a system where every model's training uses a specific, fixed seed and records its exact state, and then share that information along with the dataset it was trained on so the training can be reproduced? That could help manage the randomness in training.

Using a set seed means we can make sure that the way the model starts and how it learns during training is the same every time. Essentially, if we restart the training from a certain point with this seed, the model should learn in the same way it did before. Also, by saving and sharing details like the model's structure, which training stage it's in, and the training step, along with the seed, we're essentially taking a 'snapshot' of where the model is at that moment.

Others could use this snapshot to pick up the training right where it was left off, under the same conditions. For merging different models, this technique could help line up how they learn, making it easier and more predictable to combine their training.
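
As a rough sketch of this "seed + snapshot" idea in PyTorch (the model, optimizer, step count, and file name are placeholders, and full bit-for-bit determinism would also depend on things like deterministic kernels and identical hardware):

```python
import torch
from torch import nn

SEED = 42  # the agreed-upon seed, shared alongside the dataset

torch.manual_seed(SEED)  # makes weight init and shuffling reproducible

# Placeholder model/optimizer; a real setup would use the actual LLM.
model = nn.Linear(128, 128)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# ... train for some steps ...

# "Snapshot": everything needed to resume under identical conditions.
torch.save(
    {
        "seed": SEED,
        "step": 1000,                        # training step reached
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "rng_state": torch.get_rng_state(),  # exact RNG position
    },
    "snapshot_step1000.pt",
)
```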

Am I thinking right about this or am I missing something? This is just theoretical thinking and I am not an expert on the subject.

[–] dlowashere@alien.top 1 points 9 months ago

You could use set seeds and checkpoints to train a single model serially, passing it between different contributors. I don’t know how you could “merge” models that are trained independently. I think the challenge here is the merging, not necessarily the determinism.
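
For what it's worth, that serial hand-off could look roughly like this in PyTorch, assuming the snapshot format from the sketch above (model, file name, and shapes are again placeholders):

```python
import torch
from torch import nn

# The next contributor loads the shared snapshot and continues training
# on their own data; nothing is merged, the same model is passed along.
snapshot = torch.load("snapshot_step1000.pt")

torch.manual_seed(snapshot["seed"])
torch.set_rng_state(snapshot["rng_state"])

model = nn.Linear(128, 128)          # must match the original architecture
model.load_state_dict(snapshot["model_state"])

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
optimizer.load_state_dict(snapshot["optimizer_state"])

step = snapshot["step"]
# ... resume training from `step` on the next contributor's dataset ...
```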
