this post was submitted on 26 Nov 2023
1 points (100.0% liked)

Machine Learning

1 readers
1 users here now

Community Rules:

founded 1 year ago
MODERATORS
 

Hi, you wonderful people!

Here's a thought that came to my mind: Since training LLMs involves a degree of randomness, is there potentially a way to create an architecture for LLMs (or other AI) that would be somewhat deterministic in its training instead?

What I mean is, could a theoretical architecture exist where everyone could train their own separate checkpoints on different datasets, which, after combining, would result in a checkpoint with combined learning from all these different smaller checkpoints?

What this would allow us to do is let thousands of people create their own checkpoints, which when combined would result in something greater than the individual parts themselves. And since the training process is what takes the longest in developing LLMs (or any AI), this approach would allow almost everyone to contribute their share of processing power towards creating something together.

If viable, this could have huge potential implications for Open Source Software.

I'm looking forward to hearing what all of you smart people have to say about it!

you are viewing a single comment's thread
view the rest of the comments
[–] ohmygad45@alien.top 1 points 11 months ago (1 children)

Here’s a simple intuition as to why averaging the weights of a 1+ hidden layer NN won’t work: pick a hidden layer in your model and apply a permutation matrix to its weights (along the input axis) and the inverse permutation matrix to the previous layer (along the output axis). Obviously the model is unchanged (from an input/output perspective). Repeat that N times (where N is the input dimension of the hidden layer you picked). You now have N models that are identical from an input output perspective. If you average those model weights, your hidden layer will output a constant because all its weights will be identical. This averaged weights model is obviously completely broken even though it’s the average of N “identical” (from an input / output perspective) model. QED.

[–] synthphreak@alien.top 1 points 11 months ago

Interesting. I love a good thought experiment :)

But what about the idea of bagging? As in aggregating multiple models together that have all been trained on different examples, and thus learned different things. Why is that not subject to similar criticism?