this post was submitted on 26 Nov 2023
1 points (100.0% liked)

Machine Learning

1 readers
1 users here now

Community Rules:

founded 1 year ago
MODERATORS
 

Hi, you wonderful people!

Here's a thought that came to my mind: Since training LLMs involves a degree of randomness, is there potentially a way to create an architecture for LLMs (or other AI) that would be somewhat deterministic in its training instead?

What I mean is, could a theoretical architecture exist where everyone could train their own separate checkpoints on different datasets, which, after combining, would result in a checkpoint with combined learning from all these different smaller checkpoints?

What this would allow us to do is let thousands of people create their own checkpoints, which when combined would result in something greater than the individual parts themselves. And since the training process is what takes the longest in developing LLMs (or any AI), this approach would allow almost everyone to contribute their share of processing power towards creating something together.

If viable, this could have huge potential implications for Open Source Software.

I'm looking forward to hearing what all of you smart people have to say about it!

you are viewing a single comment's thread
view the rest of the comments
[–] bitemenow999@alien.top 1 points 11 months ago

Not sure if that is possible but even if it is possible it will be very inefficient given each model would have to "learn from scratch" like learning grammar, sentence positioning, etc. You could potentially make a central model that is pre-trained on a large corpus dataset and fine-tuned for a specific task but then it is just a standard GPT-like model, which when aggregated like a mixture of experts or ensemble can potentially do what you are saying.

Combining multiple models that are trained independently into one huge model is not possible because the model learns something different for each task and due to the inherently stochastic nature of general LLM (which is desirable to aggregate information), unless you are just looking to purely "retrieve information" what you say is not possible with the current standard training regime.