paryska99

joined 10 months ago
 

Hi, you wonderful people!

Here's a thought that came to mind: since training LLMs involves a degree of randomness, is there a way to create an architecture for LLMs (or other AI) that would be somewhat deterministic in its training instead?

What I mean is: could an architecture exist where everyone trains their own separate checkpoints on different datasets, which, once combined, would yield a single checkpoint with the combined learning of all these smaller ones?

This would let thousands of people create their own checkpoints which, when combined, would result in something greater than the individual parts. And since training is what takes the longest in developing LLMs (or any AI), this approach would let almost everyone contribute their share of processing power towards creating something together.
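For a sense of what "combining" could mean in practice, the simplest version people experiment with is plain weight averaging across checkpoints that share one architecture (the idea behind federated averaging and "model soups"). A minimal sketch, assuming identical architectures and hypothetical file names:

```python
# Naive checkpoint merging by averaging parameters. Assumes every file is
# a plain PyTorch state_dict for the same architecture; paths are made up.
import torch

paths = ["checkpoint_a.pt", "checkpoint_b.pt", "checkpoint_c.pt"]
state_dicts = [torch.load(p, map_location="cpu") for p in paths]

merged = {}
for key in state_dicts[0]:
    # Average each parameter tensor across all contributed checkpoints.
    merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)

torch.save(merged, "merged_checkpoint.pt")
```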

If viable, this could have huge potential implications for Open Source Software.

I'm looking forward to hearing what all of you smart people have to say about it!

[–] paryska99@alien.top 1 points 9 months ago (1 children)

Would it be possible to create a system where every model's training uses a specific, set seed and records its exact state, and then share that information along with the dataset it was trained on, so the training can be reproduced? That would help manage the randomness in training.

Using a set seed means the model's initialization and the way it learns during training are the same every run: if we restart training from a certain point with this seed, the model should learn exactly as it did before. And by saving and sharing details like the model's structure, the training stage, and the training step, along with the seed, we're effectively taking a 'snapshot' of where the model is at that moment.

Others could use this snapshot to pick up the training right where it was left off, under the same conditions. For merging different models, this technique could help line up how they learn, making it easier and more predictable to combine their training.
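In PyTorch terms, the "snapshot" might look something like the sketch below (names are illustrative). One caveat worth knowing: seeds alone don't pin down GPU results, because some CUDA kernels are non-deterministic by default, so you also have to request deterministic algorithms:

```python
# Minimal sketch of seeded, reproducible training plus a resumable
# "snapshot"; the model and optimizer here are hypothetical stand-ins.
import random

import numpy as np
import torch

SEED = 42

def set_determinism(seed: int) -> None:
    # Fix every RNG the training loop touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Prefer deterministic kernels; warn instead of erroring on ops
    # that have no deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)

set_determinism(SEED)

model = torch.nn.Linear(16, 16)            # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters())

# ... train for some steps, then save everything needed to resume:
snapshot = {
    "seed": SEED,
    "step": 1000,                          # hypothetical training step
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
    "rng_state": torch.get_rng_state(),    # CPU RNG state at this moment
}
torch.save(snapshot, "snapshot_step1000.pt")
```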

Am I thinking about this the right way, or am I missing something? This is just theoretical thinking; I'm not an expert on the subject.

 


[–] paryska99@alien.top 1 points 9 months ago (1 children)

Doesn't the LlamaCpp server host a GUI for multimodal? You could potentially visit it, open the developer panel in your browser, and observe the HTTP requests being sent.
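If it helps, a request captured that way can then be replayed from code. A rough sketch, assuming a local server on the default port and the /completion endpoint (the exact multimodal fields may differ between builds):

```python
# Hypothetical replay of a captured llama.cpp server request; values are
# illustrative and the image_data shape may vary between server versions.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "Describe the image. [img-1]",  # [img-N] refers to image_data ids
        "n_predict": 128,
        # For multimodal builds, images go in as base64 with an id, e.g.:
        # "image_data": [{"data": "<base64-encoded image>", "id": 1}],
    },
    timeout=60,
)
print(resp.json()["content"])
```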

[–] paryska99@alien.top 1 points 9 months ago (1 children)

Thanks for the input.

What inference engine did you use? It's possibly a bug; these things tend to happen with new models.
I for one can't wait for lookahead decoding in llamacpp and others. Combine that with some smaller models and we'll have blazing-fast speeds on pennies' worth of hardware, I reckon.
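To make the "combine that with some smaller models" part concrete: that flavor is usually called speculative (draft-and-verify) decoding, while lookahead decoding proper guesses n-grams via Jacobi iteration instead of using a draft model. A toy sketch of the draft-and-verify loop, with hypothetical stand-in "models":

```python
# Toy draft-and-verify (speculative) decoding loop. Both "models" are
# hypothetical greedy next-token lookups over a toy vocabulary; a real
# implementation would verify all draft tokens in one batched forward pass.

def draft_model(context: list[str]) -> str:
    # Small, fast, sometimes-wrong guesser (stand-in).
    guesses = {"the": "cat", "cat": "sat", "sat": "on", "on": "a"}
    return guesses.get(context[-1], "<eos>")

def target_model(context: list[str]) -> str:
    # Large, slow, authoritative model (stand-in).
    truth = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}
    return truth.get(context[-1], "<eos>")

def generate(prompt: list[str], max_tokens: int = 8, k: int = 4) -> list[str]:
    out = list(prompt)
    while len(out) - len(prompt) < max_tokens:
        # 1. Draft k tokens cheaply with the small model.
        ctx, draft = list(out), []
        for _ in range(k):
            draft.append(draft_model(ctx))
            ctx.append(draft[-1])
        # 2. Verify with the big model: keep tokens while the two agree,
        #    and take the big model's token at the first disagreement.
        for tok in draft:
            expected = target_model(out)
            out.append(expected)
            if expected != tok or expected == "<eos>":
                break
        if out[-1] == "<eos>":
            return out
    return out

print(generate(["the"]))  # -> ['the', 'cat', 'sat', 'on', 'the', ...]
```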

[–] paryska99@alien.top 1 points 9 months ago (3 children)

There's the new Rocket 3B that might be worth a try. It scores suspiciously high on benchmarks, so I suspect dataset contamination, but I've seen people have good experiences with it.

[–] paryska99@alien.top 1 points 10 months ago

The future is going to be interesting. With this kind of CPU speedup we could run blazing-fast LLMs on a toaster, as long as it has enough RAM.

[–] paryska99@alien.top 1 points 10 months ago

Oh wow, this seems almost too good to be true.

[–] paryska99@alien.top 1 points 10 months ago

Oh wow, I know the results are probably cherry-picked, but this still seems like such a step-up.

[–] paryska99@alien.top 1 points 10 months ago

Yes! I've been waiting for progress in video for a while! Imagine DIY automated classification for compilations and edits. This is going to be sick! Can't wait to see an implementation in llamacpp.

 

Are any of you planning to finetune the openchat models, or aware of any finetunes coming out anytime soon?

Considering the new OpenChat-3.5 7B seems to perform on par with or better than Mistral 7B, you'd think there would be tons of finetunes, just like there are for the Mistral base.

Is there a reason there aren't?

[–] paryska99@alien.top 1 points 10 months ago

I can't wait to see some finetunes of openchat-3.5. This thing is way too smart for a 7B. Frankly, I'm amazed at how fast we went from "7B can't keep it together" to "this 7B is pretty much on par with chatgpt-3.5" (for a lot of use cases, at least).

[–] paryska99@alien.top 1 points 10 months ago

I hope we get quantized GGUFs soon from the legendary TheBloke.

[–] paryska99@alien.top 1 points 10 months ago

Also, I'd give the new openchat 3.5 a try; if the benchmarks are indeed correct, then it's the best 7B model so far (although there are so many of them that I might be wrong, but it's better than base Mistral 7B).

[–] paryska99@alien.top 1 points 10 months ago (1 children)

I know these benchmarks are a tough topic, but on paper this looks really impressive. It claims to be better than Mistral, and I loved the progress Mistral brought. If someone tries this model out, can you give feedback under this post? Much appreciated.
