paryska99

joined 10 months ago
 

Hi, you wonderful people!

Here's a thought that came to mind: since training LLMs involves a degree of randomness, is there a way to create an architecture for LLMs (or other AI) that would be somewhat deterministic in its training instead?

What I mean is: could an architecture exist where everyone trains their own separate checkpoints on different datasets, which, once combined, would yield a single checkpoint with the combined learning of all these smaller ones?

This would let thousands of people create their own checkpoints which, when combined, would result in something greater than the individual parts. And since training is what takes the longest in developing LLMs (or any AI), this approach would let almost everyone contribute their share of processing power towards creating something together.
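For a sense of what "combining" could mean in practice, the simplest version people experiment with is plain weight averaging across checkpoints that share one architecture (the idea behind federated averaging and "model soups"). A minimal sketch, assuming identical architectures and hypothetical file names:

```python
# Naive checkpoint merging by averaging parameters. Assumes every file is
# a plain PyTorch state_dict for the same architecture; paths are made up.
import torch

paths = ["checkpoint_a.pt", "checkpoint_b.pt", "checkpoint_c.pt"]
state_dicts = [torch.load(p, map_location="cpu") for p in paths]

merged = {}
for key in state_dicts[0]:
    # Average each parameter tensor across all contributed checkpoints.
    merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)

torch.save(merged, "merged_checkpoint.pt")
```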

If viable, this could have huge potential implications for Open Source Software.

I'm looking forward to hearing what all of you smart people have to say about it!

[–] paryska99@alien.top 1 points 9 months ago (1 children)

Would it be possible to create a system where every model's training uses a specific, set seed and records its exact state, and then share that information along with the dataset it was trained on, so the training can be reproduced? That would help manage the randomness in training.

Using a set seed means the model's initialization and the way it learns during training are the same every run: if we restart training from a certain point with this seed, the model should learn exactly as it did before. And by saving and sharing details like the model's structure, the training stage, and the training step, along with the seed, we're effectively taking a 'snapshot' of where the model is at that moment.

Others could use this snapshot to pick up the training right where it was left off, under the same conditions. For merging different models, this technique could help line up how they learn, making it easier and more predictable to combine their training.
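In PyTorch terms, the "snapshot" might look something like the sketch below (names are illustrative). One caveat worth knowing: seeds alone don't pin down GPU results, because some CUDA kernels are non-deterministic by default, so you also have to request deterministic algorithms:

```python
# Minimal sketch of seeded, reproducible training plus a resumable
# "snapshot"; the model and optimizer here are hypothetical stand-ins.
import random

import numpy as np
import torch

SEED = 42

def set_determinism(seed: int) -> None:
    # Fix every RNG the training loop touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Prefer deterministic kernels; warn instead of erroring on ops
    # that have no deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)

set_determinism(SEED)

model = torch.nn.Linear(16, 16)            # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters())

# ... train for some steps, then save everything needed to resume:
snapshot = {
    "seed": SEED,
    "step": 1000,                          # hypothetical training step
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
    "rng_state": torch.get_rng_state(),    # CPU RNG state at this moment
}
torch.save(snapshot, "snapshot_step1000.pt")
```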

Am I thinking about this the right way, or am I missing something? This is just theoretical thinking; I'm not an expert on the subject.

 


[–] paryska99@alien.top 1 points 9 months ago (1 children)

Doesn't the LlamaCpp server host a GUI for multimodal? You could potentially visit it, open the developer panel in your browser, and observe the HTTP requests being sent.
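If it helps, a request captured that way can then be replayed from code. A rough sketch, assuming a local server on the default port and the /completion endpoint (the exact multimodal fields may differ between builds):

```python
# Hypothetical replay of a captured llama.cpp server request; values are
# illustrative and the image_data shape may vary between server versions.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "Describe the image. [img-1]",  # [img-N] refers to image_data ids
        "n_predict": 128,
        # For multimodal builds, images go in as base64 with an id, e.g.:
        # "image_data": [{"data": "<base64-encoded image>", "id": 1}],
    },
    timeout=60,
)
print(resp.json()["content"])
```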

[–] paryska99@alien.top 1 points 9 months ago (1 children)

Thanks for the input.

What inference engine did you use? It's possibly a bug; these things tend to happen with new models.
I for one can't wait for lookahead decoding in llamacpp and others. Combine that with some smaller models and we'll have blazing-fast speeds on pennies' worth of hardware, I reckon.
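To make the "combine that with some smaller models" part concrete: that flavor is usually called speculative (draft-and-verify) decoding, while lookahead decoding proper guesses n-grams via Jacobi iteration instead of using a draft model. A toy sketch of the draft-and-verify loop, with hypothetical stand-in "models":

```python
# Toy draft-and-verify (speculative) decoding loop. Both "models" are
# hypothetical greedy next-token lookups over a toy vocabulary; a real
# implementation would verify all draft tokens in one batched forward pass.

def draft_model(context: list[str]) -> str:
    # Small, fast, sometimes-wrong guesser (stand-in).
    guesses = {"the": "cat", "cat": "sat", "sat": "on", "on": "a"}
    return guesses.get(context[-1], "<eos>")

def target_model(context: list[str]) -> str:
    # Large, slow, authoritative model (stand-in).
    truth = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}
    return truth.get(context[-1], "<eos>")

def generate(prompt: list[str], max_tokens: int = 8, k: int = 4) -> list[str]:
    out = list(prompt)
    while len(out) - len(prompt) < max_tokens:
        # 1. Draft k tokens cheaply with the small model.
        ctx, draft = list(out), []
        for _ in range(k):
            draft.append(draft_model(ctx))
            ctx.append(draft[-1])
        # 2. Verify with the big model: keep tokens while the two agree,
        #    and take the big model's token at the first disagreement.
        for tok in draft:
            expected = target_model(out)
            out.append(expected)
            if expected != tok or expected == "<eos>":
                break
        if out[-1] == "<eos>":
            return out
    return out

print(generate(["the"]))  # -> ['the', 'cat', 'sat', 'on', 'the', ...]
```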

[–] paryska99@alien.top 1 points 9 months ago (3 children)

There's the new Rocket 3B that might be worth a try. It scores suspiciously high on benchmarks, so I suspect dataset contamination, but I've seen people have good experiences with it.

[–] paryska99@alien.top 1 points 10 months ago

The future is going to be interesting. With this kind of CPU speedup we could run blazing-fast LLMs on a toaster, as long as it has enough RAM.

[–] paryska99@alien.top 1 points 10 months ago

Oh wow, this seems almost too good to be true.

[–] paryska99@alien.top 1 points 10 months ago

Oh wow, I know the results are probably cherry-picked, but this still seems like such a step-up.

[–] paryska99@alien.top 1 points 10 months ago

Yes! I've been waiting for progress in video for a while! Imagine DIY automated classification for compilations and edits. This is going to be sick! Can't wait to see an implementation in llamacpp.

 

Are any of you planning to finetune the openchat models, or aware of any finetunes coming out anytime soon?

Considering the new OpenChat-3.5 7B seems to perform on par with or better than Mistral 7B, you'd think there would be tons of finetunes, just like there are for the Mistral base.

Is there a reason there aren't?

[–] paryska99@alien.top 1 points 10 months ago

I can't wait to see some finetunes of openchat-3.5. This thing is way too smart for a 7B. Frankly, I'm amazed at how fast we went from "7B can't keep it together" to "this 7B is pretty much on par with chatgpt-3.5" (for a lot of use cases, at least).

[–] paryska99@alien.top 1 points 10 months ago

I hope we get quantized GGUFs soon from the legendary TheBloke.

[–] paryska99@alien.top 1 points 10 months ago

Also, I'd give the new openchat 3.5 a try; if the benchmarks are indeed correct, then it's the best 7B model so far (although there are so many of them that I might be wrong, but it's better than base Mistral 7B).

[–] paryska99@alien.top 1 points 10 months ago (1 children)

I know these benchmarks are a tough topic, but on paper this looks really impressive. It claims to be better than Mistral, and I loved the progress Mistral brought. If someone tries this model out, can you give feedback under this post? Much appreciated.
