this post was submitted on 13 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
I have been generating art with AI. There is an extension meant for exactly that: you literally tell the AI "good" or "bad" for each result, and it adjusts the weights of the model.
Sadly, it's all but impossible to run. Reinforcement learning isn't just "picking a random weight and changing it". It's rewriting the entire model to take your feedback into account, and it does that while running the model, which already eats most of your compute resources.
You need a shitton of VRAM and a very powerful GPU to run reinforcement learning for images. It's even worse for LLMs, which are much more power-hungry.
Who knows, maybe there will be optimizations in the coming years, but as of right now, reinforcement learning is just too demanding.
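To make that concrete, here's a minimal sketch of what a "good"/"bad" feedback update looks like as a REINFORCE-style gradient step in PyTorch. The tiny model, the latent input, and the feedback values are hypothetical stand-ins for illustration, not the actual extension's code:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a generative model: latent vector in, logits out.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def feedback_update(latent, action, feedback):
    """One REINFORCE-style step: feedback is +1.0 ("good") or -1.0 ("bad")."""
    log_probs = torch.log_softmax(model(latent), dim=-1)
    # Raise (or lower) the probability of what the model just produced.
    loss = -feedback * log_probs[action]
    optimizer.zero_grad()
    loss.backward()  # the backward pass is what plain inference never pays for
    optimizer.step()

latent = torch.randn(64)
with torch.no_grad():  # sampling alone is ordinary inference
    action = torch.distributions.Categorical(logits=model(latent)).sample()
feedback_update(latent, action, feedback=1.0)  # user clicked "good"
```

Even in this toy version, the `backward()` call forces the framework to keep gradients and optimizer state alongside the weights, which is where the extra cost comes from.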
How hard can it be?
Seriously though, what makes it require more VRAM than regular inference? You're still loading the same model, aren't you?
Well, first of all, this is something you do while running the model. Sure, it's the same model, but they're still two different processes running in parallel.
Then, from what I gather, it's closer to model finetuning than it is to inference. And if you look up the figures, finetuning requires a lot more power and VRAM. As I said, it's rewriting the neural network, which is basically the definition of finetuning.
So to get a more specific answer, we should look at why finetuning requires more than inference. The short version, as far as I understand it: the backward pass has to keep gradients, optimizer state, and the forward-pass activations in memory, while inference only needs the weights.
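Here's a rough back-of-the-envelope of where the extra memory goes. The numbers assume fp16 weights and a plain Adam optimizer; they're generic estimates, not measurements of any particular model:

```python
# Back-of-the-envelope VRAM for a 7B-parameter model (illustrative only).
params = 7e9
gib = 1024**3

inference = params * 2                 # fp16 weights only: 2 bytes/param
weights   = params * 2                 # fp16 weights
grads     = params * 2                 # fp16 gradients
adam      = params * 8                 # two fp32 moment buffers: 8 bytes/param
training  = weights + grads + adam     # activations still come on top of this

print(f"inference: ~{inference / gib:.0f} GiB")               # ~13 GiB
print(f"training:  ~{training / gib:.0f} GiB + activations")  # ~78 GiB
```

Even before counting activations, training-style updates need roughly six times the memory of fp16 inference, which is why the finetuning figures look so much scarier.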