this post was submitted on 13 Nov 2023

LocalLLaMA

Community to discuss about Llama, the family of large language models created by Meta AI.
 

Obviously, building a big, high-dimensional language model from scratch is hard, yes, okay.

But once we have one, can't we just jiggle the weights and run tests? Why can't I just download a program that "evolves" my language model?

Or am I just stupid, and this is too trivially easy to be worth a program?

peace

[–] ShengrenR@alien.top 1 points 1 year ago (1 children)

There are a few important things here:
1 - You CAN do this. You CAN just go into the network and modify a random value. But then how do you evaluate whether your change made the network 'better' or 'worse'? You'd have to run just about every input that touches the value you changed through the model to see whether the overall effect was an improvement. This isn't really 'reinforcement learning', though, because your 'jiggle' IS the network change. RL looks at a string of actions, judges whether an action was beneficial, then computes the weight changes needed to account for that action being 'good' vs 'bad'. So this isn't really a description of what RL is, but it's an interesting idea. The catch is that it's prohibitively expensive to evaluate which changes are beneficial and which are a hindrance.
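To see why this gets expensive, here's a toy sketch (hypothetical tiny model and made-up data, nothing like a real LLM): a single-weight "jiggle" followed by a full evaluation pass to decide keep-or-revert. Every single jiggle pays for a complete pass over the evaluation set, and a real model has billions of weights instead of 32.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": one weight matrix mapping 8 features to 4 logits
# (a hypothetical stand-in for an LLM's billions of parameters).
W = rng.normal(size=(8, 4))
X = rng.normal(size=(256, 8))       # toy "dataset" of 256 inputs
y = rng.integers(0, 4, size=256)    # toy "true next token" ids

def loss(W):
    """Cross-entropy over the WHOLE dataset -- every jiggle pays this cost."""
    logits = X @ W
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(y)), y]).mean()

initial = loss(W)
base = initial
accepted = 0
for step in range(200):
    i, j = rng.integers(0, 8), rng.integers(0, 4)
    old = W[i, j]
    W[i, j] += rng.normal(scale=0.1)  # "jiggle" one random weight
    new = loss(W)                     # full evaluation pass, every time
    if new < base:                    # keep only helpful jiggles
        base, accepted = new, accepted + 1
    else:
        W[i, j] = old                 # revert a harmful jiggle
```

Even in this toy, 200 jiggles mean 200 full-dataset evaluations to tweak at most 200 of 32 weights, which is the cost problem the comment is pointing at.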

2 - So, rather than jiggling a stored value and computing the overall change across all potential outcomes, let's just look at individual outputs from the current network, compare them against a new 'truth' input we want to emulate (a string of tokens), and then do that against a ton of different input strings. OH. That's exactly what pretraining already is :) The gap between the token my model predicted and the true next token in our comparison string is the cross-entropy loss (usually reported as 'perplexity'), and that's how the initial foundation model is trained.
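The next-token comparison above can be sketched numerically (hypothetical random logits standing in for a real model's output over a 10-token vocabulary): softmax the logits into probabilities, take the average negative log-probability of the true tokens, and exponentiate to get perplexity.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = 10

# Hypothetical model output: logits over the vocab at 5 positions,
# plus the "true" next tokens from the training text.
logits = rng.normal(size=(5, vocab))
truth = rng.integers(0, vocab, size=5)

# Softmax -> probabilities for each position.
logits = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Cross-entropy: average negative log-probability the model
# assigned to the token that actually came next.
cross_entropy = -np.log(probs[np.arange(5), truth]).mean()

# Perplexity is exp(cross-entropy): roughly "how many tokens the
# model was effectively choosing between". Lower is better.
perplexity = np.exp(cross_entropy)
```

Pretraining is then just nudging the weights (via gradients, not random jiggles) so this loss goes down across an enormous pile of input strings.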

3 - RL would be generating an entire output and then feeding a 'yes/no' signal back into the network to encourage or discourage that overall output. This is what OpenAI does after releasing the initial model: they run 'RLHF' on it using human feedback from users. The issue for you 'at home' is that unless you're the level of mad scientist who can automate the better-vs-worse evaluation without a human head there to say yes/no, you need to be the head that evaluates yes/no for each output. And to move the model very far without dramatically overshooting something useful, you have to move it in very, very small steps, which means you, yourself, need to sit around and say yes/no to a LOT of outputs. It would be nice to get a lot of friends in on that input so you didn't have to spend five thousand years just working through yes/no to move the thing along. https://github.com/LAION-AI/Open-Assistant oh hey, some folks started doing just that.
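The yes/no loop can be sketched as a tiny bandit-style example (everything here is hypothetical: three canned "outputs", a scripted judge standing in for the human, and a REINFORCE-style score update rather than any specific RLHF algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)

# The "model" is just a preference score for each of 3 canned outputs.
scores = np.zeros(3)
good_output = 1  # the judge secretly prefers output 1

def judge(choice):
    """Stand-in for the human saying yes (+1) / no (-1) to one full output."""
    return 1.0 if choice == good_output else -1.0

lr = 0.05  # very, very small steps, as the comment says
for _ in range(500):
    probs = np.exp(scores) / np.exp(scores).sum()
    choice = rng.choice(3, p=probs)   # generate one complete output
    reward = judge(choice)            # yes/no on the whole thing
    # REINFORCE-style nudge: push up the score of approved outputs,
    # push down the score of rejected ones (grad of log-prob of choice).
    grad = -probs
    grad[choice] += 1.0
    scores += lr * reward * grad
```

Note the pain point: 500 tiny updates needed 500 separate yes/no judgments even for a 3-option toy, which is exactly why crowdsourcing the feedback (as Open-Assistant did) matters.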

WOW

amazing info, thank you kindly my dude!

Gonna be reading this for a while..