[–] Warm_Shelter1866@alien.top 1 points 11 months ago

What does it mean for an LLM to be a reward model? I always thought rewards only came up in RL. And how would the reward model be used during fine-tuning?