this post was submitted on 25 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
It’s better than that, imo, when you look at it in context.
Particularly in light of Intel’s finding the other day that DPO works well (probably better) without preference data.
“Alignment” methods are getting simpler, easier, and more effective.
RLHF was a huge pain: there were a ton of hyperparameters to tweak, and human preference data is expensive to collect.
Constitutional AI (RLAIF) dealt with some of that cost and difficulty by using AI-generated preference data, but it still left the need to collect preference data, and all the hyperparameter tweaking, intact.
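To make the "AI preference data" part concrete, it boils down to asking a judge model which of two candidate responses better follows a set of principles, and using that verdict as the label. A minimal sketch, with the judge call stubbed out rather than tied to any particular API:

```python
def ai_preference_label(prompt, response_a, response_b, ask_judge):
    """RLAIF-style labeling: a judge model, not a human, decides which
    response better follows a short "constitution" of principles."""
    verdict = ask_judge(
        "Principles: be helpful, harmless, and honest.\n"
        f"Prompt: {prompt}\n\nResponse A: {response_a}\n\nResponse B: {response_b}\n\n"
        "Which response better follows the principles? Reply with A or B."
    )
    return "A" if verdict.strip().upper().startswith("A") else "B"

# Stubbed judge for illustration; in practice this is a call to a strong LLM.
print(ai_preference_label("Explain DPO in one sentence.",
                          "DPO trains directly on preference pairs without a reward model.",
                          "idk", lambda q: "A"))
```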
DPO eliminated the separate reward model entirely, simplifying things greatly and making overfitting less pernicious.
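For reference, the whole DPO objective is just a logistic loss on log-probability ratios against a frozen reference model, no reward model in sight. A minimal PyTorch sketch (names are mine, not from any particular library):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Minimal DPO loss: no learned reward model, just the policy and a
    frozen reference model scoring the same chosen/rejected pairs."""
    # The implicit "reward" is the log-ratio between policy and reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss pushes the chosen-vs-rejected margin up.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage: per-sequence summed log-probs for chosen and rejected completions.
pc, pr = torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0])
rc, rr = torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -10.2])
print(dpo_loss(pc, pr, rc, rr))
```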
Intel got rid of preference data altogether.
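One way to skip human preference labels entirely is to build the pairs automatically, for example treating a dataset's reference answer as "chosen" and a weaker model's sample as "rejected". A rough sketch of that general idea (illustrative only, not necessarily Intel's exact recipe):

```python
def build_synthetic_pairs(dataset, sample_from_weak_model):
    """Build DPO-style (prompt, chosen, rejected) triples with no human
    preference labels: the dataset's reference answer plays "chosen",
    a weaker model's sample plays "rejected". Illustrative only."""
    pairs = []
    for example in dataset:
        pairs.append({
            "prompt": example["prompt"],
            "chosen": example["answer"],                            # reference answer
            "rejected": sample_from_weak_model(example["prompt"]),  # placeholder generator
        })
    return pairs

# Toy usage with a stubbed "weak model".
toy_data = [{"prompt": "What is 2+2?", "answer": "4"}]
print(build_synthetic_pairs(toy_data, lambda p: "Probably 5."))
```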
IPO claims to fix the overfitting issue entirely, while simplifying things even further.
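IPO keeps the same inputs but swaps DPO's logistic loss for a squared error toward a fixed margin, which is what's supposed to tame the overfitting. Roughly, as a sketch (beta here plays the role of IPO's regularization strength):

```python
import torch

def ipo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the IPO objective: same log-ratio margin as DPO, but a
    squared-error loss toward a fixed target instead of a logistic loss."""
    margin = ((policy_chosen_logps - ref_chosen_logps)
              - (policy_rejected_logps - ref_rejected_logps))
    # Regress the margin toward 1/(2*beta) instead of pushing it to infinity,
    # which is the claimed fix for DPO-style overfitting.
    return ((margin - 1.0 / (2.0 * beta)) ** 2).mean()

# Same style of toy inputs as the DPO sketch above.
print(ipo_loss(torch.tensor([-12.0]), torch.tensor([-14.0]),
               torch.tensor([-12.5]), torch.tensor([-13.5])))
```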
I figure within a month Axolotl will grow a flag that means, “and also IPO this,” with no additional cognitive overhead or hyperparameter tuning required, and, yes, the waterline for model quality is going to go up.