You can certainly combine all the tasks and datasets into a single instruction fine-tuning dataset. You would then have a separate dataset for the reinforcement learning half, where the model learns human preferences.
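To make that concrete, here is a rough sketch of what the two datasets might look like. The task names, field names (`instruction`, `input`, `output`, `prompt`, `chosen`, `rejected`), and the `combine_tasks` helper are all my own assumptions for illustration, not a format any particular library requires.

```python
import random

# Hypothetical per-task examples; the field names are assumptions for illustration.
summarization = [
    {"instruction": "Summarize the text.",
     "input": "The cat sat on the mat all day.",
     "output": "A cat sat on a mat."},
]
translation = [
    {"instruction": "Translate to French.",
     "input": "Good morning.",
     "output": "Bonjour."},
]
question_answering = [
    {"instruction": "Answer the question.",
     "input": "What is 2 + 2?",
     "output": "4"},
]


def combine_tasks(task_datasets, seed=0):
    """Tag each example with its task name, then shuffle all tasks into one list."""
    combined = []
    for task_name, examples in task_datasets.items():
        for example in examples:
            combined.append({"task": task_name, **example})
    # Shuffle so no single task dominates a contiguous stretch of training.
    random.Random(seed).shuffle(combined)
    return combined


# Dataset 1: a single instruction fine-tuning set covering every task.
sft_dataset = combine_tasks({
    "summarization": summarization,
    "translation": translation,
    "question_answering": question_answering,
})

# Dataset 2: kept separate, for the preference-learning stage.
# Each record pairs one prompt with a preferred and a rejected response.
preference_dataset = [
    {"prompt": "Explain overfitting in one sentence.",
     "chosen": "Overfitting is when a model memorizes training data and fails to generalize.",
     "rejected": "Overfitting is good."},
]
```

The point is just that the first dataset mixes tasks freely because every record has the same shape, while the second has a different shape entirely (a comparison between responses), which is why it stays separate.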
FallMindless3563
I’d like to think we dove deep, but let me know!
The only book he explicitly mentions is "Thinking, Fast and Slow" by Daniel Kahneman, but I think there are a ton of books that would be great resources alongside the papers. I just happened to pull a lot of the papers from the footnotes and concepts he mentioned.