this post was submitted on 23 Nov 2023
Machine Learning
Is there anything to the hoopla over OpenAI using deep Q-learning other than random speculation?
If anything, I would guess DQN, not Q-learning.
But all the papers people have pointed to speculating about this hoopla just mention active learning or RL without specifics.
Yeah, I think you’ve largely hit the nail on the head, but in case you don’t know: the fervour is over a deliberately leaked project name, “Q*”, and the suggestion that it precipitated the OpenAI board drama. Now, is this probably a tactic to keep prices high so stock sells at the 65B valuation OAI had prior to the drama? Sure.
But it’s still fun to speculate.
All we know is that they have a project known as Q*.
We don't even know whether it's actually an RL approach lol
It's very likely something like this: https://arxiv.org/pdf/2305.18290.pdf
Or fine-tuning on high-quality datasets.
What is the basis on which you judge it "very likely"? The only information is a leaked rumor that there is something with the name "Q*". How do we get from that to DPO?