this post was submitted on 23 Nov 2023

Machine Learning




OpenAI's rumored Q-learning work (reportedly called "Q*") has been drawing significant attention recently.

However, there's a fundamental issue in the way Q-learning is typically implemented with deep neural networks. This concern is highlighted in "Non-delusional Q-learning and Value Iteration" by Lu, Schuurmans, and Boutilier, which received a Best Paper Award at NeurIPS 2018.

The paper identifies what it calls delusional bias in the blind application of Q-learning updates to function approximators such as deep neural networks. Each update's max over next-state actions implicitly assumes action choices that may not be jointly realizable by any single policy the network can represent, so the updates can contradict each other: improving the network on the current batch of data can make it less effective on other batches. This is loosely akin to supervised learning, where optimizing a network for one subset of data may degrade its performance on other data, except that here the targets themselves can be mutually inconsistent.
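A minimal sketch of the kind of update the paper is concerned with: one step of Q-learning applied to a linear approximator with shared weights, standing in for a deep network. The helper names, feature vectors, step size, and discount are hypothetical illustrations, not taken from the paper. The target takes an independent max over next-state actions, which is where the paper locates delusional bias, and because all states share the same weights, an update at one state also moves Q-values at other states with overlapping features.

```python
import numpy as np


def q_values(w, phi_s):
    """Q(s, a) = w[a] . phi(s) for every action a (hypothetical linear approximator)."""
    return w @ phi_s


def q_learning_update(w, phi_s, a, r, phi_next, gamma=0.9, alpha=0.1, done=False):
    """One semi-gradient Q-learning step on shared weights w (shape: actions x features)."""
    # The target uses an independent max over next-state actions. Nothing checks
    # that these greedy choices are consistent with a single representable policy.
    target = r if done else r + gamma * np.max(q_values(w, phi_next))
    td_error = target - q_values(w, phi_s)[a]
    w = w.copy()  # return new weights instead of mutating the caller's array
    # The gradient of w[a] . phi(s) with respect to w[a] is phi(s).
    w[a] += alpha * td_error * phi_s
    return w, td_error


if __name__ == "__main__":
    w = np.zeros((2, 2))              # 2 actions x 2 features (hypothetical)
    phi_s = np.array([1.0, 1.0])      # features of the state being updated
    phi_other = np.array([1.0, 0.0])  # a different state sharing the first feature
    phi_next = np.array([0.0, 1.0])   # features of the next state
    w, td = q_learning_update(w, phi_s, a=0, r=1.0, phi_next=phi_next)
    print("TD error:", td)
    print("Q at the never-updated state:", q_values(w, phi_other))
```

In the demo, the Q-values at `phi_other` move away from zero even though that state was never updated, which is the cross-batch coupling described above. This sketch shows only the standard update; it does not implement the paper's proposed non-delusional remedy.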

For more insights, the full paper is available here: Non-delusional Q-learning paper (follow-up ICML paper: Practical Non-delusional-Q Learning).

I'm curious about others' views on this topic. What do you think about the implications of these findings for the future of Q-learning in deep learning environments?

Red-Portal@alien.top (1 point, 11 months ago)

We don't even know whether it's actually an RL approach lol

pm_me_your_pay_slips@alien.top (1 point, 11 months ago)

It's very likely something like this: https://arxiv.org/pdf/2305.18290.pdf (Direct Preference Optimization, DPO)

Or fine-tuning on high-quality datasets.
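For readers unfamiliar with the arXiv link above: 2305.18290 is "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (Rafailov et al., 2023). DPO fine-tunes a policy directly on preference pairs instead of running an RL loop against a learned reward model. Below is a minimal sketch of its per-pair loss, assuming the summed sequence log-probabilities of the chosen and rejected responses under the trained policy and a frozen reference model are already computed. The function names, inputs, and beta value are hypothetical illustrations.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from summed log-probs of the chosen and rejected responses."""
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    # Push the chosen log-ratio above the rejected one, scaled by beta.
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy has raised the chosen response's likelihood: the loss drops below log(2).
    print(dpo_loss(-9.0, -12.0, -10.0, -12.0))
```

Note that nothing in this objective bootstraps a value estimate or takes a max over actions, which is part of why the link between a name like "Q*" and DPO is not obvious.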

cthorrez@alien.top (1 point, 11 months ago)

On what basis do you judge it "very likely"? The only information is a leaked rumor that there is something named "Q*". How do we get from that to DPO?