this post was submitted on 23 Nov 2023

Machine Learning


OpenAI's rumored use of Q-learning has been drawing significant attention recently.

However, there is a fundamental issue with the way Q-learning is typically combined with deep neural networks. This concern is highlighted in the award-winning paper "Non-delusional Q-learning and value iteration," presented at NeurIPS 2018.

The paper argues that blindly applying Q-learning updates to a deep neural network is fundamentally flawed: because all state-action values share the same parameters, such updates can create a self-contradictory situation in which improving the network's fit on the current batch of data inadvertently makes it worse on other batches. This is akin to supervised learning, where optimizing a network for one specific dataset can degrade its performance on others.
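
To make the update in question concrete, here is a minimal sketch of the standard bootstrapped Q-learning step with a neural-network approximator, i.e., the kind of update the paper critiques. This is my own illustration, not code from the paper; the network size, optimizer, and data shapes are placeholders.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Tiny Q-network: maps a state vector to one Q-value per action."""
    def __init__(self, state_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 32), nn.ReLU(),
            nn.Linear(32, n_actions),
        )

    def forward(self, s):
        return self.net(s)

q = QNet()
opt = torch.optim.SGD(q.parameters(), lr=1e-2)
gamma = 0.99

def q_learning_step(s, a, r, s_next, done):
    """One gradient step toward the bootstrapped target r + gamma * max_a' Q(s', a')."""
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q(s_next).max(dim=1).values
    pred = q(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a) for the actions actually taken
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example batch of transitions (random placeholders):
s, s_next = torch.randn(8, 4), torch.randn(8, 4)
a, r, done = torch.randint(0, 2, (8,)), torch.randn(8), torch.zeros(8)
q_learning_step(s, a, r, s_next, done)

# Because every Q-value shares the same parameters, fitting the targets for
# this batch can shift Q(s, a) for unrelated states and actions -- the
# coupling behind the batch-to-batch interference described above.
```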

For more insights, the full paper can be accessed here: Non-delusional Q-learning paper (follow-up ICML paper: Practical Non-delusional Q-learning).

I'm curious about others' views on this topic. What do you think about the implications of these findings for the future of Q-learning in deep learning environments?

top 8 comments
[–] Calm-Expression5549@alien.top 1 points 11 months ago

Seems to be a good read. I never thought Q-learning had such a problem in practice.

[–] wind_dude@alien.top 1 points 11 months ago (3 children)

Is there anything behind the hoopla over OpenAI using deep Q-learning, other than random speculation?

If anything, I would guess DQN rather than plain Q-learning.

But all the papers people have pointed to while speculating about this just mention active learning or RL, without specifics.

[–] Red-Portal@alien.top 1 points 11 months ago (1 children)

We don't even know whether it's actually an RL approach lol

[–] pm_me_your_pay_slips@alien.top 1 points 11 months ago (1 children)

It's very likely something like this: https://arxiv.org/pdf/2305.18290.pdf

Or fine-tuning on high-quality datasets.
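
For reference, that link is the Direct Preference Optimization (DPO) paper. A rough sketch of its loss, with illustrative variable names of my own rather than anything from the paper's code, looks like this:

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected,          # sequence log-probs under the policy being trained
             ref_logp_chosen, ref_logp_rejected,  # same sequences under a frozen reference model
             beta=0.1):
    """-log sigmoid(beta * [(log pi_w - log ref_w) - (log pi_l - log ref_l)]), averaged over the batch."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()
```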

[–] cthorrez@alien.top 1 points 11 months ago

What is the basis on which you judge it "very likely"? The only information is a leaked rumor that there is something with the name "Q*". How do we get from that to DPO?

[–] joshred@alien.top 1 points 11 months ago

Just that they have a project known as "Q*".

[–] residentmouse@alien.top 1 points 11 months ago

Yeah, largely I think you've hit the nail on the head, but just in case you don't know: the fervour is over a deliberately leaked project name, "Q*", and the suggestion that it precipitated the OpenAI board drama. Now, is this probably a tactic to keep prices high so stock sells at the $65B valuation OAI had prior to the drama? Sure.

But it’s still fun to speculate.

[–] Status-Effect9157@alien.top 1 points 11 months ago

This whole Q-star hullabaloo just reminds me of HBO Silicon Valley's "the bear is sticky with honey" episode.