Machine Learning

[D] - What is the latest in tree-based approaches for LLMs? Has there been any significant research using RL for this? (alien.top)

submitted 2 years ago by 30299578815310@alien.top to c/machinelearning@academy.garden

Curious about the state of the art (SOTA) on this topic.

I'm familiar with chain-of-thought (CoT) and Tree of Thoughts, but neither seems to train the model to excel at using the tree; they just rely on the pretrained model already being good at it. The model has no way to improve its tree use over time.

Is anybody actively training models to be optimized for tree-based decoding?

30299578815310@alien.top 1 points 2 years ago

Yeah, I've also noticed that two kinds of tree implementations are being researched.

One works at the sequence / thought level, like Tree of Thoughts or chain of thought, where the model talks to itself to find the best solution. The other works at the decoding / token level, where the tree is used to search for the best next run of tokens. In principle you could combine the two and have nested trees.
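To make the token-level case concrete, here is a minimal sketch of beam search, one common way to search a tree of token continuations. Everything in it is an assumption for illustration: `toy_next_logprobs` is a hypothetical stand-in for an LLM's next-token distribution, and the three-token vocabulary and its probabilities are made up.

```python
import math

def toy_next_logprobs(prefix):
    """Hypothetical stand-in for an LLM: next-token log-probs given a prefix."""
    last = prefix[-1] if prefix else None
    if last == "a":
        probs = {"a": 0.1, "b": 0.6, "<eos>": 0.3}
    elif last == "b":
        probs = {"a": 0.2, "b": 0.1, "<eos>": 0.7}
    else:  # empty prefix
        probs = {"a": 0.5, "b": 0.4, "<eos>": 0.1}
    return {tok: math.log(p) for tok, p in probs.items()}

def beam_search(next_logprobs, beam_width=2, max_len=4):
    """Keep the beam_width best partial sequences at each depth of the tree."""
    beams = [((), 0.0)]   # (token tuple, cumulative log-prob)
    finished = []         # sequences that ended with <eos>
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, lp in next_logprobs(prefix).items():
                seq = prefix + (tok,)
                if tok == "<eos>":
                    finished.append((seq, score + lp))
                else:
                    candidates.append((seq, score + lp))
        # Prune the tree: only the top beam_width branches survive.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if not beams:
            break
    finished.extend(beams)  # also consider sequences cut off at max_len
    return max(finished, key=lambda c: c[1])

best_seq, best_score = beam_search(toy_next_logprobs)
print(best_seq, math.exp(best_score))
```

With these toy numbers, greedy decoding would commit to "a" first (probability 0.5) and end up with a less likely full sequence than the beam, which keeps "b" alive and finds "b <eos>". A thought-level tree follows the same pattern, except each node is a whole reasoning step and the score comes from some evaluator rather than token log-probs.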

But yeah, I think AlphaGo-style self-learning is what's really missing here. In principle, even without a tree, nothing stops us from putting an LLM in an environment where it earns rewards for succeeding at a task (like solving math problems), and then just letting it rip.
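As a rough illustration of the "reward and let it rip" idea, here is a toy sketch using plain REINFORCE (a basic policy-gradient method, not the search-plus-self-play setup AlphaGo used). The "model" is an assumed tabular softmax policy over candidate answers rather than an LLM, and the addition problems, reward, and hyperparameters are all made up for the example.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(problems, num_answers=10, steps=3000, lr=0.5, seed=0):
    """Toy REINFORCE loop: sample an answer, get a reward, reinforce it."""
    rng = random.Random(seed)
    # Hypothetical tabular policy: one row of logits per problem,
    # one logit per candidate answer 0..num_answers-1.
    logits = {p: [0.0] * num_answers for p in problems}
    for _ in range(steps):
        a, b = rng.choice(problems)
        probs = softmax(logits[(a, b)])
        answer = rng.choices(range(num_answers), weights=probs)[0]
        # Environment feedback: reward 1 only for the correct sum.
        reward = 1.0 if answer == a + b else 0.0
        # Gradient of log pi(answer) w.r.t. the logits is onehot(answer) - probs.
        for k in range(num_answers):
            grad = (1.0 if k == answer else 0.0) - probs[k]
            logits[(a, b)][k] += lr * reward * grad
    return logits

problems = [(1, 2), (3, 4), (2, 2)]
policy = train(problems)
for a, b in problems:
    row = policy[(a, b)]
    print(f"{a} + {b} -> {row.index(max(row))}")
```

The policy starts out guessing uniformly and only ever gets signal from the reward, with no labeled answers shown to it. Scaling this to an LLM would mean replacing the logit table with model parameters and the answer sampling with text generation, which is where the hard problems (sparse rewards, credit assignment over long outputs) show up.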