Q*, haven’t you heard?
Great question, curious about the answer myself.
I think it’s pretty cool that just iteratively reusing an LLM without any additional training, i.e. chaining prompts, improves quality in most of these methods. I see quite a few papers along these lines (e.g. System 2 Attention).
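To make "chaining prompts" concrete, here is a rough sketch of the kind of draft-critique-revise loop I mean, where the same frozen model is called repeatedly on its own output. `call_llm` is just a placeholder for whatever API or local model you use, not anything from these papers:

```python
# Minimal prompt-chaining sketch: the same frozen LLM is called several times,
# each step consuming the previous step's output. No weights are updated.
# `call_llm` is a placeholder for an actual API or local-model call.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API here")

def chained_answer(question: str) -> str:
    draft = call_llm(f"Answer the following question:\n{question}")
    critique = call_llm(
        f"Question:\n{question}\n\nDraft answer:\n{draft}\n\n"
        "List any mistakes or missing steps in the draft."
    )
    revised = call_llm(
        f"Question:\n{question}\n\nDraft answer:\n{draft}\n\n"
        f"Critique:\n{critique}\n\nWrite an improved final answer."
    )
    return revised
```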
The Promptbreeder paper has some benchmarking of these methods & proposes an interesting evolutionary prompting strategy.
But like you, I’ve been looking/waiting for papers that specifically explore fine-tuning the model “nodes”, perhaps using LoRA, or with a meta-network or hypernetwork.
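If someone did fine-tune the “node” model cheaply, I’d expect the setup to look roughly like a standard LoRA wrap. Here is a sketch using Hugging Face `peft`; the model name, target modules, and hyperparameters are all placeholders, not taken from any of the papers above:

```python
# Rough sketch: wrap a base model with LoRA adapters so only a small number of
# parameters would be trained for the "node" role. Everything here is a
# placeholder configuration, not from the papers being discussed.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(base_name)

lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of the base model

# ...then train this adapter on traces collected from the chain/tree procedure.
```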
Yeah, I’ve also noticed there are two kinds of tree approaches being researched.
One is at the sequence/thought level, like Tree of Thoughts / Chain of Thought, where the model talks to itself in order to find the best solution. The other is at the decoding/token level, where the tree is used to search for the optimal next set of tokens. In principle you could combine the two and have nested trees.
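To make the thought-level version concrete, a toy Tree-of-Thoughts-style search might look like the sketch below. Both helper functions are placeholders that would be extra LLM calls in practice, and the BFS-with-pruning shape is just one possible variant:

```python
# Toy sketch of a thought-level (not token-level) tree search.
# Both helpers are placeholders for additional LLM calls.
from typing import List

def propose_thoughts(state: str, k: int) -> List[str]:
    raise NotImplementedError("LLM call: propose k candidate next thoughts")

def score_thought(state: str) -> float:
    raise NotImplementedError("LLM call: rate how promising this partial solution is")

def tree_of_thoughts(problem: str, depth: int = 3, branch: int = 3, beam: int = 2) -> str:
    frontier = [problem]
    for _ in range(depth):
        candidates = []
        for state in frontier:
            for thought in propose_thoughts(state, branch):
                candidates.append(state + "\n" + thought)
        # keep only the most promising partial solutions (pruned breadth-first search)
        candidates.sort(key=score_thought, reverse=True)
        frontier = candidates[:beam]
    return frontier[0]
```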
But yeah, I think the AlphaGo-style self-learning is what’s really missing here. In principle, even without a tree, nothing stops us from putting an LLM in an environment where it gets positive feedback from rewards (like solving math problems) and just letting it rip.
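Something like this hand-wavy outer loop is what I have in mind: sample attempts, reward the ones whose final answer checks out, fine-tune on the winners, repeat (rejection-sampling / STaR-flavored rather than full RL). Every function here is a placeholder, not a real implementation:

```python
# Hand-wavy sketch of a self-improvement loop for an LLM on math problems:
# sample attempts, reward correct answers, fine-tune on the winning traces, repeat.
# All helpers are placeholders.

def sample_solution(model, problem: str) -> str:
    raise NotImplementedError("sample a chain-of-thought plus final answer")

def extract_answer(solution: str) -> str:
    raise NotImplementedError("parse the final answer out of the solution text")

def finetune(model, traces):
    raise NotImplementedError("e.g. supervised fine-tuning / LoRA on winning traces")

def self_improve(model, problems, rounds: int = 3, samples_per_problem: int = 8):
    for _ in range(rounds):
        winning_traces = []
        for problem, gold_answer in problems:
            for _ in range(samples_per_problem):
                solution = sample_solution(model, problem)
                reward = 1.0 if extract_answer(solution) == gold_answer else 0.0
                if reward > 0:
                    winning_traces.append((problem, solution))
        model = finetune(model, winning_traces)  # learn only from its own successes
    return model
```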