30299578815310

joined 10 months ago
[–] 30299578815310@alien.top 1 points 9 months ago (2 children)

How does this work? I'm really confused at a conceptual level about how you merge models that have different numbers of differently sized layers.
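For concreteness, the simple case is plain weight averaging, which only works when the two models line up layer for layer. A minimal sketch, assuming two PyTorch-style models with identical state_dict keys and tensor shapes:

```python
# Minimal sketch of naive weight-space merging, assuming two models with
# identical architectures (same state_dict keys and tensor shapes).
# This is exactly the step that breaks when layer counts or sizes differ.
def average_merge(model_a, model_b, alpha=0.5):
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    merged = {}
    for key in sd_a:
        # Shapes must match element-wise; models with different numbers of
        # layers or different layer widths have no obvious mapping here.
        merged[key] = alpha * sd_a[key] + (1 - alpha) * sd_b[key]
    return merged
```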

[–] 30299578815310@alien.top 1 points 9 months ago

Yeah, I also notice there are two kinds of tree approaches being researched.

One is at the sequence / thought level, like tree of thoughts / chain of thought, where the model talks to itself in order to find the best solution. The other is at the decoding / token level, where the tree is used to search for the optimal next set of tokens. In principle you could put these together and have nested trees.
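Roughly, the token-level version reduces to beam-style tree search over next tokens. A hedged sketch, assuming a `log_probs(prefix)` callback from the model that returns candidate next tokens sorted by log-probability (everything here is a stand-in, not any particular library's API):

```python
import heapq

def tree_decode(log_probs, prompt_tokens, beam_width=4, branch=4, max_steps=32):
    # Each beam entry is (cumulative log-prob, token sequence).
    beams = [(0.0, list(prompt_tokens))]
    for _ in range(max_steps):
        candidates = []
        for score, seq in beams:
            # Branch on the top few next tokens from this node of the tree.
            for token, logp in log_probs(seq)[:branch]:
                candidates.append((score + logp, seq + [token]))
        # Keep only the best beam_width branches (pruned tree search).
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beams, key=lambda b: b[0])[1]
```

A sequence-level tree of thoughts would wrap a similar expand-and-prune loop around whole sampled "thoughts" instead of single tokens, which is how the two could nest.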

But yeah, I think the AlphaGo-style self-learning is what's really missing here. In principle, even without a tree, nothing stops us from putting an LLM in an environment where it gets positive feedback from rewards (like solving math problems) and just letting it rip.
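Roughly the kind of loop being described, as a hedged sketch; `sample_solution`, `check_answer`, and `policy_gradient_step` are hypothetical stand-ins for a sampler, a verifier, and an RL-style update, not an existing API:

```python
# REINFORCE-flavored sketch: reward the model for verifiably correct answers
# (e.g. math problems with checkable solutions) and push up the probability
# of rewarded trajectories.
def self_improvement_loop(model, problems, steps=1000):
    for step in range(steps):
        problem = problems[step % len(problems)]
        solution, log_prob = sample_solution(model, problem.prompt)
        # Verifiable reward: 1 if the final answer checks out, else 0.
        reward = 1.0 if check_answer(problem, solution) else 0.0
        # Increase the log-probability of trajectories that earned reward.
        policy_gradient_step(model, log_prob, reward)
    return model
```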

 

Curious what the SOTA is on this topic.

I'm familiar with CoT and Tree of Thoughts, but those don't seem to train the model to excel at using the tree; they just rely on the pretrained model already being good at it. The model has no way to improve its tree use over time.

Is anybody actively training models to be optimized for tree-based decoding?
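One direction this could take, as a purely hypothetical sketch (not an existing method or library): an AlphaGo-style loop where a learned value head guides the tree search and is then trained on the outcomes of those searches. `run_tree_search`, `final_reward`, and `update_value_head` are assumed helpers:

```python
# Sketch of "training for tree use": the value head scores partial sequences
# during search, then is regressed toward the eventual outcome so future
# searches prune toward branches that actually paid off.
def train_for_tree_decoding(model, value_head, tasks, iterations=100):
    for _ in range(iterations):
        for task in tasks:
            # Search is guided by the current value head's scores.
            trajectory, visited_nodes = run_tree_search(model, value_head, task)
            reward = final_reward(task, trajectory)
            # Fit the value head to the observed outcome at the visited nodes.
            update_value_head(value_head, visited_nodes, reward)
    return value_head
```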

 

A common situation in real-world problems with long time horizons is the need to perform multiple very different subtasks. For example, imagine a model trained to remember a poem and then spell it out in blocks in a game of Minecraft. The data for the poem itself and the relevant Minecraft functions probably have very different embeddings, but in practice it would be useful to make sure the memories about how to use the Minecraft functions are retrieved whenever that poem is retrieved.

It seems like just querying a RAG DB for the vectors with the highest cosine similarity won't be very useful for this task. A query about the poem will just find poem-like data. But we don't only want things with embeddings similar to the poem; we want data that is useful in that context. Has there been any research into this time-series / associative type of RAG?
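One way to picture it, as a sketch only (the co-access bookkeeping and class names here are assumptions, not any existing library): keep an associative layer on top of cosine retrieval, so memories that were used together in past episodes get pulled in together even when their embeddings look nothing alike.

```python
from collections import defaultdict

import numpy as np

def cosine_top_k(query_vec, vectors, k=5):
    # Plain cosine-similarity retrieval over an (n, d) embedding matrix.
    sims = vectors @ query_vec / (
        np.linalg.norm(vectors, axis=1) * np.linalg.norm(query_vec) + 1e-8
    )
    return list(np.argsort(-sims)[:k])

class AssociativeStore:
    def __init__(self, vectors):
        self.vectors = vectors                      # (n, d) embedding matrix
        self.co_access = defaultdict(lambda: defaultdict(int))

    def record_episode(self, used_ids):
        # Strengthen links between memories used in the same episode,
        # e.g. the poem chunks and the Minecraft building functions.
        for a in used_ids:
            for b in used_ids:
                if a != b:
                    self.co_access[a][b] += 1

    def retrieve(self, query_vec, k=5, assoc_per_hit=2):
        hits = cosine_top_k(query_vec, self.vectors, k)
        extra = []
        for h in hits:
            # Also pull the memories most strongly associated with each hit,
            # regardless of their similarity to the query itself.
            linked = sorted(self.co_access[h].items(), key=lambda x: -x[1])
            extra.extend(idx for idx, _ in linked[:assoc_per_hit])
        # Deduplicate while preserving order.
        return list(dict.fromkeys(hits + extra))
```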

[–] 30299578815310@alien.top 1 points 10 months ago (1 children)

Does this even apply to anybody yet, or is it only relevant going forward? Did even GPT-4 need that kind of power?