this post was submitted on 24 Nov 2023
LocalLLaMA

I've been hearing Q* = Q-learning + A* (search algorithm).

I'm trying to make some sense of it, so let me know what I missed or got wrong.

Here's what I know: it's supposed to improve language model decoding.

  1. Q-learning is a form of model-free reinforcement learning in which an agent learns to maximize a cumulative reward. Applied to language models, the actions could be the selection of tokens, with the reward being the effectiveness of the generated response (a minimal sketch of the update rule follows after this list).

  2. A* is an informed (best-first) search algorithm that uses heuristics to estimate the best path to the goal. In language generation, the goal could be the most coherent and contextually relevant completion, i.e. the chat response (a toy search sketch follows after this list).

  • Beam Search in Decoding: This method is already used with LLMs; it keeps a set of candidate next sequences instead of just the single most likely next token (also sketched below).

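To make item 1 concrete, here's a minimal tabular Q-learning sketch. To be clear, this is just the textbook update rule, not anything OpenAI has described; the states, actions, reward, and hyperparameters are placeholders.

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate
Q = defaultdict(float)                   # Q[(state, action)] -> estimated long-term return

def choose_action(state, actions):
    # epsilon-greedy: explore occasionally, otherwise take the best-known action
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    # Q-learning target: immediate reward + discounted value of the best next action
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```
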
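And for item 2, a toy A* implementation showing the f = g + h idea; the `neighbors`, `cost`, and `heuristic` callables are assumed inputs, not any real library API.

```python
import heapq

def a_star(start, goal, neighbors, cost, heuristic):
    # frontier is ordered by f = g (cost so far) + h (heuristic estimate of remaining cost)
    frontier = [(heuristic(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt in neighbors(node):
            new_g = g + cost(node, nxt)
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + heuristic(nxt), new_g, nxt, path + [nxt]))
    return None, float("inf")  # no path found
```
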
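For comparison, plain beam search decoding looks roughly like this; `next_token_logprobs(seq)` is a stand-in for a real LM call that returns (token, log-prob) pairs, not an actual API.

```python
def beam_search(next_token_logprobs, bos, eos, beam_width=4, max_len=20):
    beams = [([bos], 0.0)]                      # (token sequence, summed log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:                  # finished beams are carried over unchanged
                candidates.append((seq, score))
                continue
            for tok, logp in next_token_logprobs(seq):
                candidates.append((seq + [tok], score + logp))
        # keep only the top-k partial sequences by total log-probability
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]
```
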
In a hypothetical Q* approach:

  • Informed Token Selection: It could use heuristics, based on context and language understanding, to guide the selection of token sequences.

  • Maximizing Future Reward: Like Q-learning, it would aim to maximize a future reward, potentially based on coherence, relevance, or user engagement with the generated text.

  • Beyond Simple Probability Multiplication: Rather than merely multiplying probabilities of token sequences, it could evaluate sequences with a combined heuristic- and reward-based score (see the speculative sketch after this list).

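Purely as speculation, the scoring function for such a decoder might combine the usual log-probability with a learned value/heuristic term. `lm_logprob` and `value_estimate` below are hypothetical stand-ins, not anything that's been confirmed:

```python
def q_star_score(seq, lm_logprob, value_estimate, weight=1.0):
    g = lm_logprob(seq)       # A*-style "cost so far": log-probability of the text under the LM
    h = value_estimate(seq)   # heuristic / Q-value term: estimated future reward of this partial answer
    return g + weight * h     # a decoder would expand the highest-scoring beams first
```

Swapping the plain log-prob ranking in beam search for a score like this is one way the "heuristic + reward" framing could cash out.
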
In theory this could lead to more effective, contextually relevant text generation, especially in scenarios that require a balance between creativity and specific guidelines or objectives.

Intelligent_Rough_21@alien.top · 11 months ago

Excited to see them almost certainly combine their RL expertise with their LLM expertise to encourage reasoning. It's been the most obvious thing since the invention of LLMs, and I'm sure they will figure it out, or DeepMind will. We all know it's coming. Excited for the near future.