overview for tmacnadidas

Understand Q* (alien.top)

submitted 11 months ago by tmacnadidas@alien.top to c/localllama@poweruser.forum

3 comments fedilink

I've been hearing that Q* = Q-learning + A* (the search algorithm).

I'm trying to make sense of it, so let me know what I missed or got wrong.

Here's what I know: it's supposed to improve language model decoding.

Q-learning is a form of model-free reinforcement learning where an agent learns to maximize a cumulative reward. When applied to language models, the actions could be the selection of tokens, with the reward being the effectiveness of the generated response.
A* is an informed (best-first) search algorithm that ranks candidate paths by the cost so far plus a heuristic estimate of the remaining cost to the goal. In language generation, the goal could be the most coherent and contextually relevant completion (chat response).
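
To make the two ingredients concrete, here is a minimal sketch of each on toy data. Everything in it (the `q_update` and `a_star` names, the tiny graph, the heuristic values, the `alpha`/`gamma` defaults) is made up for illustration and says nothing about how Q* itself works.

```python
import heapq

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Unseen (state, action) pairs are treated as 0.0."""
    best_next = max((q.get((next_state, a), 0.0) for a in actions), default=0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

def a_star(graph, heuristic, start, goal):
    """A*: always expand the node with the lowest f = g + h, where g is the
    cost so far and h is the heuristic estimate of the remaining cost."""
    frontier = [(heuristic[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt, cost in graph.get(node, {}).items():
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + heuristic[nxt], new_g, nxt, path + [nxt]))
    return None, float("inf")

# Hypothetical toy graph: edge weights are step costs.
GRAPH = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}, "D": {}}
# Hypothetical heuristic: never overestimates the true remaining cost to "D".
HEURISTIC = {"A": 3, "B": 2, "C": 1, "D": 0}

if __name__ == "__main__":
    q_table = {}
    print(q_update(q_table, "s0", "a", 1.0, "s1", ["a", "b"]))
    print(a_star(GRAPH, HEURISTIC, "A", "D"))
```

In the language-model reading above, a "state" would be the text so far and an "action" the next token, but a table like `q_table` would not scale to that; a learned value model would have to stand in for it.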

Beam Search in Decoding: This method, used in LLMs, keeps a set of the most likely partial sequences at each step instead of committing to the single most likely next token.
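
A minimal sketch of beam search, assuming a toy bigram "model" in place of a real LLM. The `TOY_BIGRAMS` table, its probabilities, and the function names are invented for illustration.

```python
import math

# Hypothetical bigram table: next-token probabilities given the last token.
TOY_BIGRAMS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
}

def next_token_probs(tokens):
    """Stand-in for a model call: looks only at the last token."""
    return TOY_BIGRAMS.get(tokens[-1], {})

def beam_search(next_probs, start, beam_width=2, max_len=2):
    """Keep the `beam_width` highest log-probability sequences at each step."""
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            probs = next_probs(tokens)
            if not probs:
                # No continuation known: carry the sequence forward unchanged.
                candidates.append((tokens, logp))
                continue
            for tok, p in probs.items():
                candidates.append((tokens + [tok], logp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

if __name__ == "__main__":
    for width in (1, 2):
        tokens, logp = beam_search(next_token_probs, "<s>", beam_width=width)[0]
        print(width, tokens, round(math.exp(logp), 2))
```

With `beam_width=1` this is greedy decoding and it commits to "the" first, ending at probability 0.30. With `beam_width=2` it keeps "a" alive and finds "a cat" at 0.36, which is the case beam search is meant to catch.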

In a hypothetical Q* approach:

Informed Token Selection: It could use heuristics, based on context and language understanding, to guide the selection of token sequences.
Maximizing Future Reward: Like Q-learning, it would aim to maximize a future reward, potentially based on coherence, relevance, or user engagement with the generated text.
Beyond Simple Probability Multiplication: Rather than merely multiplying probabilities of token sequences, it could evaluate sequences based on a combined heuristic and reward-based framework.
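
To show what the three points above could look like together, here is a purely speculative sketch: a best-first search over token sequences where the priority is the sequence log-probability plus a weighted value estimate. This is my guess at the shape of the idea, not the actual Q* method. The bigram table, the `q_value` function, and the `lam` weight are all hypothetical stand-ins.

```python
import heapq
import math

# Hypothetical bigram table standing in for a language model.
TOY_BIGRAMS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
}

def next_token_probs(tokens):
    return TOY_BIGRAMS.get(tokens[-1], {})

def q_value(tokens):
    """Hypothetical learned value estimate. Here it is a hard-coded toy
    reward that prefers sequences ending in "dog"."""
    return 1.0 if tokens[-1] == "dog" else 0.0

def guided_decode(next_probs, value_fn, start, max_len, lam=1.0):
    """Best-first search: always expand the sequence with the highest
    score = log P(sequence) + lam * value_fn(sequence).
    Returns the first sequence of `max_len` generated tokens that gets
    popped, which is not guaranteed to be the global optimum when lam > 0."""
    # heapq is a min-heap, so scores are stored negated.
    frontier = [(-(lam * value_fn([start])), [start], 0.0)]
    while frontier:
        neg_score, tokens, logp = heapq.heappop(frontier)
        if len(tokens) - 1 == max_len:
            return tokens, -neg_score
        for tok, p in next_probs(tokens).items():
            new_tokens = tokens + [tok]
            new_logp = logp + math.log(p)
            score = new_logp + lam * value_fn(new_tokens)
            heapq.heappush(frontier, (-score, new_tokens, new_logp))
    return None, float("-inf")

if __name__ == "__main__":
    print(guided_decode(next_token_probs, q_value, "<s>", max_len=2, lam=0.0))
    print(guided_decode(next_token_probs, q_value, "<s>", max_len=2, lam=2.0))
```

With `lam=0.0` the score is just the log-probability, so the search returns the most likely sequence ("a cat"). With `lam=2.0` the toy value term outweighs the probability gap and the search returns "the dog" instead, which is the "beyond simple probability multiplication" behavior described above.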

In theory this could lead to more effective, contextually relevant text generation, especially in scenarios that require a balance between creativity and specific guidelines or objectives.

Gradio or streamlit for prototyping and why? in c/localllama@poweruser.forum

[–] tmacnadidas@alien.top 1 points 11 months ago

I'm gonna clean it up and open source it soon! Will post here when I do

Gradio or streamlit for prototyping and why? in c/localllama@poweruser.forum

[–] tmacnadidas@alien.top 1 points 11 months ago (2 children)

^ Both are about the same in terms of learning curve. Streamlit produces a better-looking app IMO, e.g.:

https://preview.redd.it/ncd9tzlrlr1c1.png?width=1663&format=png&auto=webp&s=60d7f1283fb777fe89a72950ad5f576990504a17