LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

The Q* hypothesis: Tree-of-thoughts reasoning, process reward models, and supercharging synthetic data (alien.top)

submitted 11 months ago by Thistleknot@alien.top to c/localllama@poweruser.forum

https://www.interconnects.ai/p/q-star

Willing_Breadfruit@alien.top 1 point 11 months ago

Yann LeCun tweeted about what this is today: token prediction with planning, which operates far below the prompt level.

Thistleknot@alien.top 1 point 11 months ago

https://twitter.com/ylecun/status/1728126868342145481?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet

New_Lifeguard4020@alien.top 1 point 11 months ago

Could someone explain what he means by this post:
"One of the main challenges to improve LLM reliability is to replace Auto-Regressive token prediction with planning."

Thistleknot@alien.top 1 point 11 months ago

I had to read that a few times.

Auto-regressive is like forecasting: it's iterative.

LLM reliability is this vague concept of trying to get to the right answer.

Hence tree of thoughts, as a way to 'plan' toward that vague notion of the right answer.

It circumvents the limitation of univariate next-token prediction by planning over several branches in parallel.
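To make the "parallel planning" idea concrete, here is a minimal sketch of the breadth-first flavor of tree-of-thoughts search: expand several candidate next steps, score them, keep the best few, repeat. It is a toy under stated assumptions: `propose` and `score` are hypothetical stand-ins for LLM calls that would generate and rate candidate thoughts, and the "problem" is just choosing three numbers from 1 to 5 that sum to 10. It says nothing about how Q* or any OpenAI system actually works.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # a partial solution: the sequence of "thoughts" (steps) so far

TARGET = 10  # hypothetical toy goal: three steps whose values sum to 10


def propose(state: State) -> List[int]:
    # Hypothetical stand-in for an LLM proposing candidate next thoughts.
    return [1, 2, 3, 4, 5]


def score(state: State) -> float:
    # Hypothetical stand-in for an LLM or reward model rating a partial solution:
    # the closer the running sum is to the target, the better.
    return -abs(TARGET - sum(state))


def tree_of_thoughts_bfs(
    propose_fn: Callable[[State], List[int]],
    score_fn: Callable[[State], float],
    depth: int,
    beam_width: int,
) -> State:
    frontier: List[State] = [()]
    for _ in range(depth):
        # Expand every kept state with every proposed next step.
        children = [state + (step,) for state in frontier for step in propose_fn(state)]
        # Keep only the most promising partial solutions (the sort is stable, so ties keep their order).
        children.sort(key=score_fn, reverse=True)
        frontier = children[:beam_width]
    # Return the best surviving full-depth solution (first one on ties).
    return max(frontier, key=score_fn)


print(tree_of_thoughts_bfs(propose, score, depth=3, beam_width=3))  # (5, 4, 1)
print(tree_of_thoughts_bfs(propose, score, depth=3, beam_width=1))  # (5, 5, 1)
```

With `beam_width=1` the search collapses into greedy one-step-at-a-time selection: it commits to 5, then 5 again (already at 10 after two steps), and is forced to overshoot on the third step. Keeping three candidates alive lets it back off to a path like 5, 4, 1 instead.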

Willing_Breadfruit@alien.top 1 point 11 months ago

ermm, idk what you mean by any of those words.

Auto-regressive just means it's a time series that depends on its previous predictions.

So when you predict a token at time t, you condition on the tokens you have already predicted.
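Written out, this conditioning is the chain rule of probability: the probability of a whole continuation factorizes into one next-token term per step, each conditioned on everything generated before it. The symbols x_t (the token at step t) and T (the continuation length) are introduced here only for illustration:

```latex
P(x_1, \dots, x_T \mid \text{prompt}) = \prod_{t=1}^{T} P(x_t \mid x_1, \dots, x_{t-1}, \text{prompt})
```

Greedy decoding picks, at each step t, the token x_t that maximizes the t-th factor, given the tokens it already committed to.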

Consider "the cat in the hat". A transformer that generated it would have predicted it in the following manner (assuming each word is a single token, bc I'm lazy):

- P("the" | prompt) is highest
- P("cat" | "the", prompt) is highest
- P("in" | "the", "cat", prompt) is highest

So you can see that each prediction depends on the predictions that came before it. This is what is meant by auto-regressive.
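The loop described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: `TOY_MODEL` is a hypothetical lookup table standing in for a transformer (a real model computes the next-token distribution from the whole context rather than looking it up), the prompt is left implicit, the probabilities are invented for the example, and decoding is greedy (always take the most likely token).

```python
from typing import Dict, List, Tuple

# Hypothetical toy "model": maps the context generated so far to next-token probabilities.
# The prompt is implicit; the empty tuple means "nothing generated yet".
TOY_MODEL: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.7, "dog": 0.3},
    ("the", "cat"): {"in": 0.8, "sat": 0.2},
    ("the", "cat", "in"): {"the": 0.9, "a": 0.1},
    ("the", "cat", "in", "the"): {"hat": 0.95, "house": 0.05},
}


def greedy_decode(model: Dict[Tuple[str, ...], Dict[str, float]], max_tokens: int) -> List[str]:
    tokens: List[str] = []
    for _ in range(max_tokens):
        dist = model.get(tuple(tokens))  # condition on everything predicted so far
        if not dist:  # the toy table has no continuation for this context: stop
            break
        next_token = max(dist, key=dist.get)  # argmax over P(token | previous tokens, prompt)
        tokens.append(next_token)  # the prediction becomes part of the next step's input
    return tokens


print(" ".join(greedy_decode(TOY_MODEL, max_tokens=10)))  # the cat in the hat
```

Note that the only thing that changes between iterations is `tokens`: the model's own earlier outputs are what it conditions on next.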

Thistleknot@alien.top 1 point 11 months ago

Yes, I understand all that.

Auto-regressive is like ARIMA in time series forecasting.

Then RNNs came along.

Then sequence-to-sequence models.

What they all have in common is that the last prediction is used as input for the next prediction.

Hence auto-regressive.
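The same feedback loop shows up in classical forecasting. Below is a minimal sketch of recursive multi-step forecasting with a plain AR(p) model, i.e. only the autoregressive part of ARIMA, with no differencing or moving-average terms. The function name `ar_forecast`, the coefficients, and the history values are hypothetical, chosen only to show how each forecast becomes an input to the next one.

```python
from typing import List, Sequence


def ar_forecast(
    history: Sequence[float], coeffs: Sequence[float], intercept: float, steps: int
) -> List[float]:
    """Recursive multi-step forecast from an AR(p) model:
    y_t = intercept + coeffs[0] * y_{t-1} + ... + coeffs[p-1] * y_{t-p}
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError(f"need at least {p} past values, got {len(history)}")
    series = list(history)
    forecasts: List[float] = []
    for _ in range(steps):
        lags = series[-1 : -p - 1 : -1]  # most recent first: y_{t-1}, y_{t-2}, ..., y_{t-p}
        y_next = intercept + sum(c * y for c, y in zip(coeffs, lags))
        forecasts.append(y_next)
        series.append(y_next)  # feed the prediction back in: the auto-regressive loop
    return forecasts


print(ar_forecast([1.0, 2.0], coeffs=[0.5, 0.25], intercept=1.0, steps=3))  # [2.25, 2.625, 2.875]
```

Only the first forecast is built purely from observed data; the second already uses one predicted value as a lag, and the third uses two, which is the same "last prediction becomes the next input" pattern as token-by-token decoding.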