this post was submitted on 23 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.

founded 2 years ago

Reuters is reporting that OpenAI achieved an advance with a technique called Q* (pronounced Q-Star).

So what is Q*?

I asked around the AI researcher campfire and…

It’s probably Q-learning + MCTS: a Monte Carlo tree search reinforcement learning algorithm.
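For readers unfamiliar with the Q-learning half of the guess: at its core it is just a table (or network) of state-action values pushed toward the Bellman target. A minimal tabular sketch on a made-up 1-D walk (everything here is illustrative, nothing from OpenAI):

```python
import random

def train_q_table(n_states=6, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a toy 1-D walk: start at state 0, reward 1.0
    for reaching the last state. Actions: 0 = left, 1 = right."""
    q = [[0.0, 0.0] for _ in range(n_states)]
    rng = random.Random(0)
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy; ties break toward "right" so early episodes terminate
            a = rng.randrange(2) if rng.random() < eps else max((1, 0), key=lambda a: q[s][a])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Bellman update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q
```

After training, the greedy policy prefers "right" in every state, with values decaying by gamma per step away from the reward. The MCTS half of the guess would replace this one-step lookup with a search tree at inference time.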

Which is right in line with the strategy DeepMind (vaguely) said they’re taking with Gemini.

Another corroborating data-point: an early GPT-4 tester mentioned on a podcast that they are working on ways to trade inference compute for smarter output. MCTS is probably the most promising method in the literature for doing that.
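The simplest version of "trade inference compute for smarter output" isn't even MCTS; it's best-of-n sampling against a scoring function. A toy sketch with a stand-in generator and verifier (both hypothetical):

```python
import random

def best_of_n(generate, score, n=16, seed=0):
    """Spend more compute (larger n) to get a higher-scoring output."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)

# Stand-ins: the "model" proposes noisy guesses for sqrt(2);
# the "verifier" scores how close each guess squares to 2.
generate = lambda rng: rng.uniform(1.0, 2.0)
score = lambda x: -abs(x * x - 2.0)

cheap = best_of_n(generate, score, n=2)
expensive = best_of_n(generate, score, n=256)
```

With n=256 the winning candidate scores at least as well as with n=2: more sampling, better answer. MCTS refines the same idea by spending the extra compute on a tree of partial continuations instead of independent full samples.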

So how do we do it? Well, the closest thing I know of presently available is Weave, within a concise / readable Apache-licensed MCTS RL fine-tuning package called minihf.

https://github.com/JD-P/minihf/blob/main/weave.py

I’ll update the post with more info when I have it about q-learning in particular, and what the deltas are from Weave.

[–] sprectza@alien.top 1 points 2 years ago (1 children)

Yeah, I think it's an MCTS reinforcement learning algorithm. I think DeepMind is the best lab when it comes to developing strategy- and planning-capable agents, given how good AlphaZero and AlphaGo are, and if they integrate it with the "Gemini" project, they really might just "eclipse" GPT-4. I don't know how scalable it would be in terms of inference, given the amount of compute required.

[–] lockdown_lard@alien.top 1 points 2 years ago

Have DeepMind released any leading-edge tools recently? MuZero was quite a few years ago now, and AlphaGo is ancient in AI terms.

DeepMind seem to have promised an awful lot, come up with a lot of clever announcements, but been very sparse on actual delivery of much at all.

[–] rarted_tarp@alien.top 1 points 2 years ago (4 children)

Has to be a mix of Q-learning and A* right?

[–] DoubleDisk9425@alien.top 1 points 2 years ago

Can you please ELI-idiot?

[–] RaiseRuntimeError@alien.top 1 points 2 years ago

I was going to say it seems like it was just yesterday I was learning A* and now I find out that they are already up to Q*

[–] letsburn00@alien.top 1 points 2 years ago (2 children)

I know you're joking, but it's hilarious how many random things in science just got given letters.

A* is the algorithm your phone uses to help you drive home... and the supermassive black hole in the centre of the galaxy.
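For the curious, the routing version of A* fits in a few lines. A toy grid pathfinder (hypothetical, not anyone's production router):

```python
import heapq

def astar(grid, start, goal):
    """A* shortest path on a 0/1 grid (1 = wall), Manhattan-distance heuristic.
    Returns the path length in moves, or None if unreachable."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start)]  # (f = g + h, g, node)
    best_g = {start: 0}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = node[0] + dr, node[1] + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
                ng = g + 1
                if ng < best_g.get((r, c), float("inf")):
                    best_g[(r, c)] = ng
                    heapq.heappush(frontier, (ng + h((r, c)), ng, (r, c)))
    return None
```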

[–] TheOtherKaiba@alien.top 1 points 2 years ago (1 children)
[–] Unfair-Emergency-658@alien.top 1 points 2 years ago

What is a star?

[–] KallistiTMP@alien.top 1 points 2 years ago

....and the supermassive black hole in the centre of the galaxy.

What did you think they were gonna use for that? Dijkstra's?

[–] Local_Beach@alien.top 1 points 2 years ago

Maybe an A* search in vector space

[–] Mrleibniz@alien.top 1 points 2 years ago (2 children)

Let the co-founder of OpenAI John Schulman explain it to you

[–] chipstastegood@alien.top 1 points 2 years ago

This should be higher up.

[–] MannowLawn@alien.top 1 points 2 years ago

explain it to you

lol, might as well be spoken Mandarin, this is so far away from my math skills. I have no clue what this guy is saying

[–] ninjasaid13@alien.top 1 points 2 years ago (1 children)

What's so special about Q*

[–] Oswald_Hydrabot@alien.top 1 points 2 years ago

A marketing piece by OpenAI to lie to people to hype product

[–] Interesting_Bison530@alien.top 1 points 2 years ago

I think there are a few LLMs that incorporate MCTS on GitHub

[–] RogueStargun@alien.top 1 points 2 years ago (2 children)

Q* is just a reinforcement learning technique.

Perhaps they scaled it up and combined it with LLMs

Given their recently published paper, they probably figured out a way to get GPT to learn their own reward function somehow.

Perhaps some chicken little board members believe this would be the philosophical trigger towards machine intelligence deciding upon its own alignment.

[–] herozorro@alien.top 1 points 2 years ago

Given their recently published paper, they probably figured out a way to get GPT to learn their own reward function somehow.

you just need 2 GPTs talking with each other. the second acts as a critic and guides the first
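That loop is easy to sketch. Deterministic stand-ins below play the two GPTs (a real version would make two model calls; all names here are made up):

```python
def refine_with_critic(generate, critique, prompt, rounds=5):
    """Two-model loop: model A drafts, model B critiques, A redrafts from the critique."""
    draft = generate(prompt, feedback=None)
    for _ in range(rounds):
        score, feedback = critique(prompt, draft)
        if score >= 1.0:  # critic is satisfied
            break
        draft = generate(prompt, feedback=feedback)
    return draft

# Toy stand-ins: the "critic" knows the target phrase and hints the next word.
TARGET = ["one", "two", "three"]

def generate(prompt, feedback=None):
    return feedback if feedback else TARGET[0]

def critique(prompt, draft):
    words = draft.split()
    score = len(words) / len(TARGET)
    hint = draft + " " + TARGET[len(words)] if len(words) < len(TARGET) else draft
    return score, hint
```

The interesting question is where a real critic's signal comes from: another LLM's judgment, or a grounded verifier like a compiler or calculator.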

[–] newsreddittoday@alien.top 1 points 2 years ago

Which paper are you referring to?

[–] balianone@alien.top 1 points 2 years ago (1 children)

We can infer that any such advance by OpenAI that follows the naming convention of "Q*" would likely be a significant development in the field of reinforcement learning, possibly expanding upon or enhancing traditional Q-Learning methodologies.

[–] tortistic_turtle@alien.top 1 points 2 years ago

Thanks, ChatGPT

[–] Obseslescence@alien.top 1 points 2 years ago

this is yet more bogus nonsense. i have a list of pretty simple questions life experience has ultimately taught me answers to that gpt simply cannot answer. if it's a breakthrough then they need to deploy it now to make gpt4 better because it fails all the time.

pathetic sheeple believe anything. spread false rumors to bolster company valuation. pathetic.

[–] cddelgado@alien.top 1 points 2 years ago

Part of me wants to think it relates to the meaning in science (a quasi-star, an almost-black hole with an exotic matter filling), but I can't find the connection. So it is either named Q* because it is the edge of the singularity, or it is so messed up that it eats other models for funsies.

[–] chipstastegood@alien.top 1 points 2 years ago

There is too much hype about AGI and Singularity. We’ll get smaller models that give better answers - but AGI this is not.

[–] 20rakah@alien.top 1 points 2 years ago

Wasn't there a big thing about tree search just a few months ago? haven't been keeping up too much.

[–] 345Y_Chubby@alien.top 1 points 2 years ago

If it teaches itself to learn it’s just a matter of time until it teaches itself to code

[–] HeinrichTheWolf_17@alien.top 1 points 2 years ago

I’m wondering if Q-Star is a recursive self improvement mechanism? Perhaps the in house model they have can innovate and consistently learn on top of what it’s been trained on?

[–] BlackSheepWI@alien.top 1 points 2 years ago

I heard they have an even bigger breakthrough up their sleeve... Rumor is that it's called GPT2, and it's too dangerous to even release to the public 👀

[–] Honest_Science@alien.top 1 points 2 years ago

Qtransformer.github.io

[–] olddoglearnsnewtrick@alien.top 1 points 2 years ago (1 children)

It's a silicon-based version of QAnon. I will be terminated for telling you, but wait 'till they launch MAGA (Machine Augmented General AI)!!!

[–] Kep0a@alien.top 1 points 2 years ago
[–] FunkyFr3d@alien.top 1 points 2 years ago (1 children)

Calling it Q was a terrible idea. The cookers are going to go crazier

[–] DefinitelyNotEmu@alien.top 1 points 2 years ago (1 children)
[–] FunkyFr3d@alien.top 1 points 2 years ago

I’m not a fan of tech companies in general but Amazon is definitely one of most disliked.

[–] _Lee_B_@alien.top 1 points 2 years ago
[–] Able_Conflict3308@alien.top 1 points 2 years ago

[–] wind_dude@alien.top 1 points 2 years ago

is there something other than the letter Q making you think it's Q-learning?

[–] Xnohat@alien.top 1 points 2 years ago (1 children)

Ilya from OpenAI published a paper (2020) about Q*: GPT-f, a model with capabilities in understanding and solving math. https://arxiv.org/abs/2009.03393

[–] Scrattlebeard@alien.top 1 points 2 years ago

I don't see any mention of Q* in that paper, am I missing something?

[–] ajibawa-2023@alien.top 1 points 2 years ago

This video by David Shapiro explains Q* very well: https://www.youtube.com/watch?v=T1RuUw019vA
I have a good idea about RL, but it's better to have it in video format so that everyone can understand.

[–] CoffeePizzaSushiDick@alien.top 1 points 2 years ago

so just GPTQ?

[–] malinefficient@alien.top 1 points 2 years ago

Insist on better, insist on R** or GTFO...

[–] honestduane@alien.top 1 points 2 years ago

Q* was completely explained, and OpenAI explained what it was. I was even able to make a YouTube video about it because their explanation was so clear, so I was able to explain it as if you were five years old.

I don’t understand how people believe this is a secretive thing and I don’t understand why people aren’t talking about how simple it is.

Everybody is talking about this like it’s some grand secret, why?

I mean, the algorithm is expensive to run, but it’s not that hard to understand.

Can somebody please explain why everybody’s acting like this is such a big secret thing?

[–] georgejrjrjr@alien.top 1 points 2 years ago

Edits aren't working for me somehow, here's my update:

First, as I mentioned on twitter but failed to address here, this is at least excellent PR. So that may be all it is, basically a more sophisticated "AGI achieved internally" troll. I would suggest taking Q* discourse with all due salt.

From context and the description, it looks like OpenAI published about the technique in question here: https://openai.com/research/improving-mathematical-reasoning-with-process-supervision

The result is pretty unsurprising: given process supervision (i.e., help from a suitably accurate model of a particular process), models perform better.
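Process supervision, concretely: instead of scoring only the final answer (outcome supervision), score every intermediate step. A toy sketch with a rule-based arithmetic checker standing in for the learned process reward model (the real thing is a trained model, not `eval`; never eval untrusted strings):

```python
def outcome_score(answer, expected):
    """Outcome supervision: only the final answer is checked."""
    return 1.0 if answer == expected else 0.0

def process_score(steps, check_step):
    """Process supervision: every intermediate step is checked."""
    return sum(check_step(s) for s in steps) / len(steps)

# Toy chain-of-thought for 12 * 13, with Python arithmetic as the "process model".
steps = [
    "12 * 13 = 12 * 10 + 12 * 3",
    "12 * 10 = 120",
    "12 * 3 = 36",
    "120 + 36 = 156",
]
# Turn each "a = b" step into a checkable "a == b" claim.
check_step = lambda s: float(eval(s.replace("=", "==")))
```

A chain with one wrong step gets a fractional process score even when later steps are internally consistent, which is exactly the signal outcome supervision can't give.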

Well...yeah. It's probably an impactful direction for AI as people find ways to build good process models, but it isn't an especially novel finding, nor is it a reason to blow up a company. This updates me further in the direction of, "Q* discourse was a brilliant PR move to capitalize off of the controversy and direct attention away from the board power struggle."

Which doesn't mean it can't also be a good intuition pump for the open source world. Every big lab seems to be thinking about model-based supervision, it would be a little bit silly if we weren't. So coming back to the original question:

How might we use this?

I think the question reduces to, "What means of supervision are available?"

Once you have a supervisor to play "warmer / colder" with the model, the rest is trivial.
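To make "trivial" concrete: for arithmetic, the supervisor can literally be the interpreter, and warmer/colder degenerates to rejection sampling. A sketch with a deterministic stand-in proposer (all names hypothetical):

```python
def sample_until_verified(propose, verify, max_tries=100):
    """Rejection sampling: keep drawing candidates until the supervisor accepts one."""
    for i in range(1, max_tries + 1):
        candidate = propose(i)
        if verify(candidate):
            return candidate, i
    return None, max_tries

# Arithmetic supervisor: a stand-in "model" enumerates guesses for 17 * 24;
# Python itself plays the verifier.
propose = lambda i: 400 + i
verify = lambda c: c == 17 * 24
```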

I'm curious what models you all expect to come online to supervise llms. Arithmetic has already been reported. Code, too.

What else?