LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

communick@poweruser.forum

Starling-RM-7B-alpha: New RLAIF Finetuned 7b Model beats Openchat 3.5 and comes close to GPT-4 (alien.top)

submitted 2 years ago by Legcor@alien.top to c/localllama@poweruser.forum

https://preview.redd.it/3krgd1sg2z2c1.png?width=800&format=png&auto=webp&s=b76c5fb9fa22938c74ec3095f63adaec8ff2219d

I came across this new finetuned model based on Openchat 3.5, which is apparently trained using Reinforcement Learning from AI Feedback (RLAIF).

https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha

Check out this tweet: https://twitter.com/bindureddy/status/1729253715549602071

[–] noeda@alien.top 1 points 2 years ago (3 children)

I've seen the "... beats GPT-4" enough times that now whenever I see a title that suggests a tiny model can compete with GPT-4, I see it as a negative signal: that the authors are bullshitting through some benchmarks or some other shenanigans.

It's annoying because the models might be legitimately good models for being open and within their weight class but now you've put my brain in BS detecting mode and I can't trust you've done good faith measurement anymore.

[–] Evening_Ad6637@alien.top 1 points 2 years ago (1 children)

Yeah, I don't think the authors are intentionally bullshitting or intentionally doing "benchmark cosmetics", but maybe it's more a lack of knowledge about what's going on with (most) benchmarks and how their image has been ruined in the meantime.

[–] Competitive_Ad_5515@alien.top 1 points 2 years ago

Sure, but name-dropping the biggest name in the game and comparing yourself favourably to it is a big swing. It's either a naive-at-best marketing claim or it's untrue.

[–] bot-333@alien.top 1 points 2 years ago

There are SO many models "bullshitting through some benchmarks or some other shenanigans" that I'm cooking my own benchmark system LOL.

[–] Kep0a@alien.top 1 points 2 years ago

Yeah I just roll my eyes and continue onwards