Posted to LocalLLaMA on 30 Nov 2023
https://preview.redd.it/oa22kq7vod3c1.jpg?width=1792&format=pjpg&auto=webp&s=4ce176c4f1a4b988b2202a359c67505d759dfd9e

I just released the NeuralHermes-2.5-Mistral-7B model, which is a DPO fine-tuned version of OpenHermes-2.5-Mistral-7B. Teknium, the creator of the SFT model, confirmed on Twitter that this version improves benchmark scores in AGIEval, GPT4All, and TruthfulQA.

This is a simple proof of concept: I used Intel's orca_dpo_pairs dataset (from neural-chat-7b-v3-1) in ChatML format, and only trained for one hour on an A100 (using Google Colab). But it shows the potential of DPO to boost the performance of SFT models, basically for free. I released all the code so that everyone can easily experiment with it and find better parameters (it shouldn't be difficult). You can also access the W&B project.
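To give a rough idea of the setup, a minimal DPO sketch with Hugging Face TRL's DPOTrainer looks roughly like the following. The hyperparameters are illustrative placeholders rather than the exact values used for this model, and the argument names can vary between trl versions:

```python
# Minimal DPO fine-tuning sketch with Hugging Face TRL (illustrative values only).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "teknium/OpenHermes-2.5-Mistral-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# The raw pairs still need to be mapped to "prompt"/"chosen"/"rejected" strings
# in ChatML format before training (a sketch of that step is shown further below).
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")

training_args = TrainingArguments(
    output_dir="neuralhermes-dpo",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model=None,          # TRL keeps a frozen copy of the model as the reference
    args=training_args,
    beta=0.1,                # strength of the implicit KL penalty toward the reference
    train_dataset=dataset,   # after the ChatML mapping mentioned above
    tokenizer=tokenizer,
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
```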

Note that the preference dataset is also entirely synthetic, with preferred answers coming from GPT-4/3.5 and rejected responses coming from Llama 2 13b chat. It's a very cheap and efficient way to convert an instruction dataset (OpenOrca in this case) into a preference dataset. I wasn't very successful in my previous experiments with DPO using other datasets, so I think there's something very interesting with this one. We can easily reproduce this dataset and improve it with other sources.
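If you want to rebuild or tweak the dataset, the conversion is essentially just string formatting. Here is a minimal sketch, assuming the dataset exposes system/question/chosen/rejected columns (worth double-checking on the Hub):

```python
# Sketch: turning Intel/orca_dpo_pairs into ChatML-formatted DPO triples.
from datasets import load_dataset

def to_chatml(example):
    # Build the ChatML prompt up to the assistant turn.
    prompt = (
        f"<|im_start|>system\n{example['system']}<|im_end|>\n"
        f"<|im_start|>user\n{example['question']}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    return {
        "prompt": prompt,
        "chosen": example["chosen"] + "<|im_end|>\n",     # GPT-4/3.5 answer
        "rejected": example["rejected"] + "<|im_end|>\n",  # Llama 2 13B Chat answer
    }

dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(to_chatml, remove_columns=dataset.column_names)
```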

I just wanted to share these thoughts and experiments with the community. I'm writing an article about DPO and this model is actually a lucky by-product of it. I'll share it when it's ready.

If you want to try the model, TheBloke already provided GGUF and AWQ versions of it.
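For example, one of the GGUF files can be run locally with llama-cpp-python; the filename below is just an example, use whichever quant you downloaded:

```python
# Sketch: running a GGUF quant locally with llama-cpp-python and a ChatML prompt.
from llama_cpp import Llama

# Example filename; pick the quantization level you actually downloaded.
llm = Llama(model_path="./neuralhermes-2.5-mistral-7b.Q4_K_M.gguf", n_ctx=4096)

prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nExplain DPO in one paragraph.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

output = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(output["choices"][0]["text"])
```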

Comments
[–] perlthoughts@alien.top 1 points 11 months ago
[–] actualopenai@alien.top 1 points 11 months ago (2 children)

It would work really well to get this on the 16k version: https://huggingface.co/NurtureAI/OpenHermes-2.5-Mistral-7B-16k
Would it have to be a different dataset?

[–] mlabonne@alien.top 1 points 11 months ago

That's a good question; I can give it a try. Ideally, you'd want a 16k version of the preference dataset to make sure that DPO doesn't ruin it. But considering the low number of training samples, it probably works fine.

[–] Creative_Bottle_3225@alien.top 1 points 11 months ago

What is the difference between the normal and the 16k versions?

[–] onil_gova@alien.top 1 points 11 months ago (1 children)
[–] kpodkanowicz@alien.top 1 points 11 months ago (1 children)

Really cool! What do you think about using GPT-3.5 as the rejected output, in the hope of surfacing some extra edge?

[–] mlabonne@alien.top 1 points 11 months ago (1 children)

Yes, I'd say it'd probably work better than the current approach. If you look at the reward plots on W&B, it feels like the problem is too easy for the model, hence the slight improvement.

https://preview.redd.it/xhuyiquojg3c1.png?width=2398&format=png&auto=webp&s=67725747e6cd9254e38728149fb6cea3ba85d71e

[–] ganzzahl@alien.top 1 points 11 months ago

I find it odd that your chosen rewards went negative... Doesn't this imply that the chosen samples became less likely than they were under the base model? You still get model improvements, since the rejected samples became even less likely, but it still feels odd. Any thoughts there?
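For readers following along: the reward tracked in these plots is DPO's implicit reward, i.e. the scaled log-probability ratio between the policy and the frozen reference model, and the loss only depends on the chosen/rejected margin (standard definitions from the DPO paper):

```latex
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})\big)
```

Because only the difference enters the loss, both rewards can drift negative as long as the gap keeps widening, which is exactly the situation described above.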

[–] Wonderful_Ad_5134@alien.top 1 points 11 months ago

The improvement is so small that it could be within the margin of error.

[–] Informal-Ad-534@alien.top 1 points 11 months ago

It holds up pretty decently! What Mirostat tau value would you recommend with it?
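Not a tuned recommendation, but for reference these are the Mirostat knobs in llama-cpp-python with llama.cpp's default values, reusing the `llm` and `prompt` objects from the GGUF sketch above:

```python
# Sketch: enabling Mirostat sampling in llama-cpp-python.
# The values shown are llama.cpp's defaults, not values tuned for this model.
output = llm(
    prompt,
    max_tokens=256,
    stop=["<|im_end|>"],
    mirostat_mode=2,    # 0 = off, 1 = Mirostat, 2 = Mirostat 2.0
    mirostat_tau=5.0,   # target entropy (default)
    mirostat_eta=0.1,   # learning rate (default)
)
print(output["choices"][0]["text"])
```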

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Would be cool to see this in a 34b and 70b.