this post was submitted on 23 Nov 2023

LocalLLaMA


Community for discussing Llama, the family of large language models created by Meta AI.


The new chat model released by Intel is now at the top of the Open LLM Leaderboard (among the 7B models).

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

[–] georgejrjrjr@alien.top 10 months ago

The model seems cool and all, but the paper is better.

Intel eliminated the need for collected preference data in direct preference optimization (DPO). Preference data is expensive and a hassle to gather, so this is a big deal. Best of all, their preference-free DPO variant appears to actually perform better.

The trick is sampling the rejected responses from a smaller model. Say you have a dataset of GPT-4 completions: you mark those as “preferred” (chosen), then prompt a weaker model such as Llama 2 13B on the same prompts and mark its responses as rejected.
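For concreteness, here’s a minimal sketch of that data construction in Python with Hugging Face transformers. The weak-model name, the example prompt, and the `gpt4_pairs` list are illustrative assumptions, not details from the paper; the point is just that the “rejected” side is sampled rather than human-labeled.

```python
# Sketch: build DPO preference pairs without human labels.
# "chosen" = completions from a strong source (e.g. an existing GPT-4 dataset),
# "rejected" = completions sampled from a weaker model on the same prompts.
from transformers import AutoModelForCausalLM, AutoTokenizer

weak_model_name = "meta-llama/Llama-2-13b-chat-hf"  # placeholder choice of reject generator
tokenizer = AutoTokenizer.from_pretrained(weak_model_name)
weak_model = AutoModelForCausalLM.from_pretrained(weak_model_name, device_map="auto")

# (prompt, strong_completion) pairs; contents here are made up for illustration.
gpt4_pairs = [
    ("Explain DPO in one sentence.",
     "DPO fine-tunes a model directly on preference pairs, without training a separate reward model."),
]

preference_rows = []
for prompt, chosen in gpt4_pairs:
    inputs = tokenizer(prompt, return_tensors="pt").to(weak_model.device)
    out = weak_model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
    # Keep only the newly generated tokens as the "rejected" completion.
    rejected = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    preference_rows.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})

# preference_rows now has the prompt/chosen/rejected columns a standard DPO
# trainer (e.g. trl's DPOTrainer) expects, with no human annotation involved.
```

From there the rows feed into an ordinary DPO training run; nothing else about the objective has to change.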

Tl;dr: this could boost the performance of nearly every model with minimal added complexity (though the extra compute is obviously non-zero).

[–] cztomsik@alien.top 9 months ago

Thank you for the summary, that is actually a very cool idea!