this post was submitted on 17 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


So I have recently gone down the rabbit hole of cancelling my ChatGPT subscription and now just use OpenHermes2.5-Mistral-7B. I've learned about the different benchmarks and how they compare, and I understand how to read the HuggingFace LLM leaderboard and download any other model I might like to try.

What I struggle to understand is the meaning of the naming conventions. From what I have read, Mistral seems to be clearly better than Llama 2, and I understand the differences between 7B, 13B, etc.

Can someone explain the additional prefixes like Hermes, OpenHermes, NeuralChat, etc.?

Tl;dr: What is the difference between Dolphin-Mistral and OpenHermes-Mistral? I'm guessing one is the dataset and the other is how it was trained?

1 comment
__SlimeQ__@alien.top · 10 months ago

Mistral and Llama 2 (and the original Llama) are foundation models, meaning all of their weights were trained from scratch. Almost anything worth using is a derivative of one of these three foundation models. They are really expensive to train.

Just about everything else is a LoRA fine-tune on top of one of them. Fine-tunes only change a small fraction of the weights, something like 1%. Functionally speaking, the important part of these is the additional data they were trained on, and that training can be done on top of any underlying model.
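
To make that concrete, here's roughly what a LoRA fine-tune looks like with the Hugging Face peft library. The base model and hyperparameters below are just for illustration, not what any of these projects actually used:

```python
# Rough sketch of a LoRA fine-tune with the Hugging Face peft library.
# Base model and hyperparameters are illustrative, not a real training recipe.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# LoRA adds small trainable adapter matrices next to the attention projections;
# the foundation model's original weights stay frozen.
lora = LoraConfig(
    r=16,                                # adapter rank (controls adapter size)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # which layers get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)

# Only a tiny fraction of the parameters end up trainable, roughly the "1%" above.
model.print_trainable_parameters()
```

You then train that on whatever instruction dataset you care about, and what gets released is the base model plus the adapter (or the two merged back together).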

So OpenHermes is a LoRA fine-tune on top of Mistral, and is an open-source offshoot of Nous Hermes, which is an instruction dataset for giving good, smart answers (or something like that) in a given instruction format.
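
For OpenHermes 2.5 specifically, that instruction format is ChatML (at least if I'm reading the model card right), so prompts get wrapped like this:

```python
# Illustrative ChatML-style prompt, the format OpenHermes 2.5 was trained on.
# Double-check the model card for the exact template.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "What's the difference between Dolphin-Mistral and OpenHermes-Mistral?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```

Dolphin-Mistral is the same idea with a different instruction dataset (Dolphin) tuned onto the same Mistral base, so the difference you're asking about is mostly the dataset and prompt format, not the training method.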