this post was submitted on 17 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


So I have recently gone down the rabbit hole of cancelling my ChatGPT subscription and now just use OpenHermes2.5-Mistral-7B. I've learned about the different benchmarks and how they compare, and I understand how to read the HuggingFace LLM leaderboard and download any other model I might like to try.

What I struggle to understand is the meaning of the naming conventions. From what I have read, Mistral seems to be clearly better than Llama 2, and I understand the differences between 7B, 13B, etc.

Can someone explain the additional prefixes like Hermes, OpenHermes, NeuralChat, etc.?

Tl;dr: What is the difference between Dolphin-Mistral and OpenHermes-Mistral? I'm guessing one is the dataset and the other is how it was trained?

1 comment
__SlimeQ__@alien.top · 10 months ago

Mistral and Llama 2 (and the original Llama) are foundation models, meaning all of their weights were trained from scratch. Almost anything worth using is a derivative of one of these three foundation models. They are really expensive to train.

Just about everything else is a LoRA fine-tune on top of one of them. Fine-tunes only change a small fraction of the weights, something like 1%. Functionally speaking, the important part of these is the additional data they were trained on, and that training can be done on top of any underlying model.
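
To make that concrete, here's roughly what a LoRA fine-tune looks like with the Hugging Face peft library. The base model and hyperparameters below are just for illustration, not what any of these projects actually used:

```python
# Rough sketch of a LoRA fine-tune with the Hugging Face peft library.
# Base model and hyperparameters are illustrative, not a real training recipe.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# LoRA adds small trainable adapter matrices next to the attention projections;
# the foundation model's original weights stay frozen.
lora = LoraConfig(
    r=16,                                # adapter rank (controls adapter size)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # which layers get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)

# Only a tiny fraction of the parameters end up trainable, roughly the "1%" above.
model.print_trainable_parameters()
```

You then train that on whatever instruction dataset you care about, and what gets released is the base model plus the adapter (or the two merged back together).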

So OpenHermes is a LoRA fine-tune on top of Mistral, and is an open-source offshoot of Nous Hermes, which is an instruction dataset for giving good, smart answers (or something like that) in a given instruction format.
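
For OpenHermes 2.5 specifically, that instruction format is ChatML (at least if I'm reading the model card right), so prompts get wrapped like this:

```python
# Illustrative ChatML-style prompt, the format OpenHermes 2.5 was trained on.
# Double-check the model card for the exact template.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "What's the difference between Dolphin-Mistral and OpenHermes-Mistral?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```

Dolphin-Mistral is the same idea with a different instruction dataset (Dolphin) tuned onto the same Mistral base, so the difference you're asking about is mostly the dataset and prompt format, not the training method.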