ijustdontcare2try

joined 1 year ago
ijustdontcare2try@alien.top 1 points 11 months ago

I am not really an expert myself, but I will give it a shot.

The problem:

If you load a downloaded LLM and just try to feed it a book and some instructions, it will break down within seconds. LLMs have a limited context window, and that window can barely hold a roleplay after 10 minutes of chatting. If my roleplay character can't even remember their best friend's name after 10 minutes of chatting, then the model will not be able to process a whole book.
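To put rough numbers on that, here is a minimal sketch. It assumes about 4 characters per token (a common rule of thumb, not what any particular tokenizer does) and a hypothetical 4096-token context window; the function names are made up for illustration and real limits depend on the model you downloaded.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. ASSUMPTION: ~4 characters per token;
    real tokenizers vary by model and by language."""
    return int(len(text) / chars_per_token)


def fits_in_context(text: str, context_window: int = 4096,
                    reserved_for_reply: int = 512) -> bool:
    """Check whether the text fits a (hypothetical) context window,
    leaving some tokens free for the model's answer."""
    budget = context_window - reserved_for_reply
    return estimate_tokens(text) <= budget


# Stand-in for a book of about 100,000 words (500,000 characters).
book = "word " * 100_000

print(estimate_tokens(book))   # estimated tokens in the book
print(fits_in_context(book))   # whether it fits the assumed window
```

With these assumed numbers the book is many times larger than the window, which is why pasting it into a prompt doesn't work.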

The Solution:

What you need to do is actually include the book in the training dataset. Instead of feeding an LLM the book via a prompt, you would need it to already know the book from its original training. The strong AI nerds here can probably do this, but most people here are downloading models from Hugging Face and testing/trying them out. Training a model with your own dataset could be fun, but it will require you to do some research/self-teaching on how to do it, and you will still need the GPU processing power to build it.
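I haven't done the training part myself, but the data prep step might look something like this. It's only a sketch: it splits the book into overlapping word chunks and writes them as JSONL with a `text` field. The chunk sizes, the `text` field name, and the function names are all my assumptions, so check what your training tool actually expects.

```python
import json


def chunk_words(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into chunks of `chunk_size` words, where each chunk
    repeats the last `overlap` words of the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the book.
        if start + chunk_size >= len(words):
            break
    return chunks


def to_jsonl(chunks: list[str]) -> str:
    """Serialize chunks as one JSON object per line.
    ASSUMPTION: the trainer reads records shaped like {"text": ...}."""
    return "\n".join(json.dumps({"text": chunk}) for chunk in chunks)


# Tiny stand-in for a book so the output is easy to inspect.
sample = " ".join(str(i) for i in range(10))
print(to_jsonl(chunk_words(sample, chunk_size=4, overlap=1)))
```

The overlap is there so a sentence cut at a chunk boundary still shows up whole in one of the two neighbouring chunks.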

 

So I have recently gone down the rabbit hole of cancelling my ChatGPT subscription and now just use OpenHermes2.5-Mistral-7B. I've learned about the different benchmarks and how they compare, and I understand how to read the Hugging Face LLM leaderboard and download any other model I might like to try.

What I struggle to understand is the meaning of the naming conventions. Mistral seems to clearly be better than Llama 2 from what I have read, and I understand the differences between 7B, 13B, etc.

Can someone explain the additional prefixes like Hermes, OpenHermes, NeuralChat, etc.?

TL;DR: What is the difference between Dolphin-Mistral and OpenHermes-Mistral? I'm guessing one is the dataset and the other is how it was trained?