I really like OpenHermes-7B for chat/RP purposes because it just seems to say much more creative and entertaining things than the Llama-based models I've tried. It also seems to have pretty good factual accuracy and explanation quality for single prompts, sometimes even including coding (the most I ever do is simple R scripts). I run it through OobaBooga.
But on the flip side, it seems to have a very poor grasp of context, both of things I've said earlier in the conversation and of how things in its surroundings relate to one another based on the initial character prompt. And it basically never advances the story. Xwin-70B feels much less interesting in the way it speaks to me, but it can drive the story and mostly seems to understand what is going on.
What actual variables affect memory, as well as the LLM's tendency to actually drive the story/conversation forward? Explain it to me like you would explain it to a scientist in a non-machine-learning field. Also, are there any Mistral-based models out or on the horizon that do a better job in these areas where OpenHermes struggles?
The models don't have memory per se; they re-process the entirety of the context (i.e. the conversation so far) with every generation. As that context becomes larger and more complex, models with fewer parameters struggle to keep track of it.
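Roughly, every chat front end does something like the sketch below: the whole transcript gets stitched back together and sent to the model on every turn, and whatever doesn't fit in the context window silently falls off the front. This is just an illustration, not OobaBooga's actual code; generate() and count_tokens() stand in for whatever backend/tokenizer you're using.

```python
# Minimal sketch of how a chat UI feeds an LLM: nothing is "remembered",
# the whole transcript is re-sent every turn, truncated to the context window.
# generate() and count_tokens() are placeholders for your actual backend.

CTX_LIMIT = 4096                                  # model's context window, in tokens
SYSTEM = "You are Aria, a travelling bard..."     # character / system prompt
history = []                                      # list of (speaker, text) tuples

def build_prompt(history):
    turns = [f"{who}: {text}" for who, text in history]
    # drop the oldest turns until the prompt fits the context window
    while turns and count_tokens("\n".join([SYSTEM] + turns)) > CTX_LIMIT:
        turns.pop(0)                              # old messages fall out of "memory"
    return "\n".join([SYSTEM] + turns + ["Aria:"])

def chat(user_text):
    history.append(("User", user_text))
    reply = generate(build_prompt(history))       # full transcript every single time
    history.append(("Aria", reply))
    return reply
```

So "memory" is really just how well the model can make use of a long prompt, which is where parameter count and training matter.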
You can try adding certain instructions to the system prompt, such as "advance the story", but ultimately more parameters means a better grasp of the conversation. I haven't come across anything below an 8-bit 13B model that could keep a story together, so that's the minimum I go for when I want to RP.
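For example (the wording here is just illustrative, building on the sketch above), something like this appended to the character/system prompt can nudge it in the right direction:

```python
# Illustrative only: extra lines appended to the system prompt to push
# the model toward driving the plot rather than just reacting.
SYSTEM += (
    "\nAlways move the scene forward: introduce a new event, decision, "
    "or piece of information in every reply."
    "\nDo not wait for the user to decide what happens next."
)
```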
As for the 70B's writing being less interesting, I'd say that's independent of the model's capabilities and more down to style. Again, giving it instructions on how to write, as well as example messages, can help, but it does somewhat come down to what it was trained on.
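By example messages I mean a couple of sample exchanges written in the character's voice, placed before the real conversation so the model has something to imitate. A made-up illustration:

```python
# Illustrative: example exchanges in the character's voice, meant to be
# included near the top of the prompt before the actual chat history.
EXAMPLES = (
    "User: What do you think of this town?\n"
    "Aria: Dreary, friend, dreary! But every dreary town hides one good "
    "tavern, and I intend to find it before the rain does.\n"
)
```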