I'm curious what results you're seeing from the Yi models. I've been playing around with LoneStriker_Nous-Capybara-34B-5.0bpw-h6-exl2 and, more recently, LoneStriker_Capybara-Tess-Yi-34B-200K-DARE-Ties-5.0bpw-h6-exl2, and I'm finding them fairly good with the right settings. I found the Yi 34B models almost unusable due to repetition issues until I tried the settings recommended in this discussion:
https://www.reddit.com/r/LocalLLaMA/comments/182iuj4/yi34b_models_repetition_issues/
I've found them much better since.
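For reference, the knobs involved are the usual sampler settings. Something along these lines, where the numbers are placeholders just to show which parameters are in play, not a restatement of the thread's recommendations:

```python
# Placeholder values only -- illustrating which sampler knobs matter here,
# not the actual settings from the linked thread.
yi_sampler_settings = {
    "temperature": 1.0,         # overall randomness of sampling
    "min_p": 0.05,              # min-p cutoff: drop tokens far below the top token's probability
    "top_p": 1.0,               # nucleus sampling (1.0 = effectively off)
    "top_k": 0,                 # top-k truncation (0 = off)
    "repetition_penalty": 1.0,  # penalty on tokens already in context (1.0 = off)
}
```

Your loader's UI may spell these differently, but the same parameters are usually there somewhere.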
I tried out one of the neural models and found it couldn't keep track of details at all. I wonder if my settings weren't very good or something. I would have been using an EXL2 or GPTQ version, though.
One question I have about this stuff: if we improve the way we randomize the next token, does that increase the likelihood of the "thesaurus" problem occurring? I.e., where the model keeps reaching for increasingly "flowery" words because it doesn't want to reuse the same ones. I find that becomes a problem with a long enough context in a chat when using some of the other settings designed around avoiding repetition. Sometimes my characters will start out talking normally and slowly progress into talking like college professors giving poetry lectures.
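My loose mental model of why that happens (a toy sketch, not any particular backend's actual code): the classic repetition penalty works per token id, so every word a character has already used gets its logit pushed down, and probability mass leaks toward rarer synonyms as the chat grows.

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor,
                             prev_token_ids: torch.Tensor,
                             penalty: float = 1.15) -> torch.Tensor:
    """CTRL-style repetition penalty: suppress every token already in context.

    Because the penalty applies to any token that has appeared at all, common
    words the model has already used become less likely, and the sampler
    drifts toward fancier synonyms -- the "thesaurus" effect over a long chat.
    """
    penalized = logits.clone()
    used = logits[prev_token_ids]
    # Positive logits are divided, negative logits multiplied, so reused
    # tokens always become less likely after the adjustment.
    penalized[prev_token_ids] = torch.where(used > 0, used / penalty, used * penalty)
    return penalized
```

At least, that's my guess for why the poetry-lecture drift only shows up once enough ordinary words have already been penalized.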